Add `ecs reachability` subcommand to diagnose private-VPC egress failures

## Motivation

When a customer deploys a Quilt CloudFormation stack into a locked-down VPC (no Internet Gateway, no NAT, incomplete VPC endpoints), the stack hangs on ECS service creation because tasks can't pull images from ECR. The failure is silent: CloudFormation events don't say "ECR unreachable," and the CloudWatch log groups may never be created.

This came up on a Biogen support call (2026-05-15): we spent most of a 23-minute meeting diagnosing exactly this, with engineers guessing at which VPC endpoints were missing. Alexei committed to sending a reachability-test script, but it doesn't exist as a reusable tool. We should ship it as a `quiltx ecs` subcommand so support can hand customers a one-liner.

This belongs under `ecs` because the test must run from inside an ECS task in the customer's VPC, which is the exact network context where the failure occurs. `quiltx/ecs.py` also already has the task-launching plumbing (`run_task`, `wait_for_task`, `get_network_config`).

Related: [`docs/advanced-features/private-endpoint-access.md`](https://github.com/quiltdata/quilt/blob/master/docs/advanced-features/private-endpoint-access.md) in the quilt repo needs to link to this once it exists.

## Proposed UX

```
quiltx ecs reachability --stack <stack-name>     # auto-discover VPC/subnets from stack
quiltx ecs reachability --vpc vpc-xxx --subnet subnet-yyy
quiltx ecs reachability script                   # emit portable bash script (no deploy needed)
```

Output: a table of `service → endpoint → reachable? → resolved IP (public/private)` so the customer can see exactly which AWS services their VPC cannot reach.
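
A minimal sketch of how that table could be rendered, using only the standard library. The `ProbeResult` dataclass, its field names, and `render_table` are hypothetical and exist only for illustration; nothing here is existing `quiltx` code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResult:
    """One row of the report (hypothetical shape, not an existing quiltx type)."""
    service: str
    endpoint: str
    reachable: bool
    resolved_ip: Optional[str] = None  # None when DNS resolution failed
    ip_scope: Optional[str] = None     # "private" or "public"


def render_table(results: List[ProbeResult]) -> str:
    """Render results as a plain-text table with left-aligned, padded columns."""
    headers = ("SERVICE", "ENDPOINT", "REACHABLE", "RESOLVED IP")
    rows = [
        (
            r.service,
            r.endpoint,
            "yes" if r.reachable else "NO",
            f"{r.resolved_ip} ({r.ip_scope})" if r.resolved_ip else "-",
        )
        for r in results
    ]
    table = [headers, *rows]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table
    )
```

Printing `NO` in capitals is a deliberate choice in this sketch so that unreachable services stand out when a customer pastes the output into a support ticket.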

## What it should check

From the call and a read of the CFN template, at minimum:

- ECR API (`api.ecr.<region>.amazonaws.com`)
- ECR DKR (`*.dkr.ecr.<region>.amazonaws.com`), used for image pulls
- S3 (gateway endpoint), which image pulls also depend on because ECR serves image layers from S3
- CloudWatch Logs
- SNS
- SQS
- Secrets Manager
- STS
- API Gateway (interface endpoint, since `ApiGatewayVPCEndpointId` is a stack param)

The exact list should be derived from the CFN template rather than hardcoded (see the open questions below).
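
Wherever the list ends up coming from, each service still has to be mapped to a regional hostname before it can be probed. The sketch below hardcodes the list above purely for illustration. It assumes the commercial `amazonaws.com` partition (GovCloud and China regions use different suffixes), and it assumes that the bare `execute-api.<region>.amazonaws.com` name is a useful probe target for API Gateway. `endpoints_for_region` is a hypothetical helper, not part of `quiltx`.

```python
from typing import Dict


def endpoints_for_region(region: str, account_id: str) -> Dict[str, str]:
    """Map each service in the list above to its regional hostname.

    Assumes the standard commercial-partition naming scheme.
    """
    suffix = f"{region}.amazonaws.com"
    return {
        "ECR API": f"api.ecr.{suffix}",
        # The registry hostname is per-account, so the wildcard in the list
        # above is filled in with a concrete account ID here.
        "ECR DKR": f"{account_id}.dkr.ecr.{suffix}",
        "S3": f"s3.{suffix}",
        "CloudWatch Logs": f"logs.{suffix}",
        "SNS": f"sns.{suffix}",
        "SQS": f"sqs.{suffix}",
        "Secrets Manager": f"secretsmanager.{suffix}",
        "STS": f"sts.{suffix}",
        # Assumption: real API hostnames are <api-id>.execute-api.<region>...;
        # the bare name is used here only as a stand-in.
        "API Gateway": f"execute-api.{suffix}",
    }
```

For example, `endpoints_for_region("us-east-1", "123456789012")["ECR API"]` evaluates to `api.ecr.us-east-1.amazonaws.com`.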

## Implementation sketch

- Reuse the existing `run_task` / `get_network_config` helpers in `quiltx/ecs.py` to launch a short-lived task in the target VPC/subnet that runs DNS lookups + TCP connect tests against each service endpoint, then returns JSON via logs.
- Chicken-and-egg: if ECR is unreachable, the probe task itself can't pull its image. Options:
  - Use a public ECR image already cached in AWS (e.g., `public.ecr.aws/amazonlinux/amazonlinux`). This still needs egress, so it doesn't help in the worst case
  - Fall back to `script` mode: emit a portable bash script the customer runs from any existing EC2 in the VPC, no Quilt deploy required
- Recommend supporting **both** modes; `script` is the escape hatch when the task-based probe itself can't start.
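
The per-endpoint check described above (DNS lookup, then TCP connect, then JSON out) could look roughly like the following. This is a sketch under stated assumptions rather than the intended implementation. `probe` and `classify_ip` are hypothetical names, only IPv4 and port 443 are considered, and the `resolve` / `connect` parameters exist only so the logic can be exercised without a network.

```python
import ipaddress
import json
import socket


def classify_ip(ip: str) -> str:
    """Return "private" for non-public addresses (RFC 1918 and similar), else "public"."""
    return "private" if ipaddress.ip_address(ip).is_private else "public"


def probe(host, port=443, timeout=3.0,
          resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Resolve `host`, then attempt a TCP connect to the first IPv4 address."""
    result = {"endpoint": host, "resolved_ip": None, "ip_scope": None,
              "reachable": False, "error": None}
    try:
        infos = resolve(host, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
        ip = infos[0][4][0]  # sockaddr is (ip, port) for AF_INET
        result["resolved_ip"] = ip
        result["ip_scope"] = classify_ip(ip)
        # A completed TCP handshake is treated as "reachable"; no TLS or
        # HTTP request is attempted in this sketch.
        with connect((ip, port), timeout=timeout):
            result["reachable"] = True
    except OSError as exc:  # covers DNS failures, refusals, and timeouts
        result["error"] = str(exc)
    return result


if __name__ == "__main__":
    # Hypothetical invocation: one JSON object per line, so the caller can
    # parse the results back out of the task's log stream.
    for name in ("sts.us-east-1.amazonaws.com", "sqs.us-east-1.amazonaws.com"):
        print(json.dumps(probe(name)))
```

One caveat for interpreting results: an S3 gateway endpoint works through route-table entries and does not change DNS, so S3 can resolve to a public IP and still be reachable without an IGW or NAT. The report should probably treat "public IP but reachable" for S3 as healthy rather than as a misconfiguration.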

## Open questions

- Authoritative list of AWS services each Quilt component calls out to. This needs confirmation from the platform team, not just a grep.
- Best base image for the probe task that maximizes chance of starting in a partially-broken VPC.
- Do we also want to check outbound HTTPS to non-AWS services?

## Acceptance criteria

- [ ] `quiltx ecs reachability --stack <name>` runs end-to-end against a real private-VPC deployment
- [ ] `quiltx ecs reachability script` emits a self-contained bash script that runs on any Linux EC2 in the VPC
- [ ] Output clearly distinguishes "DNS resolves to private IP via endpoint" vs "DNS resolves to public IP (needs IGW/NAT)" vs "unreachable"
- [ ] Documented in `docs/advanced-features/private-endpoint-access.md` (separate PR in quilt repo)
- [ ] Service list sourced from CFN template, with a test that fails if the template adds a service the checker doesn't know about
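
For the last criterion, one possible shape of the drift test is sketched below. It rests on a large assumption: that the IAM action prefixes in the template (for example `sqs` in `sqs:SendMessage`) are a usable signal for which services the stack calls. That is close to a grep, so it does not replace confirmation from the platform team. `KNOWN_SERVICES`, `iam_service_prefixes`, and `unknown_services` are hypothetical names.

```python
from typing import Any, Set

# IAM prefixes for the services listed under "What it should check".
KNOWN_SERVICES = {
    "ecr", "s3", "logs", "sns", "sqs", "secretsmanager", "sts", "execute-api",
}


def iam_service_prefixes(node: Any) -> Set[str]:
    """Recursively collect IAM service prefixes from every "Action" key."""
    found: Set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Action":
                if isinstance(value, str):
                    actions = [value]
                elif isinstance(value, list):
                    actions = value
                else:
                    actions = []  # e.g. an intrinsic function; skipped here
                found |= {
                    a.split(":", 1)[0].lower()
                    for a in actions
                    if isinstance(a, str) and ":" in a
                }
            else:
                found |= iam_service_prefixes(value)
    elif isinstance(node, list):
        for item in node:
            found |= iam_service_prefixes(item)
    return found


def unknown_services(template: dict) -> Set[str]:
    """Services referenced by the template that the checker has no probe for."""
    return iam_service_prefixes(template) - KNOWN_SERVICES
```

A test could then load the parsed template and assert that `unknown_services(template)` is empty, so that a newly added service fails CI until the checker gains a probe for it or the service is explicitly allow-listed.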
