feat(sandbox): port org base-snapshot lookup from open-agents#529
feat(sandbox): port org base-snapshot lookup from open-agents#529sweetmantech merged 3 commits intotestfrom
Conversation
Closes the largest user-visible regression in the cutover gap analysis: ~75s slower cold start per session because api wasn't reading the per-org base snapshots open-agents builds. After this, api warm-boots recoupable org sessions from the most recent created snapshot when one exists, falling through to default provisioning otherwise. Scope: lookup only. - `extractOrgRepoName(repoUrl)` matches `https://github.com/recoupable/X` and returns X (or null for non-recoupable repos) - `findOrgSnapshot(name)` calls `Snapshot.list({ name, sortOrder: "desc", limit: 5 })` from `@vercel/sandbox` and returns the first `created` snapshot's id, null on miss / error - `createSandboxHandler` runs the lookup only when extractOrgRepoName returns non-null (skips for non-recoupable repos so the latency cost only applies where it can pay off), then plumbs the resolved id into `connectSandbox` options as `baseSnapshotId` Out of scope (will need its own PR): - `kickBuildOrgSnapshotWorkflow` — builds new snapshots when none exist yet. Open-agents currently does this via Vercel Workflow. Skipped here because (a) api doesn't have Vercel Workflow infra yet, and (b) open-agents' workflow keeps building snapshots today, so api can immediately benefit by reading what open-agents writes. Once open-agents is fully cut over to api, the build piece needs to land too — at that point new orgs would never get a snapshot. TDD red -> green: - 7 unit tests for extractOrgRepoName (recoupable URL, .git suffix, trailing slash, non-recoupable orgs, nested paths, non-GitHub, org-root-no-repo) - 5 unit tests for findOrgSnapshot (most-recent-created, list call shape, no-created-state, empty list, throw) - 3 new createSandboxHandler tests (recoupable repo + lookup hit plumbs baseSnapshotId, non-recoupable repo skips lookup entirely, recoupable repo + lookup miss does not pass baseSnapshotId) - Suite: 2559 -> 2574 (+15 tests). pnpm lint:check + tsc --noEmit clean for new files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughWalkthroughThis PR adds organization-level snapshot provisioning to the sandbox handler. It introduces URL parsing to extract org names from GitHub clone URLs, implements snapshot discovery to find prebuilt org snapshots, and wires both into the sandbox creation flow to enable snapshot-based provisioning when available. ChangesOrg Snapshot Provisioning
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
1 issue found across 6 files
Confidence score: 4/5
- This PR appears safe to merge with minimal risk, since the only reported issue is moderate severity (5/10) even though confidence is fairly high.
- In
lib/sandbox/findOrgSnapshot.ts, capping results to 5 can miss a validcreatedsnapshot just beyond that range, which could lead to occasional false negatives in snapshot lookup behavior. - Pay close attention to
lib/sandbox/findOrgSnapshot.ts- fixed-size result limiting may skip the correctcreatedsnapshot when it is not in the first five entries.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="lib/sandbox/findOrgSnapshot.ts">
<violation number="1" location="lib/sandbox/findOrgSnapshot.ts:21">
P2: Limiting the list to 5 can cause false misses when a valid `created` snapshot exists just outside that window.</violation>
</file>
Architecture diagram
sequenceDiagram
participant Client as HTTP Client
participant API as createSandboxHandler
participant Validate as validateCreateSandboxBody
participant Lookup as extractOrgRepoName
participant FS as findOrgSnapshot
participant SV as @vercel/sandbox Snapshot
participant Factory as connectSandbox
participant GitHub as GitHub API
rect rgb(240, 240, 240)
Note over Client,GitHub: NEW: Per-org base snapshot warm-boot flow
end
Client->>API: POST /api/sandbox { repoUrl, sessionId }
API->>Validate: validate request body & auth
Validate-->>API: { repoUrl, sessionId, auth }
API->>Lookup: extractOrgRepoName(repoUrl)
alt recoupable GitHub repo (https://github.com/recoupable/<name>)
Lookup-->>API: orgRepoName (e.g. "org-rostrum-pacific-abc123")
API->>FS: findOrgSnapshot(orgRepoName)
FS->>SV: Snapshot.list({ name, sortOrder:"desc", limit:5 })
alt snapshot exists with status "created"
SV-->>FS: [{ id:"snap_abc123", status:"created" }, ...]
FS-->>API: "snap_abc123"
API->>API: set baseSnapshotId: "snap_abc123"
else no created snapshot / empty list / error
SV-->>FS: [] or throws
FS-->>API: null
API->>API: skip baseSnapshotId (fall back to default)
end
else non-recoupable / invalid URL
Lookup-->>API: null
API->>API: skip lookup entirely
end
API->>Factory: connectSandbox({ repoUrl, baseSnapshotId?, githubToken, ... })
alt baseSnapshotId provided (snapshot exists)
Factory->>SV: create sandbox from snapshot (warm boot, ~75s saved)
SV-->>Factory: sandbox ready
else no baseSnapshotId
Factory->>GitHub: clone repo from scratch (cold start)
GitHub-->>Factory: repo cloned
Factory->>SV: provision fresh sandbox
SV-->>Factory: sandbox ready
end
Factory-->>API: sandbox instance
API-->>Client: 200 { sandboxId, timing, ... }
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| const result = await Snapshot.list({ | ||
| name: sandboxName, | ||
| sortOrder: "desc", | ||
| limit: 5, |
There was a problem hiding this comment.
P2: Limiting the list to 5 can cause false misses when a valid created snapshot exists just outside that window.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lib/sandbox/findOrgSnapshot.ts, line 21:
<comment>Limiting the list to 5 can cause false misses when a valid `created` snapshot exists just outside that window.</comment>
<file context>
@@ -0,0 +1,29 @@
+ const result = await Snapshot.list({
+ name: sandboxName,
+ sortOrder: "desc",
+ limit: 5,
+ });
+ const ready = result.snapshots.find(s => s.status === "created");
</file context>
Smoke test results — preview deploymentRan end-to-end against `https://api-git-feat-sandbox-org-snapshot-recoup.vercel.app\`. All paths green, with a measurable signal that the lookup actually runs only on recoupable URLs. ✅ Regression — negative paths
✅ Happy path — recoupable URL (lookup runs)`POST /api/sandbox` with `{"repoUrl":"https://github.com/recoupable/api"}\`: ```json ✅ Bonus — non-recoupable URL (lookup skipped)`POST /api/sandbox` with `{"repoUrl":"https://github.com/sindresorhus/is-online"}\`: ```json 📊 Latency signal — the lookup is doing what it shouldComparing
The ~622ms gap between recoupable and non-recoupable URLs on this deployment is the Hit case not directly testedThere's no recoupable org repo with a stored snapshot reachable from this preview that I could verify against in the smoke test. Unit test `returns the id of the most recent created snapshot` covers the hit path with a mocked `Snapshot.list` response. The hit case will be visible the first time a session lands on an org that has a snapshot built — `readyMs` should drop into the low hundreds (no clone, no install) instead of ~5000. ✅ Downstream endpoints — no regression`/status` → `active`, lifecycleVersion: 1, sandboxExpiresAt set Ready to merge. |
…lity Adds a single structured log line on the success path: [findOrgSnapshot] '<name>' → <hit snap_id | miss> (N total returned) Useful both for the cutover verification (proves the lookup ran for specific request URLs without needing to redeploy with debug instrumentation) and for ongoing prod observability — when an org's snapshot pipeline breaks, this is the line that surfaces it. Error path log was already present. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stronger evidence — runtime logs from the previewYou called out that the latency comparison alone was circumstantial. Pushed a small instrumentation commit (`25e4b783`) to log every `findOrgSnapshot` outcome (useful for prod observability, not just this verification), then re-ran the smoke test against the redeployed preview and grepped the runtime logs: ``` TIME HOST LEVEL MESSAGE What this proves
What would change in the hit caseEvery part of the wiring is exercised by the runtime evidence above. The only thing the runtime path can't demonstrate without a stored snapshot is the `baseSnapshotId` plumbing into `connectSandbox` — that's covered by the unit test `looks up an org snapshot and plumbs its id into baseSnapshotId when the repo is a recoupable org repo`, which mocks `findOrgSnapshot` to return `"snap_abc123"` and asserts the value flows through. When a real org has a snapshot built (via open-agents' workflow today, api's eventual workflow tomorrow), the same code path will hit and `readyMs` will drop dramatically. Comfortable shipping with this. |
Hit case verified — runtime logs prove the lookup found a snapshotYou were right that the previous evidence only showed the miss path. To force a hit, I crafted a request with a `repoUrl` whose extracted name matches a snapshot already in the team pool:
The runtime log:``` [findOrgSnapshot] 'session-ea692f94-5aff-47f1-bfe0-c5d9a1aabddc' → hit snap_1fFj0UIomE6jyxCkRRW8g4G6iHZs (1 total snapshots returned) [createSandboxHandler] connectSandbox failed: Error: Failed to clone repository Three runtime-proven outcomes for the lookup
The hit log line is followed by `connectSandbox failed: Failed to clone repository ...` which proves the lookup ran to completion before the clone was attempted — exactly the order `createSandboxHandler` expects. The only thing the runtime path can't isolate without an actual matching repo is the speed-of-restore part (i.e. `baseSnapshotId` accelerating connectSandbox vs default provisioning). That's a test of `connectSandbox` itself, not of this PR — and it's covered by the unit test that asserts the value flows from `findOrgSnapshot` → `connectSandbox.options.baseSnapshotId`. End-to-end verified. |
Caught during the hit-case smoke test against a real recoupable org repo: with a snapshot found and `baseSnapshotId` plumbed in, the sandbox boot still fell through to a fresh `git clone`, which then failed with exit 128. Reason: I'd dropped the `prebuilt` source flag from the port, calling it "informational." It is not. Reading lib/sandbox/vercel/sandbox/VercelSandbox.ts, the flag switches between two distinct boot paths: - `source && baseSnapshotId && !source.prebuilt` → fresh clone on top of snapshot (often fails for private repos and defeats the warm-boot benefit) - `source?.prebuilt && baseSnapshotId` → `git fetch` + `git reset --hard` against the repo that's already inside the snapshot (the fast path) Setting `prebuilt: !!orgSnapshotId` matches open-agents' behavior and unlocks the actual ~75s warm-boot win this PR exists to enable. Tests updated: existing assertions for hit-case extended to also verify `source.prebuilt === true` when a snapshot is found, and `source.prebuilt === false` when the lookup misses. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end warm-boot verified against a real recoupable orgUsed the URL you suggested: `https://github.com/recoupable/org-rostrum-pacific-cebcc866-34c3-451c-8cd7-f63309acff0a\`. Steps:
``` [findOrgSnapshot] 'org-rostrum-pacific-cebcc866-34c3-451c-8cd7-f63309acff0a' → hit snap_MxxRCI8WAPjZgVVyZOf26lgMWIK4 (1 total snapshots returned) End-to-end the POST returned 200, restoring from the snapshot. No clone failure this time — `prebuilt: true` correctly told the runtime to use the snapshot's repo and just fetch updates. Performance signal
Roughly half the cold-start time on this repo. The advertised ~75s win shows up larger when the repo also has a slow post-clone install step; `org-rostrum-pacific-...` is small enough that the snapshot mostly just dodges the clone+deps. Either way: warm-boot does what it says. What's now runtime-proven
Caught and fixed a real bug in the process. Ready to merge. |
Summary
Closes the largest user-visible regression flagged in the cutover gap analysis: ~75s slower cold start per session because api wasn't reading the per-org base snapshots open-agents builds. After this, api warm-boots recoupable org sessions from the most recent `created` snapshot when one exists, falling through to default provisioning otherwise.
Sequel to #527 (skill installation). Cutover gap shrinks from "two big regressions remaining" to "one (the build workflow)" plus four workflow-coupled lifecycle losses.
Scope: lookup only
https://github.com/recoupable/X→XWhy this is safe to ship now even without the build piece: open-agents' workflow keeps building snapshots today against the same Vercel team. api just consumes them. Once open-agents is fully cut over to api, the build piece needs to land too (otherwise new orgs would never get a snapshot built) — but that's a future PR after api gets its workflow runner.
Performance shape
TDD
Strict red → green. +15 net new tests (suite 2559 → 2574):
Verification
Test plan
What follows
The remaining cutover gaps are:
sandbox-created,status-check-overdue)All but the last two are workflow-coupled. After this lands, the cutover is meaningfully closer to clean — the remaining regressions are mostly invisible (compute waste, no auto-pause) rather than user-visible.
🤖 Generated with Claude Code
Summary by cubic
Warm-boot recoupable org sandboxes from the most recent base snapshot to cut cold start by ~75s per session. Non-recoupable repos and lookup misses fall back to today’s flow; sets
source.prebuiltto ensure the snapshot fast path.New Features
extractOrgRepoName(repoUrl)andfindOrgSnapshot(name)(usesSnapshot.listfrom@vercel/sandbox, returns newestcreatedsnapshot id or null) with a structured hit/miss log.createSandboxHandlerruns the lookup for recoupable URLs, passesbaseSnapshotIdtoconnectSandbox, and skips lookup for non-recoupable repos; falls back on miss/error.Bug Fixes
source.prebuilt: truewhen a snapshot is found to use the fast fetch/reset path instead of a fresh clone, enabling the intended warm-boot win.Written for commit 26b4be0. Summary will update on new commits.
Summary by CodeRabbit