feat(sandbox): port buildOrgSnapshotWorkflow + Vercel Workflow runtime #531
`@@ -0,0 +1,29 @@`

```ts
import { buildSnapshotStep, type BuildOrgSnapshotInput } from "@/app/workflows/buildSnapshotStep";

/**
 * Vercel Workflow that provisions a per-org base snapshot for warm-boot
 * of future session sandboxes. Kicked fire-and-forget from
 * `createSandboxHandler` when a recoupable org URL is requested but
 * no `created` snapshot exists yet.
 *
 * Single step today (provision + clone + snapshot via `refreshBaseSnapshot`),
 * wrapped here for the durable execution semantics — failures retry up
 * to 3× automatically, the run is observable in the Vercel dashboard,
 * and the request that kicked the workflow is fully decoupled from
 * its lifetime.
 */
export async function buildOrgSnapshotWorkflow(input: BuildOrgSnapshotInput) {
  "use workflow";

  console.log(`[build-org-snapshot] workflow:start name='${input.sandboxName}'`);

  try {
    const snapshotId = await buildSnapshotStep(input);
    console.log(`[build-org-snapshot] Built ${snapshotId} for '${input.sandboxName}'`);
    return { success: true as const, snapshotId };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    console.error(`[build-org-snapshot] Failed for '${input.sandboxName}':`, message);
    return { success: false as const, error: message };
  }
}
```

Comment on lines +20 to +28

💡 Result: No. In Vercel Workflow, if you catch an exception and return a value instead of throwing (i.e., you don't let the step or workflow end in an error), the run or step is considered successful rather than failed, so it is not eligible for the automatic retry mechanism. What counts as retryable is tied to failures being surfaced as errors: by default, "errors thrown inside steps are retried" [1], and you can explicitly control retrying by throwing `RetryableError` (or stop retrying by throwing `FatalError`) [1][2]. If you catch the exception and simply return a value, you prevent the error from propagating, so there is no failure state to trigger retries.

Notes:
- If a step throws until it exhausts its retries, the run is marked failed (its status can be inspected as "failed") [3].
- Separately, retries may also occur because of certain infrastructure or runtime issues, but those still originate from an error path rather than from a caught-and-returned success value [4].

Rethrow errors to preserve the run's failure state and enable automatic retries. Catching the error and returning `{ success: false }` makes the run end successfully, so the retry mechanism never fires.

💡 Proposed fix

```diff
 catch (error) {
   const message = error instanceof Error ? error.message : String(error);
   console.error(`[build-org-snapshot] Failed for '${input.sandboxName}':`, message);
-  return { success: false as const, error: message };
+  throw error instanceof Error ? error : new Error(message);
 }
```
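To make the reviewer's argument concrete, here is a small simulation of "errors thrown inside steps are retried". It is not the Vercel Workflow runtime: `runStepWithRetries`, `makeFlakyBuild`, `swallowing`, `rethrowing` and the default of 3 attempts are hypothetical names and values chosen for illustration, and in the real runtime the retry boundary (per step vs. per run) and defaults may differ. It only shows why a caught-and-returned failure looks like success to a retry loop, while a rethrown error gets retried.

```typescript
// Hypothetical model of a retry loop that only reacts to thrown errors.
// NOT the Vercel Workflow runtime; names and the default of 3 attempts are assumptions.
type StepResult<T> =
  | { status: "completed"; value: T; attempts: number }
  | { status: "failed"; error: string; attempts: number };

async function runStepWithRetries<T>(
  step: () => Promise<T>,
  maxAttempts = 3,
): Promise<StepResult<T>> {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Any normal return, even a `{ success: false }` payload, counts as completion.
      const value = await step();
      return { status: "completed", value, attempts: attempt };
    } catch (error) {
      // Only a thrown error is visible here and triggers another attempt.
      lastError = error instanceof Error ? error.message : String(error);
    }
  }
  return { status: "failed", error: lastError, attempts: maxAttempts };
}

// Simulated build that throws `failures` times, then succeeds.
function makeFlakyBuild(failures: number): () => Promise<string> {
  let calls = 0;
  return async () => {
    calls++;
    if (calls <= failures) throw new Error(`transient failure #${calls}`);
    return "snap_example";
  };
}

// Mirrors the current catch block: the error is converted into a return value.
function swallowing(build: () => Promise<string>) {
  return async () => {
    try {
      return { success: true as const, snapshotId: await build() };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      return { success: false as const, error: message };
    }
  };
}

// Mirrors the proposed fix: the error is rethrown after (omitted) logging.
function rethrowing(build: () => Promise<string>) {
  return async () => {
    try {
      return { success: true as const, snapshotId: await build() };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      throw error instanceof Error ? error : new Error(message);
    }
  };
}
```

With a build that fails twice before succeeding, `runStepWithRetries(swallowing(makeFlakyBuild(2)))` completes on the first attempt carrying a failure payload, whereas `runStepWithRetries(rethrowing(makeFlakyBuild(2)))` retries and completes with the snapshot on attempt 3.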
`@@ -0,0 +1,49 @@`

```ts
import { refreshBaseSnapshot } from "@/lib/sandbox/abstraction";
import { getServiceGithubToken } from "@/lib/github/getServiceGithubToken";
import { DEFAULT_SANDBOX_BASE_SNAPSHOT_ID } from "@/lib/sandbox/defaultBaseSnapshotId";
import { shellEscape } from "@/lib/sandbox/shellEscape";

const BUILD_SANDBOX_TIMEOUT_MS = 10 * 60 * 1000; // 10 minutes
const BUILD_COMMAND_TIMEOUT_MS = 8 * 60 * 1000; // 8 minutes — leaves buffer under sandbox timeout

export interface BuildOrgSnapshotInput {
  cloneUrl: string;
  sandboxName: string;
}

/**
 * Single step of `buildOrgSnapshotWorkflow`. Provisions a sandbox from
 * the recoup base snapshot, runs `git clone --depth=1 <cloneUrl> .`
 * inside it, and snapshots the result. Returns the new snapshot id.
 *
 * The cloneUrl is shell-escaped before interpolation: the validator
 * upstream of this workflow already rejects anything that doesn't
 * match `^https:\/\/github\.com\/recoupable\/...`, but defense-in-depth
 * — never trust the validator to also be a shell-quoter.
 *
 * Logging deliberately omits `cloneUrl` to avoid surfacing any token
 * embedded as `https://user:token@github.com/...`. The `sandboxName`
 * is the regex-extracted repo name only, so it's safe to log.
 */
export async function buildSnapshotStep(input: BuildOrgSnapshotInput): Promise<string> {
  "use step";

  console.log(`[build-org-snapshot] step:start name='${input.sandboxName}'`);

  const githubToken = getServiceGithubToken() ?? undefined;
  if (!githubToken) {
    throw new Error("[build-org-snapshot] GITHUB_TOKEN is not set; cannot clone org repo");
  }

  const result = await refreshBaseSnapshot({
    baseSnapshotId: DEFAULT_SANDBOX_BASE_SNAPSHOT_ID,
    sandboxName: input.sandboxName,
    sandboxTimeoutMs: BUILD_SANDBOX_TIMEOUT_MS,
    commandTimeoutMs: BUILD_COMMAND_TIMEOUT_MS,
    githubToken,
    commands: [`git clone --depth=1 ${shellEscape(input.cloneUrl)} .`],
    log: message => console.log(`[build-org-snapshot] ${message}`),
  });

  return result.snapshotId;
}
```
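The step relies on `shellEscape` from `@/lib/sandbox/shellEscape`, whose implementation is not part of this diff. As a hedged illustration of what such a helper typically does, the sketch below uses standard POSIX single-quote escaping; `shellEscape` and `buildCloneCommand` here are hypothetical stand-ins, not the project's code.

```typescript
// Hypothetical sketch: the real `@/lib/sandbox/shellEscape` is not shown in this PR.
// POSIX single-quote escaping: inside '...' every character is literal except `'`,
// so each embedded quote becomes close-quote, escaped quote, reopen-quote: '\''
function shellEscape(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Hypothetical helper mirroring the command string built in buildSnapshotStep.
function buildCloneCommand(cloneUrl: string): string {
  return `git clone --depth=1 ${shellEscape(cloneUrl)} .`;
}
```

For example, an input such as `x; rm -rf /` ends up as the single literal argument `'x; rm -rf /'`, so the `;` never reaches the shell as a command separator. Single quotes are preferred over double quotes here because `$`, backticks and `\` stay inert inside them.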
`@@ -7,6 +7,7 @@ import { connectSandbox } from "@/lib/sandbox/factory";`

```ts
import { findOrgSnapshot } from "@/lib/sandbox/findOrgSnapshot";
import { getSessionSandboxName } from "@/lib/sandbox/getSessionSandboxName";
import { installSessionGlobalSkills } from "@/lib/sandbox/installSessionGlobalSkills";
import { kickBuildOrgSnapshotWorkflow } from "@/lib/sandbox/kickBuildOrgSnapshotWorkflow";
import { extractOrgRepoName } from "@/lib/recoupable/extractOrgRepoName";
import { updateSession } from "@/lib/supabase/sessions/updateSession";
import { getServiceGithubToken } from "@/lib/github/getServiceGithubToken";
```

`@@ -67,6 +68,17 @@ export async function createSandboxHandler(request: NextRequest): Promise<NextRe`

```ts
  const orgRepoName = extractOrgRepoName(body.repoUrl);
  const orgSnapshotId = orgRepoName ? await findOrgSnapshot(orgRepoName) : null;

  // Miss: kick a background workflow to build a snapshot for this org so
  // the *next* session warm-boots from it. This request still pays the
  // full-clone cold-start path — the workflow runs durably outside the
  // request lifecycle.
  if (orgRepoName && !orgSnapshotId) {
    kickBuildOrgSnapshotWorkflow({
      cloneUrl: body.repoUrl,
      sandboxName: orgRepoName,
    });
  }

  const startTime = Date.now();

  let sandbox;
```

Review comment on the `if (orgRepoName && !orgSnapshotId) {` line:

P2: The new snapshot-miss branch can repeatedly start duplicate build workflows for the same org while an earlier build is still in progress.

Comment on lines +75 to +80

💡 Result: In workflow@4.2.0-beta.72, the recommended way to deduplicate …

Use deterministic … When parallel requests miss the snapshot check at the same time, each of them can trigger an identical build. Implement idempotency by passing a deterministic … . This prevents redundant clones and snapshot operations, along with the avoidable cost and rate-limit pressure they cause under concurrent requests.
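The exact idempotency API the second comment refers to is cut off above, so the sketch below does not attempt it. Instead it shows a narrower, hypothetical mitigation for the P2 concern: an in-memory, per-instance guard keyed by `sandboxName` with a TTL. `KickDeduper`, `tryAcquire` and the 15-minute default are illustrative assumptions. Because serverless instances do not share memory, this only reduces duplicate kicks within one instance; cross-instance deduplication still needs a shared store or runtime-level idempotency.

```typescript
// Hypothetical per-instance guard against duplicate snapshot-build kicks (not in the PR).
// Memory is not shared across serverless instances, so this only narrows the race.
const DEFAULT_KICK_TTL_MS = 15 * 60 * 1000; // assumed: longer than the 10-minute build sandbox timeout

class KickDeduper {
  private readonly lastKickAt = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = DEFAULT_KICK_TTL_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  /** True if the caller should kick a build for `key` now; false if one was kicked within the TTL. */
  tryAcquire(key: string): boolean {
    const at = this.now();
    const previous = this.lastKickAt.get(key);
    if (previous !== undefined && at - previous < this.ttlMs) return false;
    this.lastKickAt.set(key, at);
    return true;
  }
}

// Possible call site in createSandboxHandler (kickBuildOrgSnapshotWorkflow as in this PR):
// const kickDeduper = new KickDeduper(); // module scope, reused across requests on this instance
// if (orgRepoName && !orgSnapshotId && kickDeduper.tryAcquire(orgRepoName)) {
//   kickBuildOrgSnapshotWorkflow({ cloneUrl: body.repoUrl, sandboxName: orgRepoName });
// }
```

The TTL is a trade-off: setting it above the build's sandbox timeout avoids re-kicking a build that is still running, but after a failed build the same instance will not re-kick until the TTL expires. The map is also never pruned, which is tolerable only because the number of orgs is small.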
`@@ -0,0 +1,28 @@`

```ts
/**
 * Base snapshot used by `buildOrgSnapshotWorkflow` to bootstrap a fresh
 * sandbox before cloning an org repo into it. Lets the workflow skip
 * provisioning a bare image and start from one with the standard
 * Recoup tooling already installed, so the subsequent `git clone` is
 * the only meaningful work.
 *
 * Override at deploy time via `VERCEL_SANDBOX_BASE_SNAPSHOT_ID` to
 * roll forward to a newer base. The hardcoded fallback is the
 * snapshot that lives in the Recoup Vercel team.
 *
 * Current snapshot includes:
 *   - jq (dnf install -y jq)
 *   - bun (curl -fsSL https://bun.sh/install | sudo BUN_INSTALL=/usr/local bash)
 *   - agent-browser (sudo npm install -g agent-browser)
 *   - code-server (curl -fsSL https://code-server.dev/install.sh | sudo sh)
 *
 * To refresh: provision a clean sandbox with the @vercel/sandbox SDK,
 * run the install commands above (plus any new ones), snapshot it via
 * `vercel sandbox snapshot <id> --stop`, and update the constant
 * below with the new id.
 *
 * Tooling note: chromium is intentionally NOT in this base — Amazon
 * Linux 2023's default repo doesn't carry it, and `agent-browser`
 * fetches a managed Playwright browser on first use anyway.
 */
export const DEFAULT_SANDBOX_BASE_SNAPSHOT_ID =
  process.env.VERCEL_SANDBOX_BASE_SNAPSHOT_ID ?? "snap_RgVtpDO4y1BJHQiUbptMwS3Rt2EQ";
```
`@@ -0,0 +1,40 @@`

```ts
import { start } from "workflow/api";
import { buildOrgSnapshotWorkflow } from "@/app/workflows/buildOrgSnapshotWorkflow";

interface KickBuildOrgSnapshotInput {
  cloneUrl: string;
  sandboxName: string;
}

/**
 * Fire-and-forget kick of `buildOrgSnapshotWorkflow`. Used by
 * `createSandboxHandler` when a recoupable org repo is requested but
 * no `created` snapshot exists yet — the next session for the same
 * org will warm-boot from the snapshot this build produces.
 *
 * Failures are logged but never surfaced. The current request always
 * falls back to the slow full-clone path; what we're protecting is
 * that *future* requests don't have to.
 *
 * Logging omits `cloneUrl` to avoid surfacing any embedded credential
 * (e.g. `https://user:token@github.com/...`) — the `sandboxName` is
 * already the regex-extracted repo name only, which uniquely
 * identifies the org for observability without exposing tokens.
 *
 * @param input - The repo URL to clone and the sandbox name to use
 *   (which becomes the snapshot's name and the lookup key for
 *   `findOrgSnapshot`).
 */
export function kickBuildOrgSnapshotWorkflow(input: KickBuildOrgSnapshotInput) {
  void start(buildOrgSnapshotWorkflow, [input]).then(
    run =>
      console.log(
        `[build-org-snapshot] Started workflow run ${run.runId} for '${input.sandboxName}'`,
      ),
    error =>
      console.error(
        `[build-org-snapshot] Failed to start workflow for '${input.sandboxName}':`,
        error,
      ),
  );
}
```
P2: Do not swallow workflow errors; rethrow after logging so failed runs are recorded correctly.