feat: instrument sandbox startup timing#8
Conversation
Adds structured per-step timing logs so we can identify which phase dominates the 60+s new-sandbox startup before committing to a fix. Two log streams: - [sandbox-handler-timing] (apps/web): auth_and_db_lookups, connect_sandbox, update_session_state, install_global_skills, handler_total - [sandbox-timing] (packages/sandbox): vm_provision (with mode), git_clone, git_remote_set_url, git_config_user, git_checkout_new_branch, after_start_hook, create_total Each log line includes step_ms (this phase) and elapsed_ms (since start) so the dominant cost surfaces immediately in Vercel function logs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughIntroduced request-scoped timing instrumentation in the sandbox creation handler and SDK provisioning process. Added step-by-step timing logs at key control-flow points to measure elapsed time between phases and total duration, without modifying core functionality or public API signatures. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
apps/web/lib/sandbox/create-sandbox-handler.ts (2)
120-145:⚠️ Potential issue | 🟡 MinorEmit
connect_sandboxtiming when provisioning fails.If
connectSandboxspends 60+ seconds in provision/clone and then throws, this returns before logging the phase, hiding the latency this PR is meant to isolate.📈 Proposed fix
try { sandbox = await connectSandbox({ state: { type: "vercel", ...(sandboxName ? { sandboxName } : {}), source, @@ }, }); } catch (error) { + logTiming("connect_sandbox", sessionId, { success: false }); console.error("connectSandbox failed:", error); return Response.json( { error: "Failed to provision sandbox" }, { status: 502 }, ); } - logTiming("connect_sandbox", sessionId); + logTiming("connect_sandbox", sessionId, { success: true });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/web/lib/sandbox/create-sandbox-handler.ts` around lines 120 - 145, The connectSandbox call currently logs timing only on success, so if it throws you return early and lose the "connect_sandbox" latency; update the error path to emit the timing before returning by calling logTiming("connect_sandbox", sessionId) in the catch (or move the timing call into a finally block that always runs). Change the catch in create-sandbox-handler.ts (around connectSandbox) to invoke logTiming("connect_sandbox", sessionId) prior to Response.json return (or refactor to try/catch/finally so logTiming always executes).
160-174:⚠️ Potential issue | 🟡 MinorRecord the global-skills phase even on failure.
A slow failing
installSessionGlobalSkillscall is swallowed by the catch block and only appears as an unlabeled gap beforehandler_total.📈 Proposed fix
if (sessionRecord) { + let globalSkillsInstalled = false; try { await installSessionGlobalSkills({ sessionRecord, sandbox, }); - logTiming("install_global_skills", sessionId, { - skillCount: sessionRecord.globalSkillRefs?.length ?? 0, - }); + globalSkillsInstalled = true; } catch (error) { console.error( `Failed to install global skills for session ${sessionRecord.id}:`, error, ); } + logTiming("install_global_skills", sessionId, { + skillCount: sessionRecord.globalSkillRefs?.length ?? 0, + success: globalSkillsInstalled, + }); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/web/lib/sandbox/create-sandbox-handler.ts` around lines 160 - 174, The global-skills timing is only recorded on success, so move or duplicate the logTiming call for the "install_global_skills" phase so it always runs (including on failure) for the given sessionId; inside the try set logTiming("install_global_skills", sessionId, { skillCount: sessionRecord.globalSkillRefs?.length ?? 0, success: true }) and in the catch call logTiming("install_global_skills", sessionId, { skillCount: sessionRecord.globalSkillRefs?.length ?? 0, success: false, error: String(error) }) before rethrowing or swallowing, referencing installSessionGlobalSkills, sessionRecord and sessionId to locate the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/web/lib/sandbox/create-sandbox-handler.ts`:
- Around line 29-43: The timing log uses lastStep for step_ms causing the
"handler_total" phase to show only the last interval; update the logTiming
function so when phase === "handler_total" it computes step_ms using
handlerStart instead of lastStep (i.e., step_ms = now - handlerStart) while
other phases keep step_ms = now - lastStep; adjust the computation in logTiming
(referencing logTiming, handlerStart, lastStep, and the "handler_total" phase)
and ensure the same change is applied to the duplicate occurrence noted around
lines 183-184.
In `@packages/sandbox/vercel/sandbox.ts`:
- Around line 516-526: The timing helper logTiming currently computes step_ms as
now - lastStep which makes the "create_total" phase report only the last
segment; change logTiming so when phase === "create_total" (or when a boolean
total flag) it sets step_ms to now - createStart instead of now - lastStep,
while keeping elapsed_ms = now - createStart for all phases; update references
to lastStep and createStart in the logTiming closure (and any other create_total
logTiming call sites) so the "create_total" entry reflects the full
VercelSandbox.create() duration.
- Line 615: The log currently passes the raw repository URL (source.url) into
logTiming("git_clone"), which can expose private repo info or embedded
credentials; instead compute and log a non-reversible fingerprint of the URL
(e.g., SHA256/SHA1 digest truncated) and/or the sandbox name, and pass that
fingerprint to logTiming("git_clone") rather than source.url; ensure you
reference source.url only to compute the hash and do not persist or log the raw
URL anywhere in the code paths around logTiming("git_clone").
---
Outside diff comments:
In `@apps/web/lib/sandbox/create-sandbox-handler.ts`:
- Around line 120-145: The connectSandbox call currently logs timing only on
success, so if it throws you return early and lose the "connect_sandbox"
latency; update the error path to emit the timing before returning by calling
logTiming("connect_sandbox", sessionId) in the catch (or move the timing call
into a finally block that always runs). Change the catch in
create-sandbox-handler.ts (around connectSandbox) to invoke
logTiming("connect_sandbox", sessionId) prior to Response.json return (or
refactor to try/catch/finally so logTiming always executes).
- Around line 160-174: The global-skills timing is only recorded on success, so
move or duplicate the logTiming call for the "install_global_skills" phase so it
always runs (including on failure) for the given sessionId; inside the try set
logTiming("install_global_skills", sessionId, { skillCount:
sessionRecord.globalSkillRefs?.length ?? 0, success: true }) and in the catch
call logTiming("install_global_skills", sessionId, { skillCount:
sessionRecord.globalSkillRefs?.length ?? 0, success: false, error: String(error)
}) before rethrowing or swallowing, referencing installSessionGlobalSkills,
sessionRecord and sessionId to locate the code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c9cd694f-3449-4118-ac00-66376f89551f
📒 Files selected for processing (2)
apps/web/lib/sandbox/create-sandbox-handler.tspackages/sandbox/vercel/sandbox.ts
| const logTiming = ( | ||
| phase: string, | ||
| sessionId: string | undefined, | ||
| extra?: Record<string, unknown>, | ||
| ) => { | ||
| const now = Date.now(); | ||
| console.log("[sandbox-handler-timing]", { | ||
| sessionId: sessionId ?? "unknown", | ||
| phase, | ||
| step_ms: now - lastStep, | ||
| elapsed_ms: now - handlerStart, | ||
| ...extra, | ||
| }); | ||
| lastStep = now; | ||
| }; |
There was a problem hiding this comment.
Make handler_total.step_ms represent the full request duration.
handler_total currently uses lastStep, so step_ms is only the final gap since the previous phase. Use handlerStart for this phase to keep the “total” line unambiguous.
📏 Proposed fix
const logTiming = (
phase: string,
sessionId: string | undefined,
extra?: Record<string, unknown>,
+ stepStart = lastStep,
) => {
const now = Date.now();
console.log("[sandbox-handler-timing]", {
sessionId: sessionId ?? "unknown",
phase,
- step_ms: now - lastStep,
+ step_ms: now - stepStart,
elapsed_ms: now - handlerStart,
...extra,
});
lastStep = now;
}; const readyMs = Date.now() - handlerStart;
- logTiming("handler_total", sessionId, { readyMs });
+ logTiming("handler_total", sessionId, { readyMs }, handlerStart);Also applies to: 183-184
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/web/lib/sandbox/create-sandbox-handler.ts` around lines 29 - 43, The
timing log uses lastStep for step_ms causing the "handler_total" phase to show
only the last interval; update the logTiming function so when phase ===
"handler_total" it computes step_ms using handlerStart instead of lastStep
(i.e., step_ms = now - handlerStart) while other phases keep step_ms = now -
lastStep; adjust the computation in logTiming (referencing logTiming,
handlerStart, lastStep, and the "handler_total" phase) and ensure the same
change is applied to the duplicate occurrence noted around lines 183-184.
| const logTiming = (phase: string, extra?: Record<string, unknown>) => { | ||
| const now = Date.now(); | ||
| console.log("[sandbox-timing]", { | ||
| name: name ?? "unnamed", | ||
| phase, | ||
| step_ms: now - lastStep, | ||
| elapsed_ms: now - createStart, | ||
| ...extra, | ||
| }); | ||
| lastStep = now; | ||
| }; |
There was a problem hiding this comment.
Make create_total.step_ms represent the total duration.
logTiming("create_total") currently reports step_ms since the previous logged phase, not the full VercelSandbox.create() duration. That can under-report the total when consumers group by phase and read step_ms.
📏 Proposed fix
- const logTiming = (phase: string, extra?: Record<string, unknown>) => {
+ const logTiming = (
+ phase: string,
+ extra?: Record<string, unknown>,
+ stepStart = lastStep,
+ ) => {
const now = Date.now();
console.log("[sandbox-timing]", {
name: name ?? "unnamed",
phase,
- step_ms: now - lastStep,
+ step_ms: now - stepStart,
elapsed_ms: now - createStart,
...extra,
});
lastStep = now;
};- logTiming("create_total");
+ logTiming("create_total", undefined, createStart);Also applies to: 724-724
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/sandbox/vercel/sandbox.ts` around lines 516 - 526, The timing helper
logTiming currently computes step_ms as now - lastStep which makes the
"create_total" phase report only the last segment; change logTiming so when
phase === "create_total" (or when a boolean total flag) it sets step_ms to now -
createStart instead of now - lastStep, while keeping elapsed_ms = now -
createStart for all phases; update references to lastStep and createStart in the
logTiming closure (and any other create_total logTiming call sites) so the
"create_total" entry reflects the full VercelSandbox.create() duration.
| `Failed to clone repository '${source.url}' (exit code ${cloneResult.exitCode})`, | ||
| ); | ||
| } | ||
| logTiming("git_clone", { repo: source.url }); |
There was a problem hiding this comment.
Avoid logging raw repository URLs.
source.url can disclose private repo identity and may include embedded credentials if a caller passes an authenticated URL. The sandbox name already provides correlation, so omit the repo or log a non-reversible fingerprint instead.
🛡️ Proposed fix
- logTiming("git_clone", { repo: source.url });
+ logTiming("git_clone");📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| logTiming("git_clone", { repo: source.url }); | |
| logTiming("git_clone"); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/sandbox/vercel/sandbox.ts` at line 615, The log currently passes the
raw repository URL (source.url) into logTiming("git_clone"), which can expose
private repo info or embedded credentials; instead compute and log a
non-reversible fingerprint of the URL (e.g., SHA256/SHA1 digest truncated)
and/or the sandbox name, and pass that fingerprint to logTiming("git_clone")
rather than source.url; ensure you reference source.url only to compute the hash
and do not persist or log the raw URL anywhere in the code paths around
logTiming("git_clone").
There was a problem hiding this comment.
1 issue found across 2 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/sandbox/vercel/sandbox.ts">
<violation number="1" location="packages/sandbox/vercel/sandbox.ts:615">
P2: Avoid logging the raw repo URL here; it can leak embedded credentials if callers provide a credentialed GitHub URL.</violation>
</file>
You're on the cubic free plan with 20 free PR reviews remaining this month. Upgrade for unlimited reviews.
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| `Failed to clone repository '${source.url}' (exit code ${cloneResult.exitCode})`, | ||
| ); | ||
| } | ||
| logTiming("git_clone", { repo: source.url }); |
There was a problem hiding this comment.
P2: Avoid logging the raw repo URL here; it can leak embedded credentials if callers provide a credentialed GitHub URL.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/sandbox/vercel/sandbox.ts, line 615:
<comment>Avoid logging the raw repo URL here; it can leak embedded credentials if callers provide a credentialed GitHub URL.</comment>
<file context>
@@ -592,6 +612,7 @@ ${hostLine}${portLines}${runtimeEnvLine}`;
`Failed to clone repository '${source.url}' (exit code ${cloneResult.exitCode})`,
);
}
+ logTiming("git_clone", { repo: source.url });
}
</file context>
| logTiming("git_clone", { repo: source.url }); | |
| logTiming("git_clone"); |
Summary
step_ms(this phase) andelapsed_ms(since start), so the dominant cost surfaces immediately when grepping Vercel function logs.Log streams
[sandbox-handler-timing]—apps/web/lib/sandbox/create-sandbox-handler.tsauth_and_db_lookups— Privy session +requireOwnedSession+getGitHubAccountconnect_sandbox— fullconnectSandboxcall (covers VM provision + clone + git config)update_session_state— DB write of new sandbox stateinstall_global_skills— only when the session hasglobalSkillRefshandler_total— full request duration[sandbox-timing]—packages/sandbox/vercel/sandbox.tsvm_provision(withmode: base_snapshot|sdk_git|restore_snapshot|empty) — Firecracker bootstrapgit_clone(withrepo) — manualgit cloneinto/vercel/sandboxgit_remote_set_url— embedding the auth token in the origin remotegit_config_user—git config user.name/emailgit_checkout_new_branch(withbranch) — whenisNewBranchis setafter_start_hook— only when anafterStarthook is registeredcreate_total— fullVercelSandbox.createdurationEach log line includes the sandbox
name(= session id) so you can filter to a single create.How to read the logs
After deploying this branch, click an org once. In the Vercel runtime logs for
POST /api/sandbox, grepsandbox-timing\\|sandbox-handler-timingand you'll see something like:The
step_msvalues tell you exactly where the time is going.Test plan
sandbox-timingandsandbox-handler-timinggit_cloneis in fact dominant (hypothesis), or whethervm_provisionis comparable🤖 Generated with Claude Code
Summary by cubic
Adds structured per-step timing logs for sandbox startup to pinpoint the 60s+ bottleneck. Makes the slow phase obvious in Vercel logs so we can pick the right fix.
[sandbox-handler-timing]inapps/web/lib/sandbox/create-sandbox-handler.ts: logs auth/DB lookups,connectSandbox, session state update, global skill install, total. IncludessessionId,step_ms,elapsed_ms.[sandbox-timing]inpackages/sandbox/vercel/sandbox.ts: logs VM provision (mode:base_snapshot|sdk_git|restore_snapshot|empty), git clone/config steps, optional hooks, total. Includesname,step_ms,elapsed_ms.Written for commit 7983a5d. Summary will update on new commits.
Summary by CodeRabbit