Feat/filesystems sandboxes by DIodide · Pull Request #17 · DIodide/Harness

DIodide · 2026-03-23T17:06:01Z

Daytona sandbox integration: Full sandbox lifecycle management (create, start, stop,
snapshot) with Convex persistence and FastAPI service layer
- Sandbox UI panel: File explorer, code editor with syntax highlighting, integrated
  terminal, git panel, and command input — accessible from the chat interface
- Sandbox code execution: AI tool calls that run code in sandboxes render inline
  results with file diffs, terminal output, and created files
- Chat UX improvements: Edit-only-last-message restriction, fixed edit version arrow
  positioning, Enter-to-send in edit textarea, suppressed fade-in animations on
  conversation switches, and sticky-bottom auto-scroll with unpin-on-scroll-up
- Backend fixes: Fixed editForkAndSend rejecting first-message edits, added userId
  backfill migration, auto-start sleeping sandboxes, GitHub OAuth fixes
Test plan
- Create a harness with sandbox enabled, verify sandbox starts and file explorer loads
- Run code in sandbox via AI, verify inline results render (diffs, terminal output,
  files)
- Test file explorer: create, rename, delete files/folders; open files in editor
- Test terminal: run commands, verify output streams correctly
- Test git panel: view status, stage/unstage, commit
- Edit the last user message, verify fork creates correctly and version arrows work
- Verify earlier user messages do NOT show the edit button
- Switch between conversations, verify no fade-in animation flash
- During streaming, scroll up — verify auto-scroll stops and scroll-to-bottom button
  appears

…ke the rest)

…fixes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Resolves conflicts in chat/index.tsx, harnesses/$harnessId.tsx, onboarding.tsx, and config.py — combining sandbox/Daytona features with staging's chat search, fork/edit, selectable MCPs, and URL validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Old messages created before the userId field was added to the messages table are missing userId, which excludes them from the search index. This migration copies userId from the parent conversation onto each message, processing 500 per invocation to stay within Convex limits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Replace editFork + sendMessage with a single editForkAndSend mutation that creates the fork and inserts the edited message in one transaction, eliminating the brief flicker where the fork appeared without the message. - Add isEditSaving ref guard to prevent double-submit when pressing Enter/clicking Save twice quickly, which was creating duplicate forks. - Cap .collect() with .take(8192) in both fork and editForkAndSend to avoid hitting Convex's per-query document read limit. - Add bounds validation on upToMessageCount. - Add max-depth guard (100) on the ancestor walk to prevent infinite loops from data corruption. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix editForkAndSend rejecting first message edit (upToMessageCount=0) - Restrict message editing to only the last user message - Add Enter-to-send and Shift+Enter-for-newline in edit textarea - Right-align edit version arrows so they don't shift with message width - Suppress fade-in animations on conversation switches - Implement sticky-bottom auto-scroll with wheel/touch unpin detection - Add scroll-to-bottom button when unpinned Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…andboxes # Conflicts: # apps/web/src/routes/chat/index.tsx

DIodide · 2026-03-23T17:06:32Z

@claude

claude · 2026-03-23T17:06:46Z

Claude finished @DIodide's task in 2m 13s —— View job

PR Review: Feat/filesystems sandboxes

Read CLAUDE.md for repo guidelines
Review backend: Daytona service, sandbox tools, FastAPI routes
Review Convex schema and backend functions
Review frontend: sandbox panel, file explorer, terminal, git panel
Review chat route changes and UX improvements
Post comprehensive review feedback

This is a large, well-structured PR adding Daytona sandbox integration. Overall the code is clean and thoughtfully organized. Here are my findings, grouped by severity.

🔴 Critical Issues

1. Missing ownership authorization on all sandbox REST endpoints

packages/fastapi/app/routes/sandbox.py — every endpoint (get_sandbox, delete_sandbox, start_sandbox, stop_sandbox, execute_code, run_command, all file/git operations) only verifies the user is authenticated, but never checks that the authenticated user owns the requested sandbox. Any logged-in user can operate on any sandbox by guessing its ID.

# sandbox.py:65 - only checks auth, not ownership
@router.get("/{sandbox_id}")
async def get_sandbox(
    sandbox_id: str,
    user: dict = Depends(get_current_user),  # ← never checked against sandbox owner
):

Fix: Query Convex to verify sandbox.userId == user["sub"] before delegating to the Daytona service, or use the daytonaSandboxId label (which contains harness_user_id) to validate in the Daytona layer. Fix this →

2. Shell injection risk in setup_git_credentials

packages/fastapi/app/services/daytona_service.py:516 — the OAuth token is interpolated directly into a shell command string. While GitHub tokens are typically alphanumeric, this pattern is fragile and could break or be exploited with unexpected token formats.

cred_line = f"https://{username}:{token}@github.com"
sandbox.process.exec(
    f'printf "%s\\n" "{cred_line}" > /home/daytona/.git-credentials'  # token unescaped
    ...
)

Fix: Write the credentials file via sandbox.fs.upload_file(cred_line.encode(), "/home/daytona/.git-credentials") instead of shell interpolation.

🟡 Medium Issues

3. resource_tier parameter is ignored

packages/fastapi/app/services/daytona_service.py:83-120 — resource_tier is accepted and logged but the values from RESOURCE_TIERS (cpu, memory, disk) are never passed to CreateSandboxFromSnapshotParams. All tiers create identical sandboxes.

params = CreateSandboxFromSnapshotParams(
    language=language,
    auto_stop_interval=15,
    labels=sandbox_labels,
    ephemeral=ephemeral,
    # ← resource_tier values (cpu, memory, disk) never applied
)

4. Traceback leaked to client in tool error responses

packages/fastapi/app/services/sandbox_tools.py:656-662 — full Python tracebacks are returned to the frontend in error responses. This exposes internal implementation details and file paths.

return json.dumps({
    "type": "error",
    "message": str(e),
    "traceback": traceback.format_exc(),  # ← remove from prod responses
})

5. Public GitHub repos blocked without credentials

packages/fastapi/app/services/sandbox_tools.py:543 — needs_auth = "github.com" in url and not creds blocks cloning of public GitHub repositories when no GitHub OAuth is configured. Users without GitHub OAuth can't clone even github.com/torvalds/linux.

6. Unsafe as any cast in Convex createInternal

packages/convex-backend/convex/sandboxes.ts:173,180 — harnessId is cast with as any to bypass the type system, which can cause runtime errors if the string isn't a valid Id<"harnesses">.

harnessId: harnessIdStr ? (harnessIdStr as any) : undefined,
// ...
const harness = await ctx.db.get(harnessIdStr as any);

7. In-memory _pending_oauth has no TTL cleanup

packages/fastapi/app/services/mcp_oauth.py:40 — _pending_oauth is a dict with no eviction. Long-running servers will accumulate stale PKCE states from abandoned flows. Each PendingOAuth has a created_at field but there's no background cleanup job.

8. JWT audience validation disabled in terminal WebSocket

packages/fastapi/app/routes/terminal.py:49 — options={"verify_aud": False} skips audience validation on the Clerk JWT. This means tokens intended for other services are accepted. Should validate aud matches the expected Clerk audience.

🟢 Minor / Style

9. import base64 inside function body

packages/fastapi/app/services/sandbox_tools.py:444 — inline import inside the read_file handler. Move to top of file.

10. _ensure_running called on every Daytona operation

Every filesystem, git, and command operation calls _ensure_running(sandbox_id) which always does a client.get(sandbox_id) round-trip. For the UI file explorer this means each keypress/navigation triggers a Daytona API call. Consider a short-lived TTL cache (e.g. 30s) for sandbox state.

11. No file size guard on read_file

packages/fastapi/app/routes/sandbox.py:195-208 — there's no max size check before loading a file's contents into memory. A large binary file in the sandbox could OOM the server. Consider adding a size check before service.read_file().

12. Stale comment in chat/index.tsx

apps/web/src/routes/chat/index.tsx:12 — inline comment // any downsides to mixing convex/react with tanstack/react-query? should be removed from production code.

13. Double blank line in sandbox.py

packages/fastapi/app/routes/sandbox.py:318-319 — extra blank line between search_files and git_status endpoints.

Positive Notes

The agentic loop sandbox tool injection pattern is clean — tools are only added to the LLM's context when sandbox_enabled && daytona_api_key are both set.
The WebSocket terminal proxy correctly handles bidirectional streaming with asyncio.gather and graceful cleanup.
Convex sandboxes table has proper indexes for all access patterns (by_user, by_harness, by_daytona_id).
Git credentials are set up via credential store rather than per-command auth, which means run_command git operations also authenticate correctly.
Lazy-loading xterm.js in the terminal component is a good bundle optimization.
The _ensure_running auto-start behavior is a nice UX touch for sleeping sandboxes.

- Introduced a new internal query `getOwnerByDaytonaId` to check sandbox ownership in Convex. - Added `_assert_sandbox_owner` function in FastAPI routes to enforce ownership checks for sandbox operations. - Updated multiple routes to include ownership verification before executing actions on sandboxes. - Improved DaytonaService to manage sandbox lifecycle with a caching mechanism for running sandboxes. - Cleaned up imports and adjusted comments for clarity. This commit enhances security by ensuring only authorized users can access their respective sandboxes.

DIodide · 2026-03-23T19:47:51Z

@claude review

claude · 2026-03-23T19:48:06Z

Claude finished @DIodide's task in 2m 34s —— View job

PR Review: Feat/filesystems sandboxes

Read CLAUDE.md for repo guidelines
Review backend: Daytona service, sandbox tools, FastAPI routes
Review Convex schema and backend functions
Review frontend: sandbox panel, file explorer, terminal, git panel
Review chat route changes and UX improvements
Post comprehensive review feedback

This PR has been significantly improved since the initial review — most of the previously flagged issues have been addressed. Here's the updated assessment.

✅ Previously Flagged Issues: Now Fixed

Issue	Status
Missing ownership auth on all sandbox REST endpoints	Fixed — `_assert_sandbox_owner` added and called everywhere
`resource_tier` parameter ignored	Fixed — `TIER_SNAPSHOTS` now maps tiers to different Daytona snapshot names
Public GitHub repos blocked without credentials	Fixed — unauthenticated clone attempted first, error only on failure
`_pending_oauth` has no TTL cleanup	Fixed — `_cleanup_pending()` called after each flow start (10 min eviction)
`import base64` inside function body	Fixed — moved to top of file
`_ensure_running` round-trip on every operation	Fixed — 30-second TTL cache implemented in `DaytonaService._running_cache`
`as any` cast in Convex `createInternal`	Fixed — `harnessId` now properly typed as `v.optional(v.id("harnesses"))`
Stale comment in `chat/index.tsx`	Fixed — removed

🔴 Still Open: Critical

1. Shell injection in setup_git_credentials (daytona_service.py:548)

The OAuth token is still interpolated directly into a shell command string:

cred_line = f"https://{username}:{token}@github.com"
sandbox.process.exec(
    f'printf "%s\\n" "{cred_line}" > /home/daytona/.git-credentials'
    # ↑ token is unquoted inside double-quotes — a token with `"` or `$()` breaks this
)

GitHub tokens are currently alphanumeric so this doesn't exploit in practice, but it's one format change away from being a real injection. Fix by using the filesystem API instead of shell: Fix this →

🟡 Still Open: Medium

2. JWT audience validation skipped in terminal WebSocket (terminal.py:50)

payload = jwt.decode(
    token, key, algorithms=["RS256"], options={"verify_aud": False}
)

Tokens issued for other services are accepted. The terminal connects a live PTY — this is worth fixing. The expected audience is typically the Clerk client ID or a configured azp. Fix this →

3. Traceback leaked in agentic tool errors (sandbox_tools.py:665)

return json.dumps({
    "type": "error",
    "message": str(e),
    "traceback": traceback.format_exc(),  # ← returned to the LLM context and potentially the frontend
})

This exposes internal file paths and implementation details. The traceback is useful for debugging but should be logged server-side only, not returned in the payload. Fix this →

4. verify_sandbox_owner fails open without Convex (convex.py:126-128)

if not settings.convex_url or not settings.convex_deploy_key:
    logger.warning("Convex not configured — skipping ownership check")
    return True  # ← any user can access any sandbox in this config

This is explicitly a dev convenience, but it means a misconfigured production deployment would have no ownership enforcement on any endpoint. Consider raising an exception or returning False (deny) instead, and documenting that Convex is required for production security.

5. downloadUrl leaks unauthenticated file access (sandbox-api.ts:167)

downloadUrl(sandboxId: string, path: string) {
    return `${API_URL}/api/sandbox/${sandboxId}/files/download?path=...`;
    // ↑ no auth token — used directly as <img src> or <a href>
}

The backend /download endpoint requires a Bearer token, but constructing a raw URL and passing it to <img src> or a link doesn't attach the auth header — the request will 401. If the image rendering in FileViewer or SandboxResult uses this URL directly in an <img> tag, images will fail silently for authenticated users.

6. git_pull always requires credentials, inconsistent with git_clone (sandbox_tools.py:611)

git_push requiring auth makes sense. But git_pull also blocks on missing credentials, even for public repos that don't need auth — unlike git_clone which tries unauthenticated first. This means the AI can clone a public repo but can't pull updates from it without GitHub OAuth configured.

🟢 Minor

7. asyncio.get_event_loop() deprecated (terminal.py:96)

loop = asyncio.get_event_loop()

In Python 3.10+ this emits a deprecation warning inside an async context. Use asyncio.get_running_loop() instead.

8. Private method access in route handler (sandbox.py:256)

sandbox = service._ensure_running(sandbox_id)  # ← directly accessing private method
data = sandbox.fs.download_file(path)

The download_file route bypasses the service's public read_file API to access the raw Daytona fs.download_file(). Consider adding a download_file_bytes(sandbox_id, path) -> bytes method to DaytonaService so the route layer stays clean.

9. git_log shell parsing is fragile (daytona_service.py:504-518)

The log format %H|%s|%an|%aI split on | breaks silently if a commit subject contains a pipe character. Use %x00 (NUL) as a delimiter instead, or use Daytona's git SDK if it exposes log natively.

10. No file size guard on read_file (sandbox.py:212-226)

No max-size check before loading a file into memory. A large binary in the sandbox could OOM the server. A simple check like rejecting files over 10MB before calling service.read_file() would prevent this.

Summary

The backend security posture is substantially improved from the initial version. The two remaining issues I'd prioritize before merge are the shell injection in credential setup (#1) and the traceback in error responses (#3). The JWT audience issue (#2) is worth fixing before this is publicly deployed.

DIodide · 2026-03-23T19:59:12Z

@claude review this

claude · 2026-03-23T19:59:24Z

Claude finished @DIodide's task in 2m 49s —— View job

PR Review: Feat/filesystems sandboxes

Read CLAUDE.md for repo guidelines
Check status of previously flagged critical issues
Review remaining open issues from prior reviews
Scan for any new issues
Post comprehensive review

This is the third review of this PR. The codebase is in good shape overall — the major ownership/auth rework from the first two reviews is solid. Here's the current state of all open issues.

✅ Resolved Since Last Review

downloadUrl in sandbox-api.ts:167 — the function is defined but never called anywhere. file-viewer.tsx creates a local blob URL instead of using this. Not an active security hole (the endpoint still requires Bearer auth), just dead code.

🔴 Critical (Still Open)

1. Shell injection in setup_git_credentials (daytona_service.py:548)

The OAuth token is still interpolated directly into a shell double-quoted string:

cred_line = f"https://{username}:{token}@github.com"
sandbox.process.exec(
    f'printf "%s\\n" "{cred_line}" > /home/daytona/.git-credentials'
)

A token containing ", $(...), or backticks breaks or exploits this. GitHub's current token format is alphanumeric + _ so this isn't exploitable today, but it's one format change away from being real. Fix by writing via the filesystem API instead:

cred_line = f"https://{username}:{token}@github.com\n"
sandbox.fs.upload_file(cred_line.encode(), "/home/daytona/.git-credentials")
sandbox.process.exec("chmod 600 /home/daytona/.git-credentials && git config --global credential.helper store")

Fix this →

🟡 Medium (Still Open)

2. Traceback leaked in agentic tool errors (sandbox_tools.py:665)

return json.dumps({
    "type": "error",
    "message": str(e),
    "traceback": traceback.format_exc(),  # ← returned to LLM context and frontend
})

The traceback is already logged server-side at line 661 via logger.error. Remove the "traceback" key from the returned dict — it exposes internal file paths and implementation details to the LLM and potentially the user's browser. Fix this →

3. JWT audience validation disabled in terminal WebSocket (terminal.py:50)

payload = jwt.decode(
    token, key, algorithms=["RS256"], options={"verify_aud": False}
)

This accepts tokens issued for any service as valid terminal auth. The terminal connects a live PTY — worth enforcing that tokens are actually issued for this application. At minimum, validate the iss claim against the expected Clerk domain (this is already fetched from the token). Fix this →

4. verify_sandbox_owner fails open without Convex (convex.py:126-128)

if not settings.convex_url or not settings.convex_deploy_key:
    logger.warning("Convex not configured — skipping ownership check")
    return True  # ← any user can access any sandbox

A misconfigured production deployment silently disables all ownership enforcement. Consider raising a RuntimeError here instead — Convex is a hard dependency for production security. The current behavior is a footgun. Fix this →

5. Cache-Control: public on authenticated download endpoint (sandbox.py:264) — New

return Response(
    content=data if isinstance(data, bytes) else data.encode(),
    media_type=content_type,
    headers={"Cache-Control": "public, max-age=300"},  # ← wrong for auth'd content
)

public tells intermediate proxies and CDNs that the response is safe to cache and share between users. Since this endpoint requires a Bearer token and returns per-user sandbox files, it should be private, max-age=300. Fix this →

6. git_pull blocks public repos (sandbox_tools.py:611)

git_pull hard-blocks on missing credentials even for public repos, while git_clone tries unauthenticated first. This means the AI can clone a public repo but can't pull updates from it without GitHub OAuth. Should mirror the git_clone pattern: try unauthenticated first, then return a meaningful error only if that fails for a GitHub URL. Fix this →

🟢 Minor (Still Open)

7. asyncio.get_event_loop() deprecated (terminal.py:96)

loop = asyncio.get_event_loop() emits a deprecation warning in Python 3.10+ inside async contexts. Use asyncio.get_running_loop() instead. Fix this →

8. Private method access in route handler (sandbox.py:256)

service._ensure_running(sandbox_id) is called directly in the download_file route, bypassing the service layer. Consider a download_file_bytes(sandbox_id, path) -> bytes method on DaytonaService. Fix this →

9. git_log pipe-delimiter is fragile (daytona_service.py:503-518)

Format %H|%s|%an|%aI split on | silently truncates commit subjects containing pipe characters. Use %x00 (NUL) as delimiter instead.

10. No file size guard on read_file (sandbox.py:212-226)

No max-size check before loading a file into memory. A large binary in the sandbox could OOM the server. A simple rejection of files over ~10MB before service.read_file() would be sufficient.

11. Dead code: downloadUrl (sandbox-api.ts:167)

The function is never called — all components use readFile + blob URL. Remove to avoid confusion. Fix this →

Summary

Before merge I'd prioritize: #1 (shell injection), #2 (traceback leak), and #5 (public cache header on auth'd endpoint) as these are straightforward one-line fixes with clear security implications. The JWT audience issue (#3) and fail-open Convex check (#4) are worth addressing before public rollout.

DIodide and others added 11 commits March 12, 2026 20:37

ui for sandbox results: frontend impl + logic (send sandbox config li…

e03c1ec

…ke the rest)

sandboxes convex impl

451389a

fastapi sandboxes (daytona) impl

cbb39cf

add old plans to gitignore

008c58c

auto start sleeping sandboxes + fix github auth

dd039e2

sandbox panel: file explorer, editor, terminal, git, and code review …

b206e26

…fixes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge remote-tracking branch 'origin/staging' into feat/filesystems-s…

4bdb7a6

…andboxes # Conflicts: # apps/web/src/routes/chat/index.tsx

DIodide requested review from abubahmed, cole-ramer, jon3350 and richardw19 March 23, 2026 17:06

DIodide self-assigned this Mar 23, 2026

bug fixes

d4d9765

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat/filesystems sandboxes#17

Feat/filesystems sandboxes#17
DIodide wants to merge 13 commits intostagingfrom
feat/filesystems-sandboxes

DIodide commented Mar 23, 2026

Uh oh!

DIodide commented Mar 23, 2026

Uh oh!

claude bot commented Mar 23, 2026 •

edited

Loading

Uh oh!

DIodide commented Mar 23, 2026

Uh oh!

claude bot commented Mar 23, 2026 •

edited

Loading

Uh oh!

DIodide commented Mar 23, 2026

Uh oh!

claude bot commented Mar 23, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DIodide commented Mar 23, 2026

Uh oh!

DIodide commented Mar 23, 2026

Uh oh!

claude bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Feat/filesystems sandboxes

🔴 Critical Issues

🟡 Medium Issues

🟢 Minor / Style

Positive Notes

Uh oh!

DIodide commented Mar 23, 2026

Uh oh!

claude bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Feat/filesystems sandboxes

✅ Previously Flagged Issues: Now Fixed

🔴 Still Open: Critical

🟡 Still Open: Medium

🟢 Minor

Summary

Uh oh!

DIodide commented Mar 23, 2026

Uh oh!

claude bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Feat/filesystems sandboxes

✅ Resolved Since Last Review

🔴 Critical (Still Open)

🟡 Medium (Still Open)

🟢 Minor (Still Open)

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Mar 23, 2026 •

edited

Loading

claude bot commented Mar 23, 2026 •

edited

Loading

claude bot commented Mar 23, 2026 •

edited

Loading