Skip to content

Feat/filesystems sandboxes#17

Open
DIodide wants to merge 13 commits intostagingfrom
feat/filesystems-sandboxes
Open

Feat/filesystems sandboxes#17
DIodide wants to merge 13 commits intostagingfrom
feat/filesystems-sandboxes

Conversation

@DIodide
Copy link
Copy Markdown
Owner

@DIodide DIodide commented Mar 23, 2026

  • Daytona sandbox integration: Full sandbox lifecycle management (create, start, stop,
    snapshot) with Convex persistence and FastAPI service layer

    • Sandbox UI panel: File explorer, code editor with syntax highlighting, integrated
      terminal, git panel, and command input — accessible from the chat interface
    • Sandbox code execution: AI tool calls that run code in sandboxes render inline
      results with file diffs, terminal output, and created files
    • Chat UX improvements: Edit-only-last-message restriction, fixed edit version arrow
      positioning, Enter-to-send in edit textarea, suppressed fade-in animations on
      conversation switches, and sticky-bottom auto-scroll with unpin-on-scroll-up
    • Backend fixes: Fixed editForkAndSend rejecting first-message edits, added userId
      backfill migration, auto-start sleeping sandboxes, GitHub OAuth fixes

    Test plan

    • Create a harness with sandbox enabled, verify sandbox starts and file explorer loads
    • Run code in sandbox via AI, verify inline results render (diffs, terminal output,
      files)
    • Test file explorer: create, rename, delete files/folders; open files in editor
    • Test terminal: run commands, verify output streams correctly
    • Test git panel: view status, stage/unstage, commit
    • Edit the last user message, verify fork creates correctly and version arrows work
    • Verify earlier user messages do NOT show the edit button
    • Switch between conversations, verify no fade-in animation flash
    • During streaming, scroll up — verify auto-scroll stops and scroll-to-bottom button
      appears

DIodide and others added 11 commits March 12, 2026 20:37
…fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolves conflicts in chat/index.tsx, harnesses/$harnessId.tsx,
onboarding.tsx, and config.py — combining sandbox/Daytona features
with staging's chat search, fork/edit, selectable MCPs, and URL validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Old messages created before the userId field was added to the messages
table are missing userId, which excludes them from the search index.
This migration copies userId from the parent conversation onto each
message, processing 500 per invocation to stay within Convex limits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace editFork + sendMessage with a single editForkAndSend mutation
  that creates the fork and inserts the edited message in one transaction,
  eliminating the brief flicker where the fork appeared without the message.
- Add isEditSaving ref guard to prevent double-submit when pressing
  Enter/clicking Save twice quickly, which was creating duplicate forks.
- Cap .collect() with .take(8192) in both fork and editForkAndSend to
  avoid hitting Convex's per-query document read limit.
- Add bounds validation on upToMessageCount.
- Add max-depth guard (100) on the ancestor walk to prevent infinite loops
  from data corruption.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix editForkAndSend rejecting first message edit (upToMessageCount=0)
- Restrict message editing to only the last user message
- Add Enter-to-send and Shift+Enter-for-newline in edit textarea
- Right-align edit version arrows so they don't shift with message width
- Suppress fade-in animations on conversation switches
- Implement sticky-bottom auto-scroll with wheel/touch unpin detection
- Add scroll-to-bottom button when unpinned

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…andboxes

# Conflicts:
#	apps/web/src/routes/chat/index.tsx
@DIodide DIodide self-assigned this Mar 23, 2026
@DIodide
Copy link
Copy Markdown
Owner Author

DIodide commented Mar 23, 2026

@claude

@claude
Copy link
Copy Markdown

claude bot commented Mar 23, 2026

Claude finished @DIodide's task in 2m 13s —— View job


PR Review: Feat/filesystems sandboxes

  • Read CLAUDE.md for repo guidelines
  • Review backend: Daytona service, sandbox tools, FastAPI routes
  • Review Convex schema and backend functions
  • Review frontend: sandbox panel, file explorer, terminal, git panel
  • Review chat route changes and UX improvements
  • Post comprehensive review feedback

This is a large, well-structured PR adding Daytona sandbox integration. Overall the code is clean and thoughtfully organized. Here are my findings, grouped by severity.


🔴 Critical Issues

1. Missing ownership authorization on all sandbox REST endpoints

packages/fastapi/app/routes/sandbox.py — every endpoint (get_sandbox, delete_sandbox, start_sandbox, stop_sandbox, execute_code, run_command, all file/git operations) only verifies the user is authenticated, but never checks that the authenticated user owns the requested sandbox. Any logged-in user can operate on any sandbox by guessing its ID.

# sandbox.py:65 - only checks auth, not ownership
@router.get("/{sandbox_id}")
async def get_sandbox(
    sandbox_id: str,
    user: dict = Depends(get_current_user),  # ← never checked against sandbox owner
):

Fix: Query Convex to verify sandbox.userId == user["sub"] before delegating to the Daytona service, or use the daytonaSandboxId label (which contains harness_user_id) to validate in the Daytona layer. Fix this →

2. Shell injection risk in setup_git_credentials

packages/fastapi/app/services/daytona_service.py:516 — the OAuth token is interpolated directly into a shell command string. While GitHub tokens are typically alphanumeric, this pattern is fragile and could break or be exploited with unexpected token formats.

cred_line = f"https://{username}:{token}@github.com"
sandbox.process.exec(
    f'printf "%s\\n" "{cred_line}" > /home/daytona/.git-credentials'  # token unescaped
    ...
)

Fix: Write the credentials file via sandbox.fs.upload_file(cred_line.encode(), "/home/daytona/.git-credentials") instead of shell interpolation.


🟡 Medium Issues

3. resource_tier parameter is ignored

packages/fastapi/app/services/daytona_service.py:83-120resource_tier is accepted and logged but the values from RESOURCE_TIERS (cpu, memory, disk) are never passed to CreateSandboxFromSnapshotParams. All tiers create identical sandboxes.

params = CreateSandboxFromSnapshotParams(
    language=language,
    auto_stop_interval=15,
    labels=sandbox_labels,
    ephemeral=ephemeral,
    # ← resource_tier values (cpu, memory, disk) never applied
)

4. Traceback leaked to client in tool error responses

packages/fastapi/app/services/sandbox_tools.py:656-662 — full Python tracebacks are returned to the frontend in error responses. This exposes internal implementation details and file paths.

return json.dumps({
    "type": "error",
    "message": str(e),
    "traceback": traceback.format_exc(),  # ← remove from prod responses
})

5. Public GitHub repos blocked without credentials

packages/fastapi/app/services/sandbox_tools.py:543needs_auth = "github.com" in url and not creds blocks cloning of public GitHub repositories when no GitHub OAuth is configured. Users without GitHub OAuth can't clone even github.com/torvalds/linux.

6. Unsafe as any cast in Convex createInternal

packages/convex-backend/convex/sandboxes.ts:173,180harnessId is cast with as any to bypass the type system, which can cause runtime errors if the string isn't a valid Id<"harnesses">.

harnessId: harnessIdStr ? (harnessIdStr as any) : undefined,
// ...
const harness = await ctx.db.get(harnessIdStr as any);

7. In-memory _pending_oauth has no TTL cleanup

packages/fastapi/app/services/mcp_oauth.py:40_pending_oauth is a dict with no eviction. Long-running servers will accumulate stale PKCE states from abandoned flows. Each PendingOAuth has a created_at field but there's no background cleanup job.

8. JWT audience validation disabled in terminal WebSocket

packages/fastapi/app/routes/terminal.py:49options={"verify_aud": False} skips audience validation on the Clerk JWT. This means tokens intended for other services are accepted. Should validate aud matches the expected Clerk audience.


🟢 Minor / Style

9. import base64 inside function body

packages/fastapi/app/services/sandbox_tools.py:444 — inline import inside the read_file handler. Move to top of file.

10. _ensure_running called on every Daytona operation

Every filesystem, git, and command operation calls _ensure_running(sandbox_id) which always does a client.get(sandbox_id) round-trip. For the UI file explorer this means each keypress/navigation triggers a Daytona API call. Consider a short-lived TTL cache (e.g. 30s) for sandbox state.

11. No file size guard on read_file

packages/fastapi/app/routes/sandbox.py:195-208 — there's no max size check before loading a file's contents into memory. A large binary file in the sandbox could OOM the server. Consider adding a size check before service.read_file().

12. Stale comment in chat/index.tsx

apps/web/src/routes/chat/index.tsx:12 — inline comment // any downsides to mixing convex/react with tanstack/react-query? should be removed from production code.

13. Double blank line in sandbox.py

packages/fastapi/app/routes/sandbox.py:318-319 — extra blank line between search_files and git_status endpoints.


Positive Notes

  • The agentic loop sandbox tool injection pattern is clean — tools are only added to the LLM's context when sandbox_enabled && daytona_api_key are both set.
  • The WebSocket terminal proxy correctly handles bidirectional streaming with asyncio.gather and graceful cleanup.
  • Convex sandboxes table has proper indexes for all access patterns (by_user, by_harness, by_daytona_id).
  • Git credentials are set up via credential store rather than per-command auth, which means run_command git operations also authenticate correctly.
  • Lazy-loading xterm.js in the terminal component is a good bundle optimization.
  • The _ensure_running auto-start behavior is a nice UX touch for sleeping sandboxes.

- Introduced a new internal query `getOwnerByDaytonaId` to check sandbox ownership in Convex.
- Added `_assert_sandbox_owner` function in FastAPI routes to enforce ownership checks for sandbox operations.
- Updated multiple routes to include ownership verification before executing actions on sandboxes.
- Improved DaytonaService to manage sandbox lifecycle with a caching mechanism for running sandboxes.
- Cleaned up imports and adjusted comments for clarity.

This commit enhances security by ensuring only authorized users can access their respective sandboxes.
@DIodide
Copy link
Copy Markdown
Owner Author

DIodide commented Mar 23, 2026

@claude review

@claude
Copy link
Copy Markdown

claude bot commented Mar 23, 2026

Claude finished @DIodide's task in 2m 34s —— View job


PR Review: Feat/filesystems sandboxes

  • Read CLAUDE.md for repo guidelines
  • Review backend: Daytona service, sandbox tools, FastAPI routes
  • Review Convex schema and backend functions
  • Review frontend: sandbox panel, file explorer, terminal, git panel
  • Review chat route changes and UX improvements
  • Post comprehensive review feedback

This PR has been significantly improved since the initial review — most of the previously flagged issues have been addressed. Here's the updated assessment.


✅ Previously Flagged Issues: Now Fixed

Issue Status
Missing ownership auth on all sandbox REST endpoints Fixed — _assert_sandbox_owner added and called everywhere
resource_tier parameter ignored Fixed — TIER_SNAPSHOTS now maps tiers to different Daytona snapshot names
Public GitHub repos blocked without credentials Fixed — unauthenticated clone attempted first, error only on failure
_pending_oauth has no TTL cleanup Fixed — _cleanup_pending() called after each flow start (10 min eviction)
import base64 inside function body Fixed — moved to top of file
_ensure_running round-trip on every operation Fixed — 30-second TTL cache implemented in DaytonaService._running_cache
as any cast in Convex createInternal Fixed — harnessId now properly typed as v.optional(v.id("harnesses"))
Stale comment in chat/index.tsx Fixed — removed

🔴 Still Open: Critical

1. Shell injection in setup_git_credentials (daytona_service.py:548)

The OAuth token is still interpolated directly into a shell command string:

cred_line = f"https://{username}:{token}@github.com"
sandbox.process.exec(
    f'printf "%s\\n" "{cred_line}" > /home/daytona/.git-credentials'
    # ↑ token is unquoted inside double-quotes — a token with `"` or `$()` breaks this
)

GitHub tokens are currently alphanumeric so this doesn't exploit in practice, but it's one format change away from being a real injection. Fix by using the filesystem API instead of shell: Fix this →


🟡 Still Open: Medium

2. JWT audience validation skipped in terminal WebSocket (terminal.py:50)

payload = jwt.decode(
    token, key, algorithms=["RS256"], options={"verify_aud": False}
)

Tokens issued for other services are accepted. The terminal connects a live PTY — this is worth fixing. The expected audience is typically the Clerk client ID or a configured azp. Fix this →

3. Traceback leaked in agentic tool errors (sandbox_tools.py:665)

return json.dumps({
    "type": "error",
    "message": str(e),
    "traceback": traceback.format_exc(),  # ← returned to the LLM context and potentially the frontend
})

This exposes internal file paths and implementation details. The traceback is useful for debugging but should be logged server-side only, not returned in the payload. Fix this →

4. verify_sandbox_owner fails open without Convex (convex.py:126-128)

if not settings.convex_url or not settings.convex_deploy_key:
    logger.warning("Convex not configured — skipping ownership check")
    return True  # ← any user can access any sandbox in this config

This is explicitly a dev convenience, but it means a misconfigured production deployment would have no ownership enforcement on any endpoint. Consider raising an exception or returning False (deny) instead, and documenting that Convex is required for production security.

5. downloadUrl leaks unauthenticated file access (sandbox-api.ts:167)

downloadUrl(sandboxId: string, path: string) {
    return `${API_URL}/api/sandbox/${sandboxId}/files/download?path=...`;
    // ↑ no auth token — used directly as <img src> or <a href>
}

The backend /download endpoint requires a Bearer token, but constructing a raw URL and passing it to <img src> or a link doesn't attach the auth header — the request will 401. If the image rendering in FileViewer or SandboxResult uses this URL directly in an <img> tag, images will fail silently for authenticated users.

6. git_pull always requires credentials, inconsistent with git_clone (sandbox_tools.py:611)

git_push requiring auth makes sense. But git_pull also blocks on missing credentials, even for public repos that don't need auth — unlike git_clone which tries unauthenticated first. This means the AI can clone a public repo but can't pull updates from it without GitHub OAuth configured.


🟢 Minor

7. asyncio.get_event_loop() deprecated (terminal.py:96)

loop = asyncio.get_event_loop()

In Python 3.10+ this emits a deprecation warning inside an async context. Use asyncio.get_running_loop() instead.

8. Private method access in route handler (sandbox.py:256)

sandbox = service._ensure_running(sandbox_id)  # ← directly accessing private method
data = sandbox.fs.download_file(path)

The download_file route bypasses the service's public read_file API to access the raw Daytona fs.download_file(). Consider adding a download_file_bytes(sandbox_id, path) -> bytes method to DaytonaService so the route layer stays clean.

9. git_log shell parsing is fragile (daytona_service.py:504-518)

The log format %H|%s|%an|%aI split on | breaks silently if a commit subject contains a pipe character. Use %x00 (NUL) as a delimiter instead, or use Daytona's git SDK if it exposes log natively.

10. No file size guard on read_file (sandbox.py:212-226)

No max-size check before loading a file into memory. A large binary in the sandbox could OOM the server. A simple check like rejecting files over 10MB before calling service.read_file() would prevent this.


Summary

The backend security posture is substantially improved from the initial version. The two remaining issues I'd prioritize before merge are the shell injection in credential setup (#1) and the traceback in error responses (#3). The JWT audience issue (#2) is worth fixing before this is publicly deployed.

@DIodide
Copy link
Copy Markdown
Owner Author

DIodide commented Mar 23, 2026

@claude review this

@claude
Copy link
Copy Markdown

claude bot commented Mar 23, 2026

Claude finished @DIodide's task in 2m 49s —— View job


PR Review: Feat/filesystems sandboxes

  • Read CLAUDE.md for repo guidelines
  • Check status of previously flagged critical issues
  • Review remaining open issues from prior reviews
  • Scan for any new issues
  • Post comprehensive review

This is the third review of this PR. The codebase is in good shape overall — the major ownership/auth rework from the first two reviews is solid. Here's the current state of all open issues.


✅ Resolved Since Last Review

  • downloadUrl in sandbox-api.ts:167 — the function is defined but never called anywhere. file-viewer.tsx creates a local blob URL instead of using this. Not an active security hole (the endpoint still requires Bearer auth), just dead code.

🔴 Critical (Still Open)

1. Shell injection in setup_git_credentials (daytona_service.py:548)

The OAuth token is still interpolated directly into a shell double-quoted string:

cred_line = f"https://{username}:{token}@github.com"
sandbox.process.exec(
    f'printf "%s\\n" "{cred_line}" > /home/daytona/.git-credentials'
)

A token containing ", $(...), or backticks breaks or exploits this. GitHub's current token format is alphanumeric + _ so this isn't exploitable today, but it's one format change away from being real. Fix by writing via the filesystem API instead:

cred_line = f"https://{username}:{token}@github.com\n"
sandbox.fs.upload_file(cred_line.encode(), "/home/daytona/.git-credentials")
sandbox.process.exec("chmod 600 /home/daytona/.git-credentials && git config --global credential.helper store")

Fix this →


🟡 Medium (Still Open)

2. Traceback leaked in agentic tool errors (sandbox_tools.py:665)

return json.dumps({
    "type": "error",
    "message": str(e),
    "traceback": traceback.format_exc(),  # ← returned to LLM context and frontend
})

The traceback is already logged server-side at line 661 via logger.error. Remove the "traceback" key from the returned dict — it exposes internal file paths and implementation details to the LLM and potentially the user's browser. Fix this →

3. JWT audience validation disabled in terminal WebSocket (terminal.py:50)

payload = jwt.decode(
    token, key, algorithms=["RS256"], options={"verify_aud": False}
)

This accepts tokens issued for any service as valid terminal auth. The terminal connects a live PTY — worth enforcing that tokens are actually issued for this application. At minimum, validate the iss claim against the expected Clerk domain (this is already fetched from the token). Fix this →

4. verify_sandbox_owner fails open without Convex (convex.py:126-128)

if not settings.convex_url or not settings.convex_deploy_key:
    logger.warning("Convex not configured — skipping ownership check")
    return True  # ← any user can access any sandbox

A misconfigured production deployment silently disables all ownership enforcement. Consider raising a RuntimeError here instead — Convex is a hard dependency for production security. The current behavior is a footgun. Fix this →

5. Cache-Control: public on authenticated download endpoint (sandbox.py:264) — New

return Response(
    content=data if isinstance(data, bytes) else data.encode(),
    media_type=content_type,
    headers={"Cache-Control": "public, max-age=300"},  # ← wrong for auth'd content
)

public tells intermediate proxies and CDNs that the response is safe to cache and share between users. Since this endpoint requires a Bearer token and returns per-user sandbox files, it should be private, max-age=300. Fix this →

6. git_pull blocks public repos (sandbox_tools.py:611)

git_pull hard-blocks on missing credentials even for public repos, while git_clone tries unauthenticated first. This means the AI can clone a public repo but can't pull updates from it without GitHub OAuth. Should mirror the git_clone pattern: try unauthenticated first, then return a meaningful error only if that fails for a GitHub URL. Fix this →


🟢 Minor (Still Open)

7. asyncio.get_event_loop() deprecated (terminal.py:96)

loop = asyncio.get_event_loop() emits a deprecation warning in Python 3.10+ inside async contexts. Use asyncio.get_running_loop() instead. Fix this →

8. Private method access in route handler (sandbox.py:256)

service._ensure_running(sandbox_id) is called directly in the download_file route, bypassing the service layer. Consider a download_file_bytes(sandbox_id, path) -> bytes method on DaytonaService. Fix this →

9. git_log pipe-delimiter is fragile (daytona_service.py:503-518)

Format %H|%s|%an|%aI split on | silently truncates commit subjects containing pipe characters. Use %x00 (NUL) as delimiter instead.

10. No file size guard on read_file (sandbox.py:212-226)

No max-size check before loading a file into memory. A large binary in the sandbox could OOM the server. A simple rejection of files over ~10MB before service.read_file() would be sufficient.

11. Dead code: downloadUrl (sandbox-api.ts:167)

The function is never called — all components use readFile + blob URL. Remove to avoid confusion. Fix this →


Summary

Before merge I'd prioritize: #1 (shell injection), #2 (traceback leak), and #5 (public cache header on auth'd endpoint) as these are straightforward one-line fixes with clear security implications. The JWT audience issue (#3) and fail-open Convex check (#4) are worth addressing before public rollout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant