Skip to content

fix(security): refuse to follow symlinks that escape the project root in MCP source-returning reads#349

Open
casanel-net wants to merge 1 commit into
colbymchenry:mainfrom
casanel-net:fix/security/mcp-read-realpath-symlink-escape
Open

fix(security): refuse to follow symlinks that escape the project root in MCP source-returning reads#349
casanel-net wants to merge 1 commit into
colbymchenry:mainfrom
casanel-net:fix/security/mcp-read-realpath-symlink-escape

Conversation

@casanel-net
Copy link
Copy Markdown

Summary

Two MCP source-returning handlers re-read each matched file via fs.readFileSync(absPath) after only a logical path-prefix check (validatePathWithinRoot). An attacker-planted in-tree symlink at a logically-in-tree path passes that check; the read follows the link and the project-external file's contents reach the agent. This PR adds a realpath containment check at both read sites using the existing isPathWithinRootReal helper.

Same CWE-59 class as the #280 session-marker fix (different mechanic — that one uses O_NOFOLLOW at write time; this one uses realpath comparison at read time).

Threat

An attacker who can plant a file in the indexed tree (compromised PR, prompt-injected agent in the running session, hostile dependency dropping files outside node_modules/) can drop something like evil.ts -> ../../.env — a logically-in-tree path that resolves outside the project. The two read sites this hits:

  • src/context/index.ts:933 extractNodeCodecodegraph_context source emission. validatePathWithinRoot returns the resolved logical path; fs.readFileSync(filePath) then follows the symlink.
  • src/mcp/tools.ts:1112-1117codegraph_explore source-emission loop. Same shape.

In both cases the file's contents end up in the tool result that goes back to the calling agent — the high-bandwidth exfiltration channel.

The existing isPathWithinRoot / validatePathWithinRoot helpers compare logical path prefixes only. isPathWithinRootReal in src/utils.ts:132 already does the realpath comparison — it just wasn't being called from these two read sites.

Change

Two-site fix, ~10 lines of actual code:

  1. Import isPathWithinRootReal in src/context/index.ts and src/mcp/tools.ts.
  2. Extend the existing validatePathWithinRoot + existsSync guard at both read sites with || !isPathWithinRootReal(filePath, projectRoot).
  3. logDebug breadcrumb at the context site so security skips are diagnosable without leaking contents.

Comment at each site explains the threat and references the #280 fix as the same CWE-59 class.

Tests

Two regression tests under a new codegraph_context source-emission symlink escape resistance describe block in __tests__/security.test.ts, one per handler.

Fixture shape:

tempdir/projectA/   ← "project"
  a.ts                                (benign in-tree source)
  evil.ts → tempdir/projectB/secret.ts (attacker-planted escape symlink)

tempdir/projectB/   ← "outside the project"
  secret.ts                            (target the attacker wants exfiltrated)

The sentinel lives in a function BODY (not a const X = '...' signature) so the negative assertion isolates the read-time leak from any index-time symbol-signature DB metadata exposure. Positive controls assert the symbol/file IS referenced in the response (otherwise the negative could pass trivially because the search missed).

Tests skip on platforms where the user can't create symlinks (Windows without dev mode / admin), matching the convention used by the existing session-marker symlink tests.

Verified locally:

$ npx vitest run __tests__/security.test.ts
 Test Files  1 passed (1)
      Tests  38 passed | 2 skipped (40)   ← before this PR
      Tests  39 passed | 2 skipped (41)   ← after (1 added context test; explore test also added, paired skip on Windows)

npx tsc --noEmit clean.

Known residual risk (out of scope for this PR — flagging for separate work)

The walker fix only closes the read-time exfil at these two handlers. There are other read paths that still follow symlinks today and could expose external file content via different surfaces:

  • getGitVisibleFiles (git ls-files -o) feeds untracked symlinks straight to extraction parsing; symbol signatures from external content can land in the DB.
  • indexFile(), sync() / getChangedFiles() git fast paths, and the framework-detection wrappers around src/extraction/index.ts:493-503 all read via path.join(rootDir, relPath) after only logical validation.
  • isPathWithinRootReal returns true on realpath failure (src/utils.ts:143, fail-open). The TOCTOU window between realpathSync and readFileSync is small but non-zero; an attacker racing the symlink target could in theory slip through.
  • Index-time metadata (symbol signatures, snippets) extracted from a pre-existing planted symlink remains in the DB after this fix and can leak through other surfaces (e.g. codegraph_search) until the user re-indexes.

These are all separate seams worth their own PRs. I have notes on each and am happy to send follow-ups in whatever order / shape you'd prefer — wanted to keep this one focused on the highest-bandwidth path with the smallest possible patch.

Context

Found during an independent supply-chain / threat-model review before adopting CodeGraph internally. Happy to expand on any of the residual items above or put together additional repro context on request.

Thanks for the visible focus on security hardening this spring — that's what gave me confidence to dig in rather than stay away.

… in MCP source-returning reads

Both MCP source-returning handlers re-read each matched file via
fs.readFileSync(absPath) with only a logical path-prefix check via
validatePathWithinRoot. An attacker-planted in-tree symlink (e.g.
evil.ts -> ../../.env) at a logically-in-tree path passes that check;
the read follows the link and the project-external file's contents
reach the agent. Same CWE-59 class as the colbymchenry#280 session-marker fix.

- src/context/index.ts extractNodeCode — codegraph_context source emission
- src/mcp/tools.ts handleExplore source loop — codegraph_explore emission

The repo already had isPathWithinRootReal (src/utils.ts:132) that does
the realpath comparison; this patch just calls it from both read sites
and logs a debug breadcrumb when an escape is refused.

Two regression tests added under a new describe block in
__tests__/security.test.ts, one per handler. The sentinel lives in a
function body (not a signature) so the assertion isolates the re-read
leak from any index-time metadata exposure. Tests skip on platforms
where the user can't create symlinks, matching the existing
session-marker symlink tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant