fix(rehydrate): wire agent MCP servers into restarted/recovered sessions #295
ishaan-berri wants to merge 3 commits
Move the LiteLLM MCP-server resolution helper out of the session route into src/server/agent-mcp.ts so both the create path and the restart/recovery path can share it (and not drift). Adds agentMcpServerIds() to pull the agent's attached server IDs consistently.
rehydrateSession (the restart + message auto-recovery path) called harnessCreateSession without mcp_servers, so a restarted session silently lost its attached external MCPs (e.g. linear) — only the create path wired them. Resolve + forward mcp_servers (+ agent_id, platform_session_id) on both the K8s and brain-inline rehydrate paths, mirroring finishBringUp.
Drop the route-local copy of resolveAgentMcpServers and the duplicated mcp_servers-id extraction; import from src/server/agent-mcp instead.
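For readers without the diff open, here is a minimal sketch of what a shared module like src/server/agent-mcp.ts could look like. Only the two function names and the `{ specs, warning }` result shape come from this PR. The `AgentLike` and `McpServerSpec` types, the assumption that attached servers are stored as strings or `{ id }` objects, and the injected `fetchSpec` lookup (a stand-in for the LiteLLM call) are all hypothetical.

```typescript
// Hypothetical shapes: the real agent record and spec types are not shown in this PR.
type McpServerRef = string | { id: string };
interface AgentLike {
  mcp_servers?: McpServerRef[] | null;
}
interface McpServerSpec {
  id: string;
  url: string;
}
interface ResolveResult {
  specs: McpServerSpec[];
  warning?: string;
}
// Stand-in for the LiteLLM lookup, injected so the sketch runs without a network.
type FetchSpec = (id: string) => Promise<McpServerSpec>;

// Normalise the agent's attached servers to a de-duplicated list of IDs.
function agentMcpServerIds(agent: AgentLike): string[] {
  const ids = (agent.mcp_servers ?? []).map((ref) =>
    typeof ref === "string" ? ref : ref.id,
  );
  return Array.from(new Set(ids.filter((id) => id.length > 0)));
}

// Resolve IDs to specs. Lookup failures are reported through `warning`
// instead of throwing, so an outage degrades the session rather than blocking it.
async function resolveAgentMcpServers(
  ids: string[],
  fetchSpec: FetchSpec,
): Promise<ResolveResult> {
  const specs: McpServerSpec[] = [];
  const failed: string[] = [];
  for (const id of ids) {
    try {
      specs.push(await fetchSpec(id));
    } catch {
      failed.push(id);
    }
  }
  return failed.length > 0
    ? { specs, warning: `could not resolve MCP servers: ${failed.join(", ")}` }
    : { specs };
}
```

Keeping ID extraction and resolution in one module means the create path and the rehydrate path call the same two functions, which is the "no drift" goal the commit message describes.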
Greptile Summary
This PR fixes a gap where restarted or auto-recovered sessions lost their agent-attached MCP servers (e.g.
Confidence Score: 4/5
Safe to merge: the fix is correct and the refactor is a clean extraction with no logic changes.
Both rehydrate paths now correctly forward MCP servers. The only gap is that the warning string from resolveAgentMcpServers is silently dropped on rehydrate, so a LiteLLM outage at restart time leaves no log trail. This is a minor observability blind spot compared to the create path, not a correctness issue.
src/server/rehydrate.ts: both resolveAgentMcpServers call sites discard the warning return value.
|
| Filename | Overview |
|---|---|
| src/server/agent-mcp.ts | New shared module extracting resolveAgentMcpServers and agentMcpServerIds — clean extraction of the pre-existing implementation with no logic changes |
| src/server/rehydrate.ts | Both rehydrate paths (brain-inline and K8s) now forward mcp_servers, agent_id, and platform_session_id; the warning from resolveAgentMcpServers is silently discarded on both paths unlike in finishBringUp |
| src/app/api/v1/managed_agents/agents/[agent_id]/session/route.ts | Removes the local resolveAgentMcpServers duplicate and replaces raw ID extraction with agentMcpServerIds — straightforward call-site cleanup with no behaviour change |
Reviews (1): Last reviewed commit: "refactor(sessions): use shared resolveAg..."
    const { specs: inlineMcpServers } = await resolveAgentMcpServers(
      agentMcpServerIds(agent),
    );
The warning returned by resolveAgentMcpServers is silently discarded on both rehydrate paths (brain-inline and K8s), unlike finishBringUp, which logs it. When LiteLLM is unreachable during a restart or auto-recovery, the session will silently come up with no MCPs and there will be no trace in the logs, making this failure mode invisible in production.
Suggested change, before:
    const { specs: inlineMcpServers } = await resolveAgentMcpServers(
      agentMcpServerIds(agent),
    );
Suggested change, after:
    const { specs: inlineMcpServers, warning: inlineMcpWarning } = await resolveAgentMcpServers(
      agentMcpServerIds(agent),
    );
    if (inlineMcpWarning) console.warn(`rehydrateSession (inline) session_id=${session_id}: ${inlineMcpWarning}`);
    const { specs: mcpServers } = await resolveAgentMcpServers(
      agentMcpServerIds(agent),
    );
Same missing warning log on the K8s/local-dev rehydrate path. If LiteLLM is down at restart time, MCPs drop silently with no log entry.
Suggested change, before:
    const { specs: mcpServers } = await resolveAgentMcpServers(
      agentMcpServerIds(agent),
    );
Suggested change, after:
    const { specs: mcpServers, warning: mcpWarning } = await resolveAgentMcpServers(
      agentMcpServerIds(agent),
    );
    if (mcpWarning) console.warn(`rehydrateSession session_id=${session_id}: ${mcpWarning}`);
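Since the same warning check is suggested at two call sites, one option is to fold it into a small wrapper so neither rehydrate path can forget it. The sketch below is not part of the PR: `resolveForRehydrate`, the `Resolver` and `Warn` types, and the `pathLabel` values are hypothetical names, and the resolver and logger are injected so the snippet runs on its own. The log line format mirrors the suggested changes above.

```typescript
// Hypothetical types; only the { specs, warning } result shape comes from the review thread.
interface McpServerSpec {
  id: string;
}
interface ResolveResult {
  specs: McpServerSpec[];
  warning?: string;
}
type Resolver = (ids: string[]) => Promise<ResolveResult>;
type Warn = (message: string) => void;

// Resolve MCP servers for a rehydrate path and surface any warning before returning.
// `pathLabel` distinguishes the brain-inline path from the K8s path in the log line.
async function resolveForRehydrate(
  ids: string[],
  sessionId: string,
  pathLabel: "inline" | "k8s",
  resolve: Resolver,
  warn: Warn = (message) => console.warn(message),
): Promise<McpServerSpec[]> {
  const { specs, warning } = await resolve(ids);
  if (warning) {
    warn(`rehydrateSession (${pathLabel}) session_id=${sessionId}: ${warning}`);
  }
  return specs;
}
```

Whether a wrapper is worth it for two call sites is a judgment call. The inline suggestions are the smaller change, while the wrapper keeps the log format in one place.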
What
When a session is restarted (or auto-recovered after its sandbox dies), rehydrateSession brought the harness back up without the agent's external MCP servers, so a restarted session silently lost access to e.g. linear. Only the create path (finishBringUp) wired them (fixed earlier in #290).
Why
rehydrate.ts called harnessCreateSession({ sandbox_url, title, files }) with no mcp_servers. Both the K8s path and the brain-inline path had the gap.
Fix
Move resolveAgentMcpServers into src/server/agent-mcp.ts so the create path and the rehydrate path share one implementation (no drift).
Resolve and forward mcp_servers (+ agent_id, platform_session_id) on both rehydrate paths, mirroring finishBringUp.
Effect
A restarted/recovered session now keeps its attached MCPs. Concretely: restarting a session on an agent configured with linear will list linear again, instead of dropping back to only the built-in lap-* servers.
Test
tsc --noEmit clean. Companion to #290 (create path); this closes the same gap on the restart path.
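To make the before/after concrete, here is a rough sketch of the payload a rehydrate path would hand to harnessCreateSession after this change. The field names sandbox_url, title, files, mcp_servers, agent_id, and platform_session_id come from the description above. The `SessionRecord` shape, the `buildRehydrateArgs` helper, and the mapping of platform_session_id to the session's own ID are assumptions made for illustration only.

```typescript
// Hypothetical shapes: field names follow the PR text, everything else is assumed.
interface McpServerSpec {
  id: string;
}
interface SessionRecord {
  id: string;
  agent_id: string;
  title: string;
  files: string[];
}
interface HarnessCreateArgs {
  sandbox_url: string;
  title: string;
  files: string[];
  mcp_servers: McpServerSpec[];
  agent_id: string;
  platform_session_id: string;
}

// Build the harness payload for a restarted session.
// Before the fix only sandbox_url, title and files were sent.
function buildRehydrateArgs(
  session: SessionRecord,
  sandboxUrl: string,
  mcpServers: McpServerSpec[],
): HarnessCreateArgs {
  return {
    sandbox_url: sandboxUrl,
    title: session.title,
    files: session.files,
    mcp_servers: mcpServers,
    agent_id: session.agent_id,
    // Assumption: the platform session ID is the stored session's own ID.
    platform_session_id: session.id,
  };
}
```

A unit test along these lines, asserting that a stubbed harness receives the linear spec on restart, would complement the tsc --noEmit check by covering behaviour as well as types.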