Skip to content

fix(shell): add timeout guards for pre-spawn pipeline and post-SIGKILL cleanup#28780

Open
sergey-tihon wants to merge 2 commits into
anomalyco:devfrom
sergey-tihon:feat/cross-spawn-shell-improvements
Open

fix(shell): add timeout guards for pre-spawn pipeline and post-SIGKILL cleanup#28780
sergey-tihon wants to merge 2 commits into
anomalyco:devfrom
sergey-tihon:feat/cross-spawn-shell-improvements

Conversation

@sergey-tihon
Copy link
Copy Markdown

Issue for this PR

Closes #28654
Related: #28697, #25306, #1955

//cc @jlongster @kommander — this addresses the same class of bash tool hangs you're working on in #28654/#25664. Would appreciate your review when you get a chance.

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Two timeout guards for two distinct hang scenarios in the bash tool:

1. shell.ts (pre-spawn pipeline): The parsecollectask block before subprocess spawn has no timeout. If it hangs, the tool blocks forever — no subprocess exists, so the 2-min defaultTimeout never kicks in. Fix: wrap with Effect.timeoutOrElse({ duration: "10 seconds", orElse: () => Effect.void }). On timeout, permission scanning is skipped and execution proceeds to run(). On real errors (parse failure, permission denied), they propagate normally.

We observed this in a CI environment (headless, opencode run) where the bash tool hung for 20+ minutes on heredoc commands. The failing commands were:

cat > /tmp/finalize_sections.sh << 'EOF'
#!/bin/bash
set -euo pipefail
# ... ~80 lines of script content ...
EOF
chmod +x /tmp/finalize_sections.sh && /tmp/finalize_sections.sh

and:

cat > /tmp/finalize_all.sh << 'EOF'
#!/bin/bash
set -euo pipefail
# ... ~60 lines of script content ...
EOF
chmod +x /tmp/finalize_all.sh && /tmp/finalize_all.sh

Both hung identically. The DB state for these tool calls shows:

{
  "state": {
    "status": "running",
    "input": { "command": "cat > /tmp/finalize_sections.sh << 'EOF'\n..." },
    "time": { "start": 1746621747000 }
  }
}

Note the missing metadata field. In the normal code path, ctx.metadata({ metadata: { output: "" } }) is called at shell.ts:472 — BEFORE spawner.spawn() at line 482. The absence of metadata in the DB proves execution never reached run(). The hang is somewhere in the pre-spawn pipeline: parse() (tree-sitter WASM), collect() (AST walk + path resolution), or ask() (permission scan).

On Linux, collect() only does path.resolve() (pure string ops) and fs.isDir() (stat). ask() is auto-approved in headless mode. That leaves parse() — the tree-sitter WASM .parse(command) call — as the most likely candidate, though we cannot definitively rule out an Effect runtime scheduling issue.

The same session successfully parsed ~50 other bash commands before these two hung, so it's not a cold-start WASM loading issue. Something about long heredoc input + extended session runtime triggers it.

2. cross-spawn-spawner.ts (post-SIGKILL await): After sending SIGKILL, Deferred.await(signal) waits for the close event. When child processes inherit stdout/stderr and outlive the killed parent (gradle daemons, nohup'd servers, Chrome subprocesses), the pipes stay open and close never fires. The tool hangs forever despite the target process being dead. Fix: Deferred.await(signal).pipe(Effect.timeout("5 seconds"), Effect.ignore). After SIGKILL, the kernel guarantees the process is dead — if close hasn't fired in 5s it's a pipe-inheritance issue, not a process-alive issue.

Why these work: the pre-spawn pipeline normally completes in <500ms (10s is 20x margin). After SIGKILL, close normally fires in <100ms (5s is 50x margin). Both are last-resort guards that only activate in degenerate cases.

How did you verify your code works?

  • Forensic analysis of the opencode.db from the hung CI run — confirmed 4 tool calls stuck in "running" state with no metadata field, proving the pre-spawn pipeline is the hang location
  • Read the code paths to confirm the timeouts wrap the correct Effect blocks and don't alter behavior in the normal case
  • Verified Effect.timeoutOrElse only handles timeout (real errors still propagate) — matches existing usage in cross-spawn-spawner.ts lines 337, 394, 435
  • Verified Effect.timeout + Effect.ignore after SIGKILL matches the pattern already used (line 403 already Effect.ignores the entire escalated kill)
  • Cannot write a deterministic unit test for either hang — the pre-spawn hang requires WASM state corruption or runtime scheduling failure under extended sessions; the close-event hang requires orphaned child processes holding pipes open. Both are environment/timing dependent.

DB forensic evidence:

SELECT json_extract(p.data, '$.state')
FROM part p
WHERE json_extract(p.data, '$.state.status') = 'running'
  AND json_extract(p.data, '$.type') = 'tool';
-- Returns 4 rows, all missing $.state.metadata

Screenshots / recordings

N/A — no UI change.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Copilot AI review requested due to automatic review settings May 22, 2026 06:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds defensive timeouts around potentially hanging Effects in shell scanning and child-process termination flows to prevent indefinite waits.

Changes:

  • Add a 10-second timeout fallback around a shell scanning/asking Effect.
  • Add a bounded (5-second) wait when escalating to SIGKILL so termination doesn’t hang indefinitely.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
packages/opencode/src/tool/shell.ts Wrapes an Effect chain with timeoutOrElse to avoid hanging during scan/ask.
packages/core/src/cross-spawn-spawner.ts Adds a timeout/ignore around waiting for the process “signal” Deferred after sending SIGKILL.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/opencode/src/tool/shell.ts Outdated
Comment thread packages/core/src/cross-spawn-spawner.ts
Comment thread packages/core/src/cross-spawn-spawner.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bash tool: timeout-killed processes not detected as completed; user must manually abort

2 participants