Orphaned codex process at 100% CPU after Claude Code session ended (PPID=1, main thread spinning in read()) #193

@qiuruiyu

Summary

Found a stuck codex Rust binary at 98.8% CPU, orphaned (PPID=1), running for 5+ hours after the originating Claude Code session had already exited. This is in the same family as #108 / #163 / #164 (orphaned-process / no-cleanup-on-session-exit), but the leaked process here is the codex CLI binary itself, not app-server-broker.mjs, and crucially it is busy-looping at 100% CPU rather than sitting idle.

Observed process

$ ps aux | grep codex
joseph  94778  98.8  0.0 ... R   3:57PM  38:36.79 .../codex-darwin-arm64/.../codex -s read-only -a untrusted
joseph  94767   0.0  0.1 ... S   3:57PM   0:00.03 node /opt/homebrew/bin/codex -s read-only -a untrusted

$ ps -o pid,ppid,command -p 94767
  PID  PPID COMMAND
94767     1 node /opt/homebrew/bin/codex -s read-only -a untrusted

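PPID=1 (launchd) on the Node wrapper (PID 94767) confirms that its original parent died and the wrapper was reparented to launchd. The Rust binary (PID 94778) is presumably the wrapper's child; its PPID is not captured above. Both started at 3:57 PM and were still alive at 9:30 PM, so ~5.5 h of wall-clock runtime, with ~39 minutes of accumulated CPU time on the Rust binary. That is only about 12% of the wall-clock time, so the spin did not run for the whole lifetime; it more likely began at some later point, such as when the parent went away.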

Why it burns CPU (not just idle)

Output from the macOS sample tool shows the main thread spending nearly all of its samples in the read() syscall:

Call graph:
    1417 Thread_xxx   DispatchQueue_1: com.apple.main-thread  (serial)
    + 1417 start  (in dyld) + 6076
    +   1417 ??? (in codex)
    +     ...
    +                               1415 ???  (in codex)
    +                               ! 1415 read  (in libsystem_kernel.dylib) + 8
    1417 Thread_xxx: tokio-runtime-worker
    +   ...
    +                   1417 _pthread_cond_wait  (in libsystem_pthread.dylib)
    +                     1417 __psynch_cvwait  (in libsystem_kernel.dylib)
    (multiple tokio worker threads, all blocked in __psynch_cvwait)

1415 of 1417 samples (~99.9%) on the main thread are inside read(). All tokio worker threads are parked in pthread_cond_wait, as expected for an idle runtime. Combined with PPID=1, the most likely interpretation is:

When the originating parent process died, the write end of the stdin pipe was closed, so every read() on stdin now returns 0 (EOF) immediately. The main loop apparently does not treat EOF as a terminal condition and instead retries the read in a tight loop, which pins a single thread at 100% CPU.

This is consistent with the bug class in #191 (fs.readFileSync(0, "utf8") blocking on Windows when stdin handling is wrong): the same root-cause shape, in a different language and a different process. In the Rust binary's case the symptom is a busy loop rather than an indefinite block.
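To make the suspected failure mode concrete, here is a minimal Rust sketch of a stdin-style read loop that treats EOF as terminal. It is not taken from the codex source, which I have not inspected; the function name pump and the returned call count are hypothetical and exist only for illustration.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical helper (not from the codex source): drain `input` until EOF,
/// passing each chunk to `handle`. Returns the number of read() calls made,
/// so a runaway retry loop would show up as an ever-growing count.
fn pump<R: Read>(mut input: R, mut handle: impl FnMut(&[u8])) -> io::Result<usize> {
    let mut buf = [0u8; 4096];
    let mut calls = 0;
    loop {
        calls += 1;
        match input.read(&mut buf) {
            // Ok(0) with a non-empty buffer means EOF. Retrying at this point
            // is the suspected bug: once the writer is gone, every later
            // read() also returns 0 immediately, so the loop never blocks.
            Ok(0) => return Ok(calls),
            Ok(n) => handle(&buf[..n]),
            // An interrupted read (EINTR) is safe to retry.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            // Any other error is treated as terminal.
            Err(e) => return Err(e),
        }
    }
}
```

In a real binary the call would look something like pump(io::stdin().lock(), ...) followed by a clean exit once it returns. The buggy shape I suspect is the same loop with the Ok(0) arm falling through to the next iteration instead of returning.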

Relation to existing issues

I could not locate the exact plugin code that spawns codex with the literal CLI flags -s read-only -a untrusted. The plugin's lib/app-server.mjs spawns codex with ["app-server"] and passes sandbox / approvalPolicy over the JSON-RPC thread/start params, not as CLI flags. So I'm not 100% certain this orphan was spawned by codex-plugin-cc — it may have been spawned via a different path (e.g., a codex CLI subprocess inside the Rust binary itself, or another tool entirely). I'm filing here because:

  1. The plugin is the only thing on this machine that drives Codex automated workflows.
  2. The orphan / no-cleanup pattern matches an active issue cluster on this repo.
  3. If this is in fact a codex-cli core bug, the maintainers here are best positioned to redirect.

Happy to provide the full sample output, full ps tree, or further reproduction info if it would help.

Environment

  • macOS 15.5 (24F74), Apple Silicon
  • Node.js v24.6.0
  • codex-cli 0.117.0
  • codex-plugin-cc v1.0.2 (commit 8e403f9d)
  • Claude Code (latest, with codex plugin enabled via ~/.claude/settings.json enabledPlugins)

Suggested directions

  1. Defense in depth: idle / EOF timeout in codex CLI's stdin reader. If read() on stdin returns EOF, the binary should exit cleanly rather than retry. (This mirrors the idle-timeout suggestion in #108.)
  2. Plugin-side parent-death detection. When the plugin spawns codex (in any mode), it could pass its own PID in an env var so that the child can self-terminate if that parent dies. On Linux the child could arm prctl(PR_SET_PDEATHSIG) itself; on macOS, where prctl is not available, it would need a periodic parent-PID check.
  3. Audit for any CLI-mode codex invocation in the plugin with -s read-only -a untrusted. If none exists, this report should be rerouted to openai/codex (the CLI repo).
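The periodic parent-PID check from suggestion 2 could look roughly like the sketch below. The names parent_changed and spawn_parent_watchdog are hypothetical, and the sketch assumes a Unix target; std::os::unix::process::parent_id is the only platform call used.

```rust
use std::os::unix::process::parent_id;
use std::thread;
use std::time::Duration;

/// True once the current parent PID differs from the one seen at startup.
/// Comparing against the startup value, rather than hard-coding 1, also
/// covers systems where orphans are adopted by a subreaper instead of PID 1.
fn parent_changed(ppid_at_start: u32, ppid_now: u32) -> bool {
    ppid_now != ppid_at_start
}

/// Hypothetical helper: poll the parent PID on a background thread and run
/// `on_orphaned` once, as soon as the process has been reparented.
fn spawn_parent_watchdog(
    interval: Duration,
    on_orphaned: impl FnOnce() + Send + 'static,
) -> thread::JoinHandle<()> {
    // Captured once, at startup. If the parent is already gone by now, this
    // check never fires; passing the expected PID in an env var avoids that.
    let ppid_at_start = parent_id();
    thread::spawn(move || {
        while !parent_changed(ppid_at_start, parent_id()) {
            thread::sleep(interval);
        }
        on_orphaned();
    })
}
```

A child process might call spawn_parent_watchdog(Duration::from_secs(2), || std::process::exit(0)) early in main. Polling costs one cheap syscall per interval and works on both macOS and Linux, at the price of a detection delay of up to one interval.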

Reproducer: I do not have a deterministic reproducer. The orphan was found in Activity Monitor after a normal day of using Claude Code with the plugin enabled. Killing the parent Claude Code process (or letting it crash) is probably the trigger; I will try to reproduce explicitly and follow up.
