
bug: openab as PID 1 leaves zombie processes from agent subprocesses #290

@thekkagent

Description

When openab runs as PID 1 in a container (the default for the published openab-codex image), zombie processes from agent grandchildren are never reaped, because openab does not implement the SIGCHLD-driven child reaping expected of an init process.

Observed in a K3s pod running ghcr.io/openabdev/openab-codex:0.7.2 with codex-acp as the agent. After ~85 minutes of normal Discord usage, 44 zombie processes had accumulated — all with PPid = 1 (i.e. openab itself).

$ cat /proc/1/comm
openab

$ for p in /proc/[0-9]*; do
    s=$(awk '/^State:/{print $2}' $p/status 2>/dev/null)
    pp=$(awk '/^PPid:/{print $2}' $p/status 2>/dev/null)
    [ "$s" = "Z" ] && echo "Z pid=$(basename $p) ppid=$pp"
  done | head
Z pid=10070 ppid=1
Z pid=10838 ppid=1
Z pid=11431 ppid=1
Z pid=12169 ppid=1
Z pid=12750 ppid=1
Z pid=13621 ppid=1
Z pid=14232 ppid=1
Z pid=14936 ppid=1
Z pid=15160 ppid=1
Z pid=1575  ppid=1

# state distribution across all processes
      3 (kernel/empty)
     12 S
     44 Z

The chain:

  1. openab (PID 1) spawns codex-acp
  2. codex-acp spawns shell tools (git, grep, next, ...) for tool calls
  3. When codex-acp exits or restarts a session, its remaining children are
     reparented to PID 1 (the kernel default)
  4. openab does not call wait() on arbitrary children → they become
     zombies and stay forever

This is distinct from #269 — that one is about kiro-cli-chat orphans (still alive, ~300 MB RSS each, a real memory leak). This issue is about exited-but-unreaped zombies (almost no memory, but each one holds a PID table entry).

Steps to Reproduce

  1. Deploy openab with the codex-acp agent using the standard
    ghcr.io/openabdev/openab-codex:0.7.2 image (no init wrapper)
  2. Use the Discord bot for normal coding sessions involving shell tool
    calls (git status, pnpm build, next build, etc.)
  3. After ~30–60 minutes of activity, exec into the pod:
    kubectl exec <pod> -- sh -c '
      for p in /proc/[0-9]*; do
        awk "/^State:/{print \$2}" $p/status
      done | sort | uniq -c'
  4. Observe the count of Z (zombie) state processes growing roughly
    linearly with session activity, all with PPid = 1

Expected Behavior

Zombie processes should be reaped automatically. Either:

  • Option A (recommended, image-level fix): ship the image with a proper
    init binary as PID 1. tini is the standard answer: one apt install plus
    ENTRYPOINT ["tini", "--", "openab"] in Dockerfile.codex. This also
    fixes signal forwarding for free.
  • Option B (app-level fix): have openab install a SIGCHLD handler that
    reaps any waitable children whenever it detects it is running as PID 1.
    The Rust ecosystem has crates for this, or a small signal-hook loop
    calling waitpid(-1, WNOHANG).

Option A is much smaller and is the standard container best practice. It also helps every other agent image (openab-claude, openab-gemini, openab-copilot) since the same Dockerfile structure exists across them.

Environment

  • chart: openab 0.7.2 (Helm)
  • image: ghcr.io/openabdev/openab-codex:0.7.2
  • agent: codex-acp (@zed-industries/codex-acp@0.9.5)
  • runtime: K3s (containerd, not docker)
  • pod securityContext: runAsNonRoot: true, runAsUser: 1000,
    capabilities.drop: [ALL]
  • no shareProcessNamespace, no init wrapper
  • args for codex passed via Helm:
    -c approval_policy="never"
    -c sandbox_mode="workspace-write"
    -c sandbox_workspace_write.network_access=true
    
    

Screenshots / Logs

Process listing snippet at the time of measurement (PPid column omitted for brevity, full command lines shown for live processes only):

PID 1     S  openab /etc/openab/config.toml
PID 13    S  node /usr/local/bin/codex-acp -c approval_policy="never" -c sandbox_mode="workspace-write" -c sandbox_workspace_write.network_access=true
PID 16715 S  node /usr/local/bin/corepack pnpm build
PID 16747 S  node /usr/local/bin/corepack pnpm -r build
PID 16783 R  node .../packages/web/.../next build
... plus 44 Z (zombie) entries, all with PPid=1, empty cmdline ...

Total state distribution: 12 S, 1 R, 44 Z after ~85 minutes uptime, ~30 zombies/hour during normal use. With Linux's default pid_max=32768 this means PID exhaustion at roughly 1000 hours uptime. Long-lived pods will eventually fail to fork.

Workaround

  • Periodically restart the deployment with kubectl rollout restart deploy/openab
  • Or build a downstream image that wraps the entrypoint in tini:
FROM ghcr.io/openabdev/openab-codex:0.7.2
USER root
RUN apt-get update && \
    apt-get install -y --no-install-recommends tini && \
    apt-get clean
USER node
ENTRYPOINT ["tini", "--", "openab"]

Metadata

    Labels

    bug (Something isn't working) · codex · p2 (Medium, planned work)
