Description
When running openab with `kiro-cli` on a constrained host (e.g. 3.6 GB RAM on Zeabur), the session pool fills up with idle sessions that are never reclaimed in time. Once `max_sessions` is reached, new requests are rejected and the host eventually OOMs.
Each `kiro-cli acp` spawns a child `kiro-cli-chat acp` process (~230-390 MB each). When the pool drops a session, `kill_on_drop` only kills the direct child — the grandchild `kiro-cli-chat` process becomes orphaned and keeps consuming memory.
Observed on a live deployment — 10 stale `kiro-cli-chat acp` processes consuming ~3 GB total:
| PID    | Started | RSS    |
|--------|---------|--------|
| 459820 | Apr12   | 290 MB |
| 581161 | Apr12   | 306 MB |
| 625730 | Apr12   | 300 MB |
| 633382 | Apr12   | 282 MB |
| 673360 | Apr12   | 388 MB |
| 724688 | 00:43   | 273 MB |
| 872305 | 08:48   | 274 MB |
| 872764 | 08:50   | 236 MB |
| 907784 | 10:39   | 227 MB |
| 913618 | 11:00   | 230 MB |
Four root causes identified:
- **Orphaned grandchild processes** — `kill_on_drop(true)` only SIGKILLs the direct child PID. The grandchild `kiro-cli-chat` survives and leaks memory. Fix: use process groups (`setsid`/`setpgid`) and kill the entire group on cleanup.
- **No cleanup on Discord thread archive** — `EventHandler` only implements `message` and `ready`. Archiving a thread leaves the session alive until TTL. Fix: implement a `thread_update` handler.
- **No LRU eviction** — when the pool is full, `get_or_create()` rejects with "pool exhausted" instead of evicting the oldest idle session. Fix: evict the session with the oldest `last_active` when at capacity.
- **Default TTL too long** — `session_ttl_hours` defaults to 24. On a 3.6 GB host, 10 sessions × ~300 MB ≈ 3 GB of idle processes. Fix: lower the default or document the memory implications.
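The process-group fix from the first root cause can be sketched with std APIs alone (the pool itself uses tokio, where the same approach applies via a process-group option or a `pre_exec` hook; the function names here are illustrative, not openab's actual API):

```rust
use std::io;
use std::os::unix::process::CommandExt; // process_group() is Unix-only
use std::process::{Child, Command};

/// Spawn a child as the leader of a new process group, so any grandchildren
/// it forks (e.g. kiro-cli-chat) stay inside that group.
fn spawn_in_group(program: &str, args: &[&str]) -> io::Result<Child> {
    Command::new(program)
        .args(args)
        .process_group(0) // pgid 0 = use the child's own pid as the group id
        .spawn()
}

/// Kill the whole group. kill(2) treats a negative PID as a process-group id;
/// shelling out to /bin/kill keeps this sketch dependency-free (with the libc
/// crate you would call `libc::kill(-(pgid as i32), libc::SIGKILL)` directly).
fn kill_group(pgid: u32) -> io::Result<()> {
    let status = Command::new("kill")
        .args(["-s", "KILL", "--", &format!("-{pgid}")])
        .status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::Other, "kill failed"))
    }
}
```

On session drop, calling `kill_group(child.id())` before reaping the child takes the grandchild down with it, since both share the group.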
Industry Comparison
A survey of agent harnesses from Picrew/awesome-agent-harness shows openab sits in the riskiest position — process-level isolation without proper process group management:
| Harness        | Isolation      | Orphan Risk | Cleanup Strategy |
|----------------|----------------|-------------|------------------|
| Gemini CLI a2a | In-process Map | None ✅     | `task.dispose()` + Map delete |
| openab         | Process        | HIGH ☠️     | `kill_on_drop` (broken for grandchildren) |
| acpx           | Process        | Low ✅      | 3-stage shutdown (`stdin.end()` → SIGTERM → SIGKILL) + self-terminating TTL |
| Scion          | Container      | None ✅     | `docker rm -f` kills everything |
| Daytona / E2B  | VM/microVM     | None ✅     | Destroy sandbox API |
Key insight from acpx: they use a 3-stage graceful shutdown (`stdin.end()` → SIGTERM 1.5 s → SIGKILL 1 s → detach all handles) and self-terminating queue-owner processes that exit when idle. This eliminates both the orphan problem and the need for a central cleanup task.
Key insight from Scion: container-per-agent makes orphans impossible by design (`docker rm -f` kills the entire process tree). This is the most robust long-term architecture but requires more infrastructure.
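The acpx-style staged shutdown translates naturally to the pool's drop path. A std-only sketch (the grace timings and names are illustrative; the real implementation would use tokio timers rather than blocking sleeps):

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Send a named signal by shelling out to /bin/kill (libc-free sketch).
fn signal(pid: u32, sig: &str) -> io::Result<()> {
    Command::new("kill").args(["-s", sig, &pid.to_string()]).status()?;
    Ok(())
}

/// acpx-style staged shutdown: close stdin, SIGTERM with a grace period,
/// then SIGKILL as a last resort.
fn staged_shutdown(child: &mut Child, grace: Duration) -> io::Result<ExitStatus> {
    drop(child.stdin.take()); // stage 1: EOF on stdin lets the agent exit cleanly
    signal(child.id(), "TERM")?; // stage 2: polite terminate (acpx waits ~1.5 s)
    let deadline = Instant::now() + grace;
    while Instant::now() < deadline {
        if let Some(status) = child.try_wait()? {
            return Ok(status); // exited during the grace period
        }
        sleep(Duration::from_millis(50));
    }
    child.kill()?; // stage 3: SIGKILL after the grace period expires
    child.wait()
}
```

Combined with the process-group fix, each stage would target the whole group rather than the direct child alone.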
Steps to Reproduce
- Deploy openab with `kiro-cli` on a host with limited RAM (e.g. 3.6 GB)
- Send messages from Discord that create multiple threads (up to `max_sessions`)
- Archive/close the Discord threads
- Observe that `kiro-cli` and `kiro-cli-chat` processes remain running
- Run `ps aux | grep kiro-cli` — orphaned processes accumulate
- Eventually the host runs out of memory and the pod/container is killed
Expected Behavior
- When a Discord thread is archived, the associated session and all its child processes should be terminated
- When the pool is full, the oldest idle session should be evicted to make room
- When a session is dropped, all descendant processes (including grandchildren) should be killed via process group signal
- Default TTL should be reasonable for small hosts, or clearly documented
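The eviction behavior in the second bullet can be sketched as a small change to the pool's lookup path (the `Session`/`last_active` names mirror the report, but the struct layout is hypothetical; the real pool would also kill the evicted session's process group on removal):

```rust
use std::collections::HashMap;
use std::time::Instant;

struct Session {
    last_active: Instant,
    // child process handle, thread id, etc. in the real pool
}

struct Pool {
    max_sessions: usize,
    sessions: HashMap<u64, Session>,
}

impl Pool {
    /// Instead of rejecting with "pool exhausted" at capacity, evict the
    /// session with the oldest `last_active` to make room for the new one.
    fn get_or_create(&mut self, id: u64) -> &mut Session {
        if !self.sessions.contains_key(&id) && self.sessions.len() >= self.max_sessions {
            if let Some(oldest) = self
                .sessions
                .iter()
                .min_by_key(|(_, s)| s.last_active)
                .map(|(k, _)| *k)
            {
                // Dropping the session is where the process-group kill belongs.
                self.sessions.remove(&oldest);
            }
        }
        let session = self
            .sessions
            .entry(id)
            .or_insert_with(|| Session { last_active: Instant::now() });
        session.last_active = Instant::now();
        session
    }
}
```

With this in place, `max_sessions` becomes a hard memory ceiling instead of a request-rejection threshold.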