|
| 1 | +# Agent Process Lifecycle Bug |
| 2 | + |
| 3 | +## Problem Summary |
| 4 | + |
| 5 | +Multiple interrelated bugs causing: |
| 6 | +1. **Duplicate agents** - Multiple agent processes run for same task, causing interleaved responses |
| 7 | +2. **App hangs on shutdown** - Cleanup waits indefinitely for unresponsive agent processes |
| 8 | +3. **Orphaned processes** - Agent subprocesses not properly terminated |
| 9 | + |
| 10 | +## Root Causes |
| 11 | + |
| 12 | +### Race Condition #1: Renderer (sessionStore.ts:955-1012) |
| 13 | +```typescript |
| 14 | +if (connectAttempts.has(taskId)) return; // Line 956 - check |
| 15 | +// ... async work ... |
| 16 | +connectAttempts.add(taskId); // Line 1012 - add (too late!) |
| 17 | +``` |
| 18 | +Two rapid calls both pass the check before either adds to the set. |
| 19 | + |
| 20 | +### Race Condition #2: Main Process (service.ts:388-497) |
| 21 | +```typescript |
| 22 | +const existing = this.sessions.get(taskRunId); // Line 389 - check |
| 23 | +if (existing) return existing; |
| 24 | +// ... 100+ lines of async work ... |
| 25 | +this.sessions.set(taskRunId, session); // Line 497 - set (too late!) |
| 26 | +``` |
| 27 | +Also: Sessions keyed by `taskRunId`, not `taskId` - two runs for same task both create agents. |
| 28 | + |
| 29 | +### No Cleanup Timeout (app-lifecycle/service.ts:22-40) |
| 30 | +```typescript |
| 31 | +await container.unbindAll(); // Line 26 - can hang forever |
| 32 | +``` |
| 33 | +If agent subprocess doesn't respond, cleanup never completes, app never quits. |
| 34 | + |
| 35 | +--- |
| 36 | + |
| 37 | +## Key Files |
| 38 | + |
| 39 | +| File | Role | |
| 40 | +|------|------| |
| 41 | +| `apps/twig/src/main/services/agent/service.ts` | Main process agent management - sessions Map, getOrCreateSession, cleanupSession | |
| 42 | +| `apps/twig/src/renderer/features/sessions/stores/sessionStore.ts` | Renderer session management - connectAttempts Set, connectToTask | |
| 43 | +| `apps/twig/src/main/services/app-lifecycle/service.ts` | App shutdown - calls container.unbindAll() | |
| 44 | +| `packages/agent/src/agent.ts` | Agent wrapper - cleanup() method | |
| 45 | +| `packages/agent/src/adapters/acp-connection.ts` | ACP connection - actual cleanup logic | |
| 46 | + |
| 47 | +--- |
| 48 | + |
| 49 | +## Fix Plan |
| 50 | + |
| 51 | +### Phase 1: Main Process Mutex + Kill-Before-Create |
| 52 | +- Add `sessionsByTaskId` Map to track by taskId (not just taskRunId) |
| 53 | +- Add `pendingCreations` Map to prevent race conditions |
| 54 | +- Before creating new session, clean up any existing session for same taskId |
| 55 | + |
| 56 | +### Phase 2: Cleanup Timeout |
| 57 | +- Race `agent.cleanup()` against 5-second timeout |
| 58 | +- Call `forceCleanup()` if timeout |
| 59 | + |
| 60 | +### Phase 3: Force Cleanup Method |
| 61 | +- Add `forceCleanup()` to Agent class that aborts the session controller |
| 62 | + |
| 63 | +### Phase 4: App Shutdown Timeout |
| 64 | +- Add 10-second overall timeout to `shutdown()` |
| 65 | + |
| 66 | +### Phase 5: Renderer Mutex |
| 67 | +- Replace `connectAttempts` Set with Promise-based locking |
| 68 | +- Check "connecting" status, not just "connected" |
| 69 | + |
| 70 | +--- |
| 71 | + |
| 72 | +## Debug Log Location |
| 73 | + |
| 74 | +When the app hangs on shutdown, check: |
| 75 | +``` |
| 76 | +~/Library/Logs/twig/main.log |
| 77 | +``` |
| 78 | + |
| 79 | +Or tail it live: |
| 80 | +```bash |
| 81 | +tail -f ~/Library/Logs/twig/main.log | grep -E "(AGENT_DEBUG|cleanupSession|getOrCreateSession|shutdown)" |
| 82 | +``` |
0 commit comments