Summary
Add heartbeat emission and liveness monitoring for long-running agent invocations, enabling watchdog UIs, timeout warnings, and progress reporting for agents that run for extended periods.
Motivation
Agent invocations can take anywhere from 30 seconds to 30+ minutes depending on task complexity. The framework's AgentLauncher.launchAgent() is a single async call that returns an AgentResult when done — there's no visibility into what's happening during execution. The timedOut: boolean field on AgentResult only tells you after the fact.
AAMF emits periodic agent-heartbeat events with elapsed time, plus agent-timed-out events when limits are reached. These drive:
- CLI progress indicators ("code-migrator running for 4m32s...")
- Timeout warnings before hard-kills ("approaching 10min limit, 8m45s elapsed")
- Monitoring dashboards for fleet-scale runs
Proposed API
Heartbeat Configuration
interface HeartbeatConfig {
  /** Interval between heartbeat emissions in milliseconds (default: 30_000). */
  intervalMs?: number;
  /** Emit a warning event once this fraction of the timeout has elapsed (0-1, default: 0.8 = 80%). */
  timeoutWarningThreshold?: number;
}
interface AgentLauncherConfig {
  // ... existing fields ...
  heartbeat?: HeartbeatConfig;
}
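As an illustration of how the launcher might normalize this configuration, here is a minimal sketch. The `resolveHeartbeatConfig` helper and its validation rules are assumptions made for this example and are not part of the proposal; only the field names and default values come from the interface above.

```typescript
interface HeartbeatConfig {
  intervalMs?: number;
  timeoutWarningThreshold?: number;
}

// Defaults as stated in the doc comments on HeartbeatConfig.
const DEFAULT_INTERVAL_MS = 30_000;
const DEFAULT_WARNING_THRESHOLD = 0.8;

// Hypothetical helper: fills in defaults and rejects unusable values.
function resolveHeartbeatConfig(
  config: HeartbeatConfig = {},
): Required<HeartbeatConfig> {
  const intervalMs = config.intervalMs ?? DEFAULT_INTERVAL_MS;
  const timeoutWarningThreshold =
    config.timeoutWarningThreshold ?? DEFAULT_WARNING_THRESHOLD;

  if (!Number.isFinite(intervalMs) || intervalMs <= 0) {
    throw new RangeError(`intervalMs must be a positive number, got ${intervalMs}`);
  }
  // Assumption: the threshold is a fraction in (0, 1], not a 0-100 percentage.
  if (!(timeoutWarningThreshold > 0 && timeoutWarningThreshold <= 1)) {
    throw new RangeError(
      `timeoutWarningThreshold must be in (0, 1], got ${timeoutWarningThreshold}`,
    );
  }
  return { intervalMs, timeoutWarningThreshold };
}
```

Validating the threshold up front catches the easy mistake of passing `80` instead of `0.8`, which would otherwise silently disable the warning.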
Events
interface AgentHeartbeatEvent {
  type: 'agent-heartbeat';
  agent: string;
  issueNumber?: number;
  taskId?: string;
  sessionId?: string;
  /** Seconds elapsed since invocation started. */
  elapsedSeconds: number;
  /** Configured timeout in seconds (if set). */
  timeoutSeconds?: number;
  /** Fraction of the timeout consumed (0-1). */
  timeoutPercentage?: number;
}
interface AgentTimeoutWarningEvent {
  type: 'agent-timeout-warning';
  agent: string;
  issueNumber?: number;
  taskId?: string;
  elapsedSeconds: number;
  timeoutSeconds: number;
  /** How many seconds remain before the hard timeout. */
  remainingSeconds: number;
}
interface AgentTimedOutEvent {
  type: 'agent-timed-out';
  agent: string;
  issueNumber?: number;
  taskId?: string;
  timeoutSeconds: number;
}
Add all three to the FrameworkLifecycleEvent union.
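To show how a consumer such as a CLI progress indicator might handle these events, here is a small sketch that narrows on the `type` discriminant. The `LivenessEvent` alias, `formatDuration`, and `describeEvent` are hypothetical names for this example, the interfaces are trimmed to the fields it uses, and the message wording is only one possible rendering.

```typescript
interface AgentHeartbeatEvent {
  type: 'agent-heartbeat';
  agent: string;
  elapsedSeconds: number;
}
interface AgentTimeoutWarningEvent {
  type: 'agent-timeout-warning';
  agent: string;
  elapsedSeconds: number;
  timeoutSeconds: number;
  remainingSeconds: number;
}
interface AgentTimedOutEvent {
  type: 'agent-timed-out';
  agent: string;
  timeoutSeconds: number;
}

// Stand-in for the three new members of FrameworkLifecycleEvent.
type LivenessEvent =
  | AgentHeartbeatEvent
  | AgentTimeoutWarningEvent
  | AgentTimedOutEvent;

// Renders whole seconds as "4m32s" (or "45s" when under a minute).
function formatDuration(totalSeconds: number): string {
  const whole = Math.floor(totalSeconds);
  const minutes = Math.floor(whole / 60);
  const seconds = whole % 60;
  if (minutes === 0) return `${seconds}s`;
  return `${minutes}m${seconds < 10 ? '0' : ''}${seconds}s`;
}

function describeEvent(event: LivenessEvent): string {
  switch (event.type) {
    case 'agent-heartbeat':
      return `${event.agent} running for ${formatDuration(event.elapsedSeconds)}...`;
    case 'agent-timeout-warning':
      return (
        `${event.agent} approaching ${formatDuration(event.timeoutSeconds)} limit, ` +
        `${formatDuration(event.elapsedSeconds)} elapsed`
      );
    case 'agent-timed-out':
      return `${event.agent} timed out after ${formatDuration(event.timeoutSeconds)}`;
    default: {
      // Compile-time exhaustiveness check: fails if a new event type is added.
      const unreachable: never = event;
      return unreachable;
    }
  }
}
```

The `never` branch means that adding a fourth event type to the union produces a compile error here until the consumer handles it.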
AgentLauncher Integration
The launcher starts a heartbeat interval timer when invoke() begins and clears it when the invocation completes. While the timer is running:
- On each tick, emit AgentHeartbeatEvent with the current elapsed time
- If elapsed > timeout * timeoutWarningThreshold, emit AgentTimeoutWarningEvent (once)
- When the process is killed due to timeout, emit AgentTimedOutEvent
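The per-tick decision described above could be sketched as follows. `createHeartbeatTracker` and the `TickEvent` type are hypothetical names for this example; the sketch takes the current time as an argument so the logic can be exercised without real timers, and it leaves the AgentTimedOutEvent to the launcher's kill path, which is not shown.

```typescript
type TickEvent =
  | {
      type: 'agent-heartbeat';
      agent: string;
      elapsedSeconds: number;
      timeoutSeconds?: number;
      timeoutPercentage?: number;
    }
  | {
      type: 'agent-timeout-warning';
      agent: string;
      elapsedSeconds: number;
      timeoutSeconds: number;
      remainingSeconds: number;
    };

// Returns a tick function; each call yields the events to emit for that tick.
function createHeartbeatTracker(
  agent: string,
  startedAtMs: number,
  timeoutSeconds?: number,
  warningThreshold = 0.8,
): (nowMs: number) => TickEvent[] {
  let warned = false; // ensures the warning is produced at most once

  return (nowMs) => {
    const elapsedSeconds = (nowMs - startedAtMs) / 1000;

    // Without a timeout there is nothing to warn about.
    if (timeoutSeconds === undefined) {
      return [{ type: 'agent-heartbeat', agent, elapsedSeconds }];
    }

    const events: TickEvent[] = [
      {
        type: 'agent-heartbeat',
        agent,
        elapsedSeconds,
        timeoutSeconds,
        timeoutPercentage: Math.min(elapsedSeconds / timeoutSeconds, 1),
      },
    ];

    if (!warned && elapsedSeconds > timeoutSeconds * warningThreshold) {
      warned = true;
      events.push({
        type: 'agent-timeout-warning',
        agent,
        elapsedSeconds,
        timeoutSeconds,
        remainingSeconds: Math.max(timeoutSeconds - elapsedSeconds, 0),
      });
    }
    return events;
  };
}
```

With a 600-second timeout and the default threshold, ticks before 480 seconds yield only a heartbeat, the first tick after 480 seconds also yields the warning, and later ticks go back to heartbeats only.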
Callback Hook (for non-event-bus consumers)
interface AgentInvocation {
  // ... existing fields ...
  /** Optional callback invoked periodically during execution with elapsed time. */
  onHeartbeat?: (elapsed: { seconds: number; timeoutPercentage?: number }) => void;
}
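A possible shape for both sides of this hook is sketched below. `notifyHeartbeat` is a hypothetical launcher-side helper, and catching errors thrown by the consumer's callback is an assumption of this example rather than something the proposal specifies.

```typescript
type HeartbeatCallback = (elapsed: {
  seconds: number;
  timeoutPercentage?: number;
}) => void;

// Hypothetical launcher-side helper: builds the payload and calls the callback.
function notifyHeartbeat(
  onHeartbeat: HeartbeatCallback | undefined,
  seconds: number,
  timeoutSeconds?: number,
): void {
  if (!onHeartbeat) return;

  const payload =
    timeoutSeconds === undefined
      ? { seconds }
      : { seconds, timeoutPercentage: Math.min(seconds / timeoutSeconds, 1) };

  try {
    onHeartbeat(payload);
  } catch (err) {
    // Assumption: a faulty consumer callback should not break the invocation.
    console.error('onHeartbeat callback threw:', err);
  }
}

// Consumer side: collect progress lines without touching the event bus.
const progressLines: string[] = [];
const onHeartbeat: HeartbeatCallback = ({ seconds, timeoutPercentage }) => {
  const pct =
    timeoutPercentage === undefined
      ? ''
      : ` (${Math.round(timeoutPercentage * 100)}% of timeout)`;
  progressLines.push(`running for ${Math.round(seconds)}s${pct}`);
};

notifyHeartbeat(onHeartbeat, 150, 600);
```

In real use the callback would be passed as the `onHeartbeat` field of an AgentInvocation and the launcher would call it on each heartbeat tick.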
Implementation Notes
- The heartbeat timer runs in the launcher's process, not inside the agent subprocess, so no agent cooperation is required
- Use setInterval with clearInterval on completion; use unref() so the timer doesn't prevent process exit
- Heartbeat events go through the standard FleetEventBus dispatch, which means notification providers (Slack, webhook) can react to long-running agents
- The onHeartbeat callback is a lightweight alternative for consumers who don't use the event bus
- Default interval of 30s keeps event volume low while providing useful liveness signal
- The timeout warning should fire exactly once per invocation (not on every heartbeat after crossing the threshold)
- Consider adding a lastHeartbeat timestamp to AgentResult for diagnostic purposes
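The timer lifecycle from the notes above (setInterval, unref(), clearInterval on completion) might look like the sketch below. `startHeartbeat` and `withHeartbeat` are hypothetical names for this example. `unref()` is a method on Node.js timer objects; the cast is there only so the code also compiles where `setInterval` is typed as returning a number.

```typescript
interface HeartbeatHandle {
  stop(): void;
}

// Starts a repeating tick that reports elapsed seconds until stop() is called.
function startHeartbeat(
  intervalMs: number,
  onTick: (elapsedSeconds: number) => void,
  now: () => number = Date.now,
): HeartbeatHandle {
  const startedAt = now();
  let stopped = false;

  const timer = setInterval(() => {
    onTick((now() - startedAt) / 1000);
  }, intervalMs);

  // In Node.js, unref() keeps this timer from holding the process open.
  (timer as unknown as { unref?: () => void }).unref?.();

  return {
    stop() {
      if (stopped) return; // safe to call more than once
      stopped = true;
      clearInterval(timer);
    },
  };
}

// Runs an async task with a heartbeat that is always cleared afterwards.
async function withHeartbeat<T>(
  run: () => Promise<T>,
  intervalMs: number,
  onTick: (elapsedSeconds: number) => void,
): Promise<T> {
  const heartbeat = startHeartbeat(intervalMs, onTick);
  try {
    return await run();
  } finally {
    // Clears the timer on success, failure, and timeout alike.
    heartbeat.stop();
  }
}
```

Putting `stop()` in a `finally` block covers the rejection path too, so a failed or timed-out invocation cannot leave a stray heartbeat timer behind.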