
CopilotBackend: token usage always reports 0 — switch from -s to --output-format json and parse JSONL events #440

@jafreck

Description


Problem

The CopilotBackend launches the Copilot CLI with -s (silent mode), which outputs plain text only. The parseTokenUsage() function then tries to extract token counts from that plain text using regex patterns, but the Copilot CLI's plain-text output doesn't include a recognizable token usage summary. Result: tokenUsage is always 0.

Evidence from a real 310-task migration (zstd C → Rust, via AAMF)

  • 579 agent invocations, all returning tokenUsage: 0
  • The Copilot CLI does report token data when using --output-format json:
    • Every assistant.message event includes data.outputTokens (integer)
    • The result event includes usage.premiumRequests, usage.totalApiDurationMs, usage.sessionDurationMs
    • 10,284 assistant.message events with outputTokens summing to 12.4M output tokens — all ignored because -s mode doesn't emit JSONL
  • The result event includes no input/output token-count fields at all: only premiumRequests, totalApiDurationMs, sessionDurationMs, and codeChanges

Required Changes

1. Switch CopilotBackend from -s to --output-format json

In CopilotBackend.invoke(), replace:

'-s',

with:

'--output-format', 'json',

2. Parse JSONL output for token usage and text content

The JSONL stream contains these event types:

// assistant.message — contains the agent's text output AND outputTokens
{"type": "assistant.message", "data": {"messageId": "...", "content": "...", "outputTokens": 1206}}

// assistant.message_delta — streaming deltas (ignore for token counting)
{"type": "assistant.message_delta", "data": {"deltaContent": "..."}}

// assistant.tool_call — tool invocations
{"type": "assistant.tool_call", "data": {"toolName": "read_file"}}

// result — final summary (NO token counts, only premiumRequests)
{"type": "result", "exitCode": 0, "usage": {"premiumRequests": 1, "totalApiDurationMs": 751446, "sessionDurationMs": 881981}}

Add a JSONL parser that:

  1. Iterates lines of stdout, parses each as JSON
  2. For assistant.message events: accumulates data.outputTokens and concatenates data.content to reconstruct the text output
  3. For result events: extracts premiumRequests from usage
  4. Non-JSON lines are appended to the text content as-is (for backward compatibility)
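The parser described above could be sketched roughly as follows. The event shapes come from the examples in this issue; the function name `parseCopilotJsonl` and the `ParsedCopilotOutput` shape are illustrative, not existing framework API:

```typescript
// Hypothetical JSONL parser for Copilot CLI --output-format json output.
interface ParsedCopilotOutput {
  textContent: string;      // reconstructed agent text output
  outputTokens: number;     // sum of data.outputTokens across assistant.message events
  premiumRequests?: number; // from the final result event's usage block
}

function parseCopilotJsonl(stdout: string): ParsedCopilotOutput {
  const parsed: ParsedCopilotOutput = { textContent: '', outputTokens: 0 };
  for (const line of stdout.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '') continue;
    let event: any;
    try {
      event = JSON.parse(trimmed);
    } catch {
      // Non-JSON lines are kept as text for backward compatibility.
      parsed.textContent += trimmed + '\n';
      continue;
    }
    if (event === null || typeof event !== 'object') {
      // Valid JSON but not an event object (e.g. a bare number): treat as text.
      parsed.textContent += trimmed + '\n';
      continue;
    }
    if (event.type === 'assistant.message') {
      parsed.outputTokens += event.data?.outputTokens ?? 0;
      parsed.textContent += event.data?.content ?? '';
    } else if (event.type === 'result') {
      parsed.premiumRequests = event.usage?.premiumRequests;
    }
    // assistant.message_delta and assistant.tool_call are ignored for token counting.
  }
  return parsed;
}
```

Deltas are skipped deliberately: counting both `assistant.message_delta` and the final `assistant.message` would double-count content and tokens.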

3. Update AgentResult.tokenUsage

Populate tokenUsage from the accumulated outputTokens:

tokenUsage: accumulatedOutputTokens > 0
  ? { input: 0, output: accumulatedOutputTokens, model: model ?? 'unknown' }
  : 0

Note: The Copilot CLI does not report input tokens per-message. Setting input: 0 is accurate to the data available. Consumers that need total token estimates can use premiumRequests as a proxy.

4. Expose premiumRequests on AgentResult

Add an optional field to AgentResult:

/** Estimated premium requests consumed (Copilot CLI only). */
premiumRequests?: number;

This is the Copilot CLI's native cost metric and should be surfaced for budget tracking.

5. Keep parseTokenUsage() as fallback

The existing parseTokenUsage() regex-based parser should remain as a fallback for when JSONL parsing finds nothing (e.g., older CLI versions that don't emit outputTokens).

6. Reconstruct stdout for consumers

Since -s mode is gone, AgentResult.stdout now contains JSONL instead of plain text. The framework should reconstruct the text content from assistant.message events and set that as stdout, or add a separate field. Downstream consumers (like AAMF's parseAamfOutput) parse stdout for structured agent output blocks — they need the text content, not raw JSONL.

Option A (recommended): Set stdout to the reconstructed text content (concatenated data.content from assistant.message events). Store raw JSONL in a new optional field if needed.

Option B: Keep stdout as raw JSONL and let consumers deal with it. (This is what AAMF already does with its own AamfCopilotBackend that overrides the framework's backend, but it means every framework consumer has to implement JSONL parsing.)
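Under Option A, the result assembly could look like the sketch below. The `rawJsonl` field name is hypothetical (the "new optional field" mentioned above), as is the trimmed-down `AgentResultSketch` shape:

```typescript
// Illustrative Option A assembly: stdout holds reconstructed text,
// tokenUsage follows the shape from step 3, raw JSONL is preserved separately.
interface AgentResultSketch {
  stdout: string; // reconstructed text content, not raw JSONL
  tokenUsage: { input: number; output: number; model: string } | 0;
  premiumRequests?: number;
  rawJsonl?: string; // hypothetical field for the untouched stream
}

function buildAgentResult(
  parsed: { textContent: string; outputTokens: number; premiumRequests?: number },
  rawStdout: string,
  model?: string,
): AgentResultSketch {
  return {
    stdout: parsed.textContent,
    tokenUsage: parsed.outputTokens > 0
      ? { input: 0, output: parsed.outputTokens, model: model ?? 'unknown' }
      : 0,
    premiumRequests: parsed.premiumRequests,
    rawJsonl: rawStdout,
  };
}
```

With this shape, consumers like parseAamfOutput keep reading `stdout` unchanged, while anyone needing the raw events reads `rawJsonl`.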

Context

AAMF already works around this by registering a custom AamfCopilotBackend that uses --output-format json and parses the JSONL stream (see src/core/agent-launcher.ts). This fix would bring the framework's built-in CopilotBackend up to parity, eliminating the need for downstream workarounds.

Non-goals

  • Do not attempt to estimate input tokens from output tokens or API duration — that's the consumer's responsibility
  • The Claude backend is unaffected; it already parses JSON usage correctly

Metadata

Labels: bug (Something isn't working)