
fix: fall back to assembled assistant text when no stream deltas arrive #661

Open

LeoCui wants to merge 1 commit into agentclientprotocol:main from LeoCui:upstream-pr/fallback-text-from-assistant-message

Conversation


@LeoCui LeoCui commented May 14, 2026

Background

When the underlying Claude Agent SDK talks to an Anthropic-protocol gateway that does not preserve content_block_delta events for some responses (common with OpenAI-compatible gateways translating to the Anthropic protocol — e.g. internal model gateways, LiteLLM-based proxies, custom Bedrock/Vertex shims), the ACP adapter silently drops the final assistant text:

  1. The model produces a normal turn ending with `stop_reason: "end_turn"`.
  2. The SDK forwards a fully assembled `assistant` message containing a `text` block.
  3. `prompt()` in `acp-agent.ts` filters that `text` block out, with the comment "Handled by stream events above":

```ts
const content =
  message.type === "assistant"
    ? // Handled by stream events above
      message.message.content.filter(
        (item) => !["text", "thinking"].includes(item.type),
      )
    : message.message.content;
```

  4. But because the gateway never emitted `content_block_delta` for the text, the `stream_event` path never produced an `agent_message_chunk` either. The ACP client receives only `tool_call` updates and `result.stopReason: "end_turn"`; no `agent_message_chunk` ever arrives.

The client UI is left displaying tool progress with an empty final answer, while the underlying Claude Code session log shows the text content was produced normally.
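The failure mode can be modeled in a few lines (hypothetical simplified types; `filterAssistantBlocks` and `assembled` are illustrative stand-ins, not the adapter's real identifiers):

```typescript
// Minimal model of the drop. The assistant-case filter assumes
// text/thinking already went out as streamed chunk notifications.
type Block = { type: "text" | "thinking" | "tool_use"; text?: string };

function filterAssistantBlocks(blocks: Block[]): Block[] {
  return blocks.filter((b) => !["text", "thinking"].includes(b.type));
}

// A gateway that never emitted content_block_delta delivers the final
// answer only inside the assembled assistant message...
const assembled: Block[] = [
  { type: "tool_use" },
  { type: "text", text: "final answer" },
];

// ...so after filtering, the text is dropped and the ACP client only
// ever sees the tool_call updates.
const forwarded = filterAssistantBlocks(assembled); // only the tool_use survives
```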

Repro

  • Drive the SDK against a gateway that returns one or more model calls as a single non-streamed block (most commonly the follow-up call that consumes a tool_result).
  • Ask the agent something that triggers `tool_use → tool_result → final text`.
  • Observe in the ACP client: tool_call progress arrives, the prompt RPC returns `stopReason: "end_turn"`, and the final text never shows up.

Fix

Track which assistant message ids have actually produced streamed text/thinking deltas (`textStreamedMessageIds` on `Session`, populated from `message_start` + `content_block_delta` in the `stream_event` case). In the `assistant` case, keep the existing filter when the id is in that set, and fall back to forwarding the assembled `text` / `thinking` blocks as `agent_message_chunk` / `agent_thought_chunk` when it isn't.

This is the same dedupe pattern `toolUseCache` already uses for `tool_use` blocks (first encounter via stream → `tool_call`, second encounter via assembled message → `tool_call_update`).

A single `logger.log` line is emitted on the fallback path so operators have a hard signal when this fires.
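In sketch form the fix looks roughly like this. Only `textStreamedMessageIds` and the log format are taken from the change itself; the `Session` members and method names here are illustrative stand-ins for the real adapter plumbing:

```typescript
// Sketch of the fallback, assuming simplified event shapes.
type Block = { type: string; text?: string };

class Session {
  // Populated from message_start + content_block_delta in the
  // stream_event case.
  textStreamedMessageIds = new Set<string>();
  emitted: string[] = []; // stand-in for sessionUpdate notifications

  onContentBlockDelta(messageId: string, delta: string) {
    this.textStreamedMessageIds.add(messageId);
    this.emitted.push(delta); // normal agent_message_chunk path
  }

  onAssistantMessage(messageId: string, blocks: Block[]): Block[] {
    if (!this.textStreamedMessageIds.has(messageId)) {
      // Fallback: no deltas ever arrived for this id, so forward the
      // assembled text/thinking blocks instead of dropping them.
      console.log(
        `[claude-agent-acp] No streamed text/thinking deltas seen for ` +
          `assistant message ${messageId}; emitting assembled blocks as fallback.`,
      );
      for (const b of blocks) {
        if ((b.type === "text" || b.type === "thinking") && b.text) {
          this.emitted.push(b.text);
        }
      }
    }
    // Either way, non-text blocks (e.g. tool_use) keep the existing path.
    return blocks.filter((b) => !["text", "thinking"].includes(b.type));
  }
}
```

A session that streamed deltas for an id keeps the old behavior (filter only); a session that never did gets exactly the assembled text, once per delivery.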

Production evidence

Deployed to a Lark/Feishu bot that uses an internal OpenAI-style gateway (gpt-5.4 model fronted via Anthropic protocol). The fallback fires regularly in real traffic — log excerpt from the first few minutes after deploy:

```
[claude-agent-acp] No streamed text/thinking deltas seen for assistant message resp_049906868815d358006a05a8f2b7e4819398e68b896940fd58; emitting assembled blocks as fallback.
[claude-agent-acp] No streamed text/thinking deltas seen for assistant message resp_0be8b8386e3f41a9006a05a8fd865481949773a185cd1676ef; emitting assembled blocks as fallback.
[claude-agent-acp] No streamed text/thinking deltas seen for assistant message resp_0be8b8386e3f41a9006a05a8fd865481949773a185cd1676ef; emitting assembled blocks as fallback.
[claude-agent-acp] No streamed text/thinking deltas seen for assistant message resp_068c4787db729814006a05a95539dc81948d4291821926b50a; emitting assembled blocks as fallback.
```

Each of those would have been an empty final answer before this fix. After the fix all four sessions delivered the assistant's text correctly to the ACP client.

Test plan

  • `npm run test:run` — 257 passed, 13 skipped (255 existing + 2 new)
  • `npm run check` (lint + prettier) — clean
  • `npx tsc --noEmit` — clean
  • Two new tests in `src/tests/acp-agent.test.ts` covering both branches:
    • emits the assembled text when no `content_block_delta` was streamed
    • does not re-emit text already streamed via `content_block_delta`
  • Verified in production against an OpenAI-style gateway (see logs above)

Notes / open questions

  • The duplicate fallback log for `resp_0be8b8...` in the production output is benign for that case (the Lark-side accumulator is idempotent against the same assembled text arriving twice). It would be reasonable to also record the message id into `textStreamedMessageIds` after the fallback emits, making the fallback path itself idempotent against duplicate `assistant` deliveries. Happy to add that in a follow-up commit if you prefer; it is left out of this change to keep the diff focused on the original bug.
  • A second appealing direction would be to switch `toolUseCache`'s implementation to a `Set` once we no longer need the input snapshot — currently the only thing keeping it as a map is the secondary `tool_call_update` payload. Out of scope here.
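The idempotency follow-up mentioned above would be a one-line addition on the fallback path. A sketch under the same caveat (illustrative shapes, not the adapter's real types):

```typescript
// Recording the message id when the fallback fires makes a duplicate
// assistant delivery for the same id a no-op.
const textStreamedMessageIds = new Set<string>();
const emitted: string[] = [];

function fallbackEmit(messageId: string, text: string) {
  if (textStreamedMessageIds.has(messageId)) return; // duplicate delivery
  textStreamedMessageIds.add(messageId); // the proposed one-line addition
  emitted.push(text);
}
```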

Made with Cursor

The `assistant` case in the prompt loop unconditionally filtered out
`text` and `thinking` content blocks on the assumption that those had
already been pushed to the client via `stream_event` →
`content_block_delta` notifications. That assumption holds when the
underlying SDK / model proxy emits Anthropic-style streaming deltas, but
it breaks when the model is fronted by a proxy that returns the second
model call (the one that consumes a tool_result) as a single non-
streamed assistant message. In that case `content_block_delta` for the
final text is never produced, the assembled `text` block is filtered
out, and the ACP client receives `stopReason: "end_turn"` with no
`agent_message_chunk` notifications — so the user sees the tool's
progress but an empty final answer.

Track which assistant message ids have actually produced streamed
text/thinking deltas (`textStreamedMessageIds` on `Session`, populated
from `message_start` + `content_block_delta` in the `stream_event`
case). In the `assistant` case, keep the existing filter when the id is
in that set and fall back to forwarding the assembled `text` / `thinking`
blocks as `agent_message_chunk` / `agent_thought_chunk` when it isn't.
The fallback emits exactly once per assistant message id, mirroring the
dedupe pattern already used by `toolUseCache` for tool_use blocks.

Tests cover both branches: a session whose assistant message arrives
with no preceding `content_block_delta` now produces exactly one text
chunk, and a session that did stream the same text via deltas does not
re-emit it.

Co-authored-by: Cursor <cursoragent@cursor.com>