fix: fall back to assembled assistant text when no stream deltas arrive #661
Open
LeoCui wants to merge 1 commit into
The `assistant` case in the prompt loop unconditionally filtered out `text` and `thinking` content blocks on the assumption that those had already been pushed to the client via `stream_event` → `content_block_delta` notifications. That assumption holds when the underlying SDK / model proxy emits Anthropic-style streaming deltas, but it breaks when the model is fronted by a proxy that returns the second model call (the one that consumes a tool_result) as a single non-streamed assistant message. In that case `content_block_delta` for the final text is never produced, the assembled `text` block is filtered out, and the ACP client receives `stopReason: "end_turn"` with no `agent_message_chunk` notifications, so the user sees the tool's progress but an empty final answer.

Track which assistant message ids have actually produced streamed text/thinking deltas (`textStreamedMessageIds` on `Session`, populated from `message_start` + `content_block_delta` in the `stream_event` case). In the `assistant` case, keep the existing filter when the id is in that set and fall back to forwarding the assembled `text` / `thinking` blocks as `agent_message_chunk` / `agent_thought_chunk` when it isn't. The fallback emits exactly once per assistant message id, mirroring the dedupe pattern already used by `toolUseCache` for tool_use blocks.

Tests cover both branches: a session whose assistant message arrives with no preceding `content_block_delta` now produces exactly one text chunk, and a session that did stream the same text via deltas does not re-emit it.

Co-authored-by: Cursor <cursoragent@cursor.com>
## Background
When the underlying Claude Agent SDK talks to an Anthropic-protocol gateway that does not preserve `content_block_delta` events for some responses (common with OpenAI-compatible gateways translating to the Anthropic protocol, e.g. internal model gateways, LiteLLM-based proxies, or custom Bedrock/Vertex shims), the ACP adapter silently drops the final assistant text even though the turn ends normally with `stop_reason: "end_turn"`:

```ts
const content =
message.type === "assistant"
? // Handled by stream events above
message.message.content.filter(
(item) => !["text", "thinking"].includes(item.type),
)
: message.message.content;
```
The client UI is left displaying tool progress with an empty final answer, while the underlying Claude Code session log shows the text content was produced normally.
## Repro

## Fix
Track which assistant message ids have actually produced streamed text/thinking deltas (`textStreamedMessageIds` on `Session`, populated from `message_start` + `content_block_delta` in the `stream_event` case). In the `assistant` case, keep the existing filter when the id is in that set, and fall back to forwarding the assembled `text` / `thinking` blocks as `agent_message_chunk` / `agent_thought_chunk` when it isn't.
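A minimal sketch of how the tracking side could look. Only `textStreamedMessageIds`, `Session`, and the event names (`message_start`, `content_block_delta`) come from this PR; the event shapes below are simplified stand-ins for the SDK's `stream_event` payloads, not the real types.

```typescript
// Simplified stand-ins for the SDK stream_event payloads (assumed shapes).
type StreamEvent =
  | { type: "message_start"; message: { id: string } }
  | { type: "content_block_delta"; delta: { type: string } };

class Session {
  // Assistant message ids that have produced at least one text/thinking delta.
  textStreamedMessageIds = new Set<string>();
  private currentMessageId: string | null = null;

  onStreamEvent(event: StreamEvent): void {
    if (event.type === "message_start") {
      // Remember which assistant message the following deltas belong to.
      this.currentMessageId = event.message.id;
    } else if (
      event.type === "content_block_delta" &&
      ["text_delta", "thinking_delta"].includes(event.delta.type) &&
      this.currentMessageId !== null
    ) {
      this.textStreamedMessageIds.add(this.currentMessageId);
    }
  }
}
```

A gateway that never emits `content_block_delta` simply leaves the set empty for that message id, which is what the `assistant` case later checks.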
This is the same dedupe pattern `toolUseCache` already uses for `tool_use` blocks (first encounter via stream → `tool_call`, second encounter via assembled message → `tool_call_update`).
A single `logger.log` line is emitted on the fallback path so operators have a hard signal when this fires.
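The fallback branch might look like the following sketch. `handleAssistantMessage`, `fallbackEmittedIds`, `Block`, and `emit` are hypothetical names for illustration; only the filter over `text`/`thinking`, the `textStreamedMessageIds` check, the once-per-id dedupe, and the chunk kinds come from this PR.

```typescript
// Hypothetical content-block shape (assumed for this sketch).
type Block = { type: string; text?: string; thinking?: string };

function handleAssistantMessage(
  id: string,
  blocks: Block[],
  textStreamedMessageIds: Set<string>,
  fallbackEmittedIds: Set<string>, // hypothetical dedupe set, one entry per id
  emit: (update: { kind: string; text: string }) => void,
): Block[] {
  if (!textStreamedMessageIds.has(id) && !fallbackEmittedIds.has(id)) {
    // No deltas ever arrived for this id: forward the assembled blocks once,
    // and log so operators get a hard signal when the fallback fires.
    fallbackEmittedIds.add(id);
    console.log(
      `No streamed text/thinking deltas seen for assistant message ${id}; ` +
        `emitting assembled blocks as fallback.`,
    );
    for (const b of blocks) {
      if (b.type === "text" && b.text) {
        emit({ kind: "agent_message_chunk", text: b.text });
      } else if (b.type === "thinking" && b.thinking) {
        emit({ kind: "agent_thought_chunk", text: b.thinking });
      }
    }
  }
  // The existing filter still applies either way: text/thinking blocks are
  // never forwarded through the non-streaming path.
  return blocks.filter((b) => !["text", "thinking"].includes(b.type));
}
```

Calling it a second time with the same id, or with an id that did stream deltas, emits nothing, which matches the two test branches described below.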
## Production evidence
Deployed to a Lark/Feishu bot that uses an internal OpenAI-style gateway (gpt-5.4 model fronted via Anthropic protocol). The fallback fires regularly in real traffic — log excerpt from the first few minutes after deploy:
```
[claude-agent-acp] No streamed text/thinking deltas seen for assistant message resp_049906868815d358006a05a8f2b7e4819398e68b896940fd58; emitting assembled blocks as fallback.
[claude-agent-acp] No streamed text/thinking deltas seen for assistant message resp_0be8b8386e3f41a9006a05a8fd865481949773a185cd1676ef; emitting assembled blocks as fallback.
[claude-agent-acp] No streamed text/thinking deltas seen for assistant message resp_0be8b8386e3f41a9006a05a8fd865481949773a185cd1676ef; emitting assembled blocks as fallback.
[claude-agent-acp] No streamed text/thinking deltas seen for assistant message resp_068c4787db729814006a05a95539dc81948d4291821926b50a; emitting assembled blocks as fallback.
```
Each of those would have been an empty final answer before this fix. After the fix, all four sessions delivered the assistant's text correctly to the ACP client.
## Test plan

## Notes / open questions
Made with Cursor