
fix: improve GLM 4.7 responses handling for Codex#7666

Closed
ishandhanani wants to merge 9 commits into main from idhanani/dyn-glm47-responses-codex

Conversation

@ishandhanani
Contributor

ishandhanani commented Mar 27, 2026

Summary

  • add the GLM 4.7 parser and Responses API fixes needed for Codex on /v1/responses
  • keep this PR scoped to GLM/Responses behavior only

Issues Found

  • Malformed or nested GLM <tool_call> blocks could confuse parsing and cause Codex tool calls to be dropped or recovered inconsistently
  • Responses conversion could leak GLM reasoning / markup into visible assistant text instead of keeping it hidden or structured
  • Mixed tool-call streaming could emit the same tool call twice when a model produced both raw <tool_call> text and structured tool_calls deltas
  • GLM chat templating could break on tool arguments that contain an items key

Fixes

  • harden GLM 4.7 parser recovery for malformed nested tool-call blocks and valid-name prefixes
  • strip GLM reasoning / tool-call markup from visible Responses output and salvage reasoning into structured form
  • dedupe mixed raw/structured tool-call streaming in the Responses stream converter
  • normalize the template formatter so Python .items() patterns become valid Jinja |items
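The template-formatter normalization above could look roughly like this: a minimal sketch assuming the naive string replacement that the review comments later discuss, not the actual formatters.rs implementation:

```rust
// Sketch only: rewrite Python-style `.items()` method calls into the
// Jinja2 `|items` filter so a minijinja-style engine can evaluate the
// GLM chat template. The function name mirrors the PR's walkthrough.
fn rewrite_python_dict_items_calls(template: &str) -> String {
    template.replace(".items()", "|items")
}

fn main() {
    let tpl = "{% for k, v in tool.arguments.items() %}{{ k }}={{ v }}{% endfor %}";
    let rewritten = rewrite_python_dict_items_calls(tpl);
    assert_eq!(
        rewritten,
        "{% for k, v in tool.arguments|items %}{{ k }}={{ v }}{% endfor %}"
    );
}
```

As the review notes, this naive replacement also touches `.items()` inside string literals; a quote-aware regex would be more robust.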

Validation

  • cargo test -p dynamo-parsers glm47_parser -- --nocapture
  • cargo test -p dynamo-llm protocols::openai::responses:: -- --nocapture

Summary by CodeRabbit

  • New Features

    • Surface hidden reasoning/thinking content as distinct outputs.
    • Emit incremental tool-call events from raw streamed text when structured calls are absent.
  • Bug Fixes

    • Avoid duplicate tool-call emissions when structured and raw tool-call data interleave.
    • Strip dangling/embedded thinking markers from visible messages.
    • Recover and parse malformed or nested tool-call fragments robustly.
    • Improve template expression compatibility (better handling of dictionary .items usage).

ishandhanani requested a review from a team as a code owner March 27, 2026 13:45
github-actions bot added the fix label Mar 27, 2026
github-actions bot added the frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` label Mar 27, 2026
@ishandhanani
Contributor Author

@coderabbitai
Contributor

coderabbitai Bot commented Mar 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 986a883a-0e49-45e2-a56d-a4ee5d27b306

📥 Commits

Reviewing files that changed from the base of the PR and between 19e39c6 and 533f813.

📒 Files selected for processing (2)
  • lib/llm/src/protocols/openai/responses/mod.rs
  • lib/llm/src/protocols/openai/responses/stream_converter.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/llm/src/protocols/openai/responses/stream_converter.rs

Walkthrough

Adds a `.items()` → `|items` Jinja rewrite in template preprocessing; extracts and hides <think> blocks as structured Reasoning while preventing duplicate tool-call outputs; implements fallback parsing of raw <tool_call>...</tool_call> in streaming and coordinates with structured tool calls; improves XML tool-call parser recovery for malformed or nested blocks.

Changes

Cohort / File(s) Summary
Template Formatting
lib/llm/src/preprocessor/prompt/template/formatters.rs
Added private rewrite_python_dict_items_calls(template: &str) -> String to replace `.items()` with the `|items` Jinja filter during template preprocessing.
OpenAI Response Handling
lib/llm/src/protocols/openai/responses/mod.rs
Added extract_think_blocks() and extended strip_tool_call_text() to handle <think> and dangling </think>; chat_completion_to_response now derives Reasoning from think blocks when absent, avoids emitting whitespace-only Messages, and disables raw <tool_call> parsing when structured tool calls exist. Unit tests added/updated.
OpenAI Streaming Converter
lib/llm/src/protocols/openai/responses/stream_converter.rs
Introduced raw_text and flags to track structured vs fallback tool-call emission; refactored message emission helpers, emit only newly-visible text deltas after stripping markup, implement incremental fallback parsing of raw <tool_call> blocks, and coordinate with structured tool calls to prevent duplicates. Tests expanded and shifted to JSON fixtures.
Tool Call XML Parser
lib/parsers/src/tool_calling/xml/glm47_parser.rs
Changed normal_parts to own Strings; on parse failures attempt recovery by locating and reparsing nested <tool_call> blocks; added recover_nested_tool_calls() and recover_malformed_function_name_prefix() to salvage malformed function-name prefixes and reject angle-bracket names. Added tests for nested/malformed recovery.
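The `<think>` extraction described for mod.rs above could be sketched as follows. This is an illustration only; the actual extract_think_blocks in the PR may differ in signature and edge-case handling:

```rust
// Sketch: split model output into visible text and hidden reasoning by
// extracting <think>...</think> blocks. An unterminated <think> is
// treated as reasoning so it never leaks into visible text.
fn extract_think_blocks(text: &str) -> (String, Vec<String>) {
    let (mut visible, mut reasoning) = (String::new(), Vec::new());
    let mut rest = text;
    while let Some(start) = rest.find("<think>") {
        visible.push_str(&rest[..start]);
        let after = &rest[start + "<think>".len()..];
        match after.find("</think>") {
            Some(end) => {
                reasoning.push(after[..end].to_string());
                rest = &after[end + "</think>".len()..];
            }
            None => {
                reasoning.push(after.to_string());
                rest = "";
            }
        }
    }
    visible.push_str(rest);
    (visible, reasoning)
}

fn main() {
    let (vis, think) = extract_think_blocks("<think>plan steps</think>Final answer");
    assert_eq!(vis, "Final answer");
    assert_eq!(think, vec!["plan steps".to_string()]);
}
```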

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title directly addresses the main changes: improving GLM 4.7 responses handling for Codex, which aligns with the primary objectives of fixing parser robustness and Responses API behavior.
Description check ✅ Passed The description comprehensively covers all required sections: Summary, Issues Found, Fixes, and Validation. It matches the template structure (Overview, Details, Where to start) and provides clear, actionable information about the changes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/llm/src/protocols/openai/responses/mod.rs (1)

758-768: ⚠️ Potential issue | 🟠 Major

Skip the raw tool-call fallback when structured tool_calls already exist.

This path emits structured function calls first and then unconditionally re-parses content for <tool_call>...</tool_call>. On mixed outputs, /v1/responses will contain duplicated FunctionCall items even though the streaming converter now suppresses that case.

Minimal fix
-        if let Some(tool_calls) = choice.message.tool_calls {
+        let has_structured_tool_calls = choice
+            .message
+            .tool_calls
+            .as_ref()
+            .is_some_and(|tool_calls| !tool_calls.is_empty());
+        if let Some(tool_calls) = choice.message.tool_calls {
             for tc in &tool_calls {
                 output.push(OutputItem::FunctionCall(FunctionToolCall {
                     arguments: tc.function.arguments.clone(),
                     call_id: tc.id.clone(),
                     name: tc.function.name.clone(),
                     id: Some(format!("fc_{}", Uuid::new_v4().simple())),
                     status: Some(OutputStatus::Completed),
                 }));
             }
         }
@@
-            let parsed_calls = parse_tool_call_text(&content_text);
+            let parsed_calls = if has_structured_tool_calls {
+                Vec::new()
+            } else {
+                parse_tool_call_text(&content_text)
+            };

Also applies to: 814-833

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/protocols/openai/responses/mod.rs` around lines 758 - 768, The
code currently emits structured FunctionCall items from
choice.message.tool_calls and then always falls back to re-parsing the raw
content for "<tool_call>...</tool_call>", causing duplicate FunctionCall output;
update the logic in the block that pushes OutputItem::FunctionCall (where
choice.message.tool_calls is handled and FunctionToolCall is constructed) to
skip the raw-content parsing path when choice.message.tool_calls is Some and
non-empty (i.e., only run the "<tool_call>" content parser when tool_calls is
None or empty), and apply the same conditional fix to the similar parsing block
around the other occurrence (the block covering the 814-833 logic) so duplicates
are prevented.
🧹 Nitpick comments (1)
lib/llm/src/preprocessor/prompt/template/formatters.rs (1)

59-61: Naive string replacement may affect string literals.

The simple .replace() approach will also rewrite .items() inside string literals (e.g., {{ "Call .items() on dict" }}). This is unlikely in real templates, but for robustness you could consider a regex that excludes quoted contexts.

Given that this pattern mirrors the existing remove_known_non_jinja2_tags approach and false positives are rare in practice, this is acceptable as-is.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/preprocessor/prompt/template/formatters.rs` around lines 59 - 61,
The current rewrite_python_dict_items_calls function uses a naive
template.replace(".items()", "|items") which will incorrectly modify occurrences
inside string literals; update rewrite_python_dict_items_calls to perform a
regex-based replacement that only matches .items() outside of single- or
double-quoted strings (or reuse the same approach used by
remove_known_non_jinja2_tags) so template literals like {{ "Call .items() on
dict" }} are left untouched.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/llm/src/protocols/openai/responses/stream_converter.rs`:
- Around line 237-286: finish_message_if_started currently emits "text done" /
"part done" / "item done" events and then clears the message state, which causes
finished text-only items to be lost when emit_end_events later builds
Response.output; change the logic so finish_message_if_started still emits the
SSE events but also pushes the finished OutputItem (the same item created in the
item_done block) into a persistent completed-items buffer (e.g., a Vec field
like completed_items or completed_output_items) instead of discarding it, and do
not drop that buffer when clearing transient fields (message_started,
message_item_id, message_output_index, accumulated_text); then update
emit_end_events to construct Response.output from the completed-items buffer
plus any currently open message state so response.completed.response.output
contains all finished items and any in-progress content.
- Around line 151-175: emit_visible_text_delta currently assumes
self.accumulated_text is a strict prefix of visible_text; replace the
strip_prefix logic with computing the longest common prefix (LCP) between
self.accumulated_text and visible_text, return early if they are identical, then
set delta to visible_text[LCP_len..], update self.accumulated_text =
visible_text.to_string(), and emit that delta as before (preserving next_seq(),
message_item_id, etc.). Apply the same LCP-based fix to the analogous projection
logic noted around the 418-440 region so retroactive visibility changes (e.g.
previously hidden markup becoming visible) correctly produce the remaining
visible suffix instead of leaking the old prefix.
- Around line 101-110: The current equality check in append_raw_text drops
legitimate repeated chunks; remove the `content == self.raw_text` shortcut and
instead treat empty strings as the only no-op, then decide between a cumulative
snapshot vs a delta by comparing lengths: if content.starts_with(&self.raw_text)
and content.len() > self.raw_text.len() then replace self.raw_text with content
(treat as cumulative), otherwise push_str(content) (treat as appended/delta) so
identical successive chunks are preserved; update the function append_raw_text
and its use of the raw_text field accordingly.

In `@lib/parsers/src/tool_calling/xml/glm47_parser.rs`:
- Around line 100-106: The current branch that handles Some((recovered_text, mut
recovered_calls)) from recover_nested_tool_calls(block, config, tools)? discards
the prefix before the nested_start, losing malformed prefix text; fix by
prepending the original prefix slice (&block[..nested_start]) to recovered_text
before pushing into normal_parts and then append recovered_calls to calls (do
the same fix for the other identical block around the second occurrence handling
lines 163-167). Specifically modify the code handling the tuple returned by
recover_nested_tool_calls in glm47_parser.rs so that
normal_parts.push(format!("{}{}", &block[..nested_start], recovered_text)) or
equivalent string/Vec concatenation is used instead of just pushing
recovered_text, then calls.append(&mut recovered_calls).

---

Outside diff comments:
In `@lib/llm/src/protocols/openai/responses/mod.rs`:
- Around line 758-768: The code currently emits structured FunctionCall items
from choice.message.tool_calls and then always falls back to re-parsing the raw
content for "<tool_call>...</tool_call>", causing duplicate FunctionCall output;
update the logic in the block that pushes OutputItem::FunctionCall (where
choice.message.tool_calls is handled and FunctionToolCall is constructed) to
skip the raw-content parsing path when choice.message.tool_calls is Some and
non-empty (i.e., only run the "<tool_call>" content parser when tool_calls is
None or empty), and apply the same conditional fix to the similar parsing block
around the other occurrence (the block covering the 814-833 logic) so duplicates
are prevented.

---

Nitpick comments:
In `@lib/llm/src/preprocessor/prompt/template/formatters.rs`:
- Around line 59-61: The current rewrite_python_dict_items_calls function uses a
naive template.replace(".items()", "|items") which will incorrectly modify
occurrences inside string literals; update rewrite_python_dict_items_calls to
perform a regex-based replacement that only matches .items() outside of single-
or double-quoted strings (or reuse the same approach used by
remove_known_non_jinja2_tags) so template literals like {{ "Call .items() on
dict" }} are left untouched.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9ff97f06-2cdc-4092-82a6-80ced20b7679

📥 Commits

Reviewing files that changed from the base of the PR and between 310f8ca and 02a21ac.

📒 Files selected for processing (4)
  • lib/llm/src/preprocessor/prompt/template/formatters.rs
  • lib/llm/src/protocols/openai/responses/mod.rs
  • lib/llm/src/protocols/openai/responses/stream_converter.rs
  • lib/parsers/src/tool_calling/xml/glm47_parser.rs

Comment thread lib/llm/src/protocols/openai/responses/stream_converter.rs
Comment thread lib/parsers/src/tool_calling/xml/glm47_parser.rs
Comment on lines +237 to +286
fn finish_message_if_started(&mut self, events: &mut Vec<Result<Event, anyhow::Error>>) {
    if !self.message_started {
        return;
    }

    let text_done = ResponseStreamEvent::ResponseOutputTextDone(ResponseTextDoneEvent {
        sequence_number: self.next_seq(),
        item_id: self.message_item_id.clone(),
        output_index: self.message_output_index,
        content_index: 0,
        text: self.accumulated_text.clone(),
        logprobs: Some(vec![]),
    });
    events.push(make_sse_event(&text_done));

    let part_done =
        ResponseStreamEvent::ResponseContentPartDone(ResponseContentPartDoneEvent {
            sequence_number: self.next_seq(),
            item_id: self.message_item_id.clone(),
            output_index: self.message_output_index,
            content_index: 0,
            part: OutputContent::OutputText(OutputTextContent {
                text: self.accumulated_text.clone(),
                annotations: vec![],
                logprobs: Some(vec![]),
            }),
        });
    events.push(make_sse_event(&part_done));

    let item_done = ResponseStreamEvent::ResponseOutputItemDone(ResponseOutputItemDoneEvent {
        sequence_number: self.next_seq(),
        output_index: self.message_output_index,
        item: OutputItem::Message(OutputMessage {
            id: Some(self.message_item_id.clone()),
            content: vec![OutputMessageContent::OutputText(OutputTextContent {
                text: self.accumulated_text.clone(),
                annotations: vec![],
                logprobs: Some(vec![]),
            })],
            role: AssistantRole::Assistant,
            status: Some(OutputStatus::Completed),
        }),
    });
    events.push(make_sse_event(&item_done));

    self.message_started = false;
    self.message_item_id = format!("msg_{}", Uuid::new_v4().simple());
    self.message_output_index = 0;
    self.accumulated_text.clear();
}
Contributor


⚠️ Potential issue | 🔴 Critical

Completed text items are dropped from the terminal response.completed.

finish_message_if_started() emits the text/item done events and then clears the message state. emit_end_events() calls it before building Response.output, so text-only streams—and any text finished before a tool call—disappear from response.completed.response.output. Keep finished message items in a completed-items buffer and build the final response from that buffer plus any still-open state.

Also applies to: 589-649

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/protocols/openai/responses/stream_converter.rs` around lines 237
- 286, finish_message_if_started currently emits "text done" / "part done" /
"item done" events and then clears the message state, which causes finished
text-only items to be lost when emit_end_events later builds Response.output;
change the logic so finish_message_if_started still emits the SSE events but
also pushes the finished OutputItem (the same item created in the item_done
block) into a persistent completed-items buffer (e.g., a Vec field like
completed_items or completed_output_items) instead of discarding it, and do not
drop that buffer when clearing transient fields (message_started,
message_item_id, message_output_index, accumulated_text); then update
emit_end_events to construct Response.output from the completed-items buffer
plus any currently open message state so response.completed.response.output
contains all finished items and any in-progress content.

Comment on lines +100 to +106
if let Some((recovered_text, mut recovered_calls)) =
recover_nested_tool_calls(block, config, tools)?
{
if !recovered_text.is_empty() {
normal_parts.push(recovered_text);
}
calls.append(&mut recovered_calls);
Contributor


⚠️ Potential issue | 🟡 Minor

Nested recovery still drops the malformed prefix.

When recover_nested_tool_calls() succeeds, only &block[nested_start..] is re-parsed. Everything before that nested start is discarded instead of being preserved as normal text, so malformed wrappers can still lose visible model output even though this path is supposed to avoid silent drops. Prepend &block[..nested_start] to the recovered normal text before returning.

Also applies to: 163-167

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/parsers/src/tool_calling/xml/glm47_parser.rs` around lines 100 - 106, The
current branch that handles Some((recovered_text, mut recovered_calls)) from
recover_nested_tool_calls(block, config, tools)? discards the prefix before the
nested_start, losing malformed prefix text; fix by prepending the original
prefix slice (&block[..nested_start]) to recovered_text before pushing into
normal_parts and then append recovered_calls to calls (do the same fix for the
other identical block around the second occurrence handling lines 163-167).
Specifically modify the code handling the tuple returned by
recover_nested_tool_calls in glm47_parser.rs so that
normal_parts.push(format!("{}{}", &block[..nested_start], recovered_text)) or
equivalent string/Vec concatenation is used instead of just pushing
recovered_text, then calls.append(&mut recovered_calls).

…ing tool calls

Two bugs addressed:

1. Streaming: finish_message_if_started() clears message_started and
   accumulated_text, so the subsequent guard in emit_end_events() that
   builds Response.output always evaluated false. Text-only streams got
   an empty output array in the response.completed event. Fix: capture
   message state before calling finish.

2. Non-streaming: chat_completion_to_response() emitted structured
   tool_calls first, then unconditionally re-parsed content for raw
   <tool_call> blocks, producing duplicates when both were present.
   Fix: skip raw parsing when structured tool_calls already exist.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
@MatejKosec
Contributor

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Mar 29, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

coderabbitai bot left a comment


♻️ Duplicate comments (2)
lib/llm/src/protocols/openai/responses/stream_converter.rs (2)

101-111: ⚠️ Potential issue | 🟠 Major

append_raw_text() still drops legitimate repeated chunks.

The content == self.raw_text check on line 102 will incorrectly drop consecutive identical delta chunks (e.g., two " " chunks). After the first " " is appended, self.raw_text == " ", so the second " " chunk triggers the early return.

Consider removing the equality check and only return early for empty content:

Proposed fix
 fn append_raw_text(&mut self, content: &str) {
-    if content.is_empty() || content == self.raw_text {
+    if content.is_empty() {
         return;
     }

     if content.starts_with(&self.raw_text) {
         self.raw_text = content.to_string();
     } else {
         self.raw_text.push_str(content);
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/protocols/openai/responses/stream_converter.rs` around lines 101
- 111, The append_raw_text method drops legitimate repeated identical chunks
because of the equality check; change append_raw_text(&mut self, content: &str)
to only early-return when content.is_empty(), removing the content ==
self.raw_text check so consecutive identical delta chunks (e.g., repeated " ")
are appended; keep the existing starts_with branch (self.raw_text =
content.to_string()) and the push_str fallback as-is to preserve incremental
merging behavior.

151-175: ⚠️ Potential issue | 🟠 Major

Visible text projection can regress across chunk boundaries.

The strip_prefix approach assumes visible_text always starts with accumulated_text, but later bytes can retroactively hide earlier output. For example:

  1. Chunk 1: "<tool"strip_tool_call_text returns "<tool" (incomplete tag, visible) → accumulated_text = "<tool"
  2. Chunk 2: "<tool_call>{...}</tool_call>Done"strip_tool_call_text returns "Done"
  3. "Done".strip_prefix("<tool") fails → delta not emitted

Consider computing the longest common prefix (LCP) between accumulated_text and visible_text, then emitting visible_text[lcp_len..] as the delta. This handles retroactive visibility changes gracefully.
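The longest-common-prefix computation suggested here could be sketched as follows (illustrative only; the char-boundary back-off is an added assumption so slicing stays valid UTF-8):

```rust
// Sketch: length of the longest common prefix of two strings,
// clamped to a char boundary so `&b[n..]` is always a valid slice.
fn lcp_len(a: &str, b: &str) -> usize {
    let mut n = a
        .bytes()
        .zip(b.bytes())
        .take_while(|(x, y)| x == y)
        .count();
    while !b.is_char_boundary(n) {
        n -= 1;
    }
    n
}

fn main() {
    // Retroactive hiding: previously-visible "<tool" disappears, "Done" appears.
    let n = lcp_len("<tool", "Done");
    assert_eq!(&"Done"[n..], "Done");
    // Normal growth: only the new suffix is emitted as the delta.
    let n = lcp_len("Hello, wor", "Hello, world");
    assert_eq!(&"Hello, world"[n..], "ld");
}
```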

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/protocols/openai/responses/stream_converter.rs` around lines 151
- 175, emit_visible_text_delta currently uses
visible_text.strip_prefix(&self.accumulated_text) which fails when later chunks
retroactively hide earlier output; change it to compute the longest common
prefix (LCP) length between self.accumulated_text and visible_text, then treat
the delta as visible_text[lcp_len..]; if delta is empty return, otherwise call
ensure_message_started, update self.accumulated_text = visible_text.to_string(),
build the ResponseTextDeltaEvent (using next_seq(), message_item_id,
message_output_index, content_index 0, delta string, logprobs Some(vec![])) and
push the SSE event as before so retroactive visibility changes emit the correct
suffix.
🧹 Nitpick comments (1)
lib/llm/src/protocols/openai/responses/stream_converter.rs (1)

1087-1099: Consider adding a multi-chunk test for partial </think> markup.

This test covers the single-chunk case. Given the strip_prefix limitation noted earlier, a multi-chunk test where </think> arrives split across chunks (e.g., "</thi" then "nk>Visible") would help verify edge cases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/protocols/openai/responses/stream_converter.rs` around lines 1087
- 1099, Add a new test that simulates a multi-chunk scenario for a partial
"</think>" split across chunks to ensure the strip_prefix logic in
ResponseStreamConverter handles boundary cases: create a ResponseStreamConverter
via ResponseStreamConverter::new(...) and call emit_start_events(), then call
process_chunk twice (first with a chunk ending in a partial tag like "private
reasoning</thi" and second with the remainder "nk>Visible answer.") and assert
that the visible delta event ("response.output_text.delta") is emitted and
conv.accumulated_text equals "Visible answer."; this mirrors
test_dangling_think_close_does_not_leak_hidden_prefix but uses two chunks to
validate handling of split "</think>" across process_chunk calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@lib/llm/src/protocols/openai/responses/stream_converter.rs`:
- Around line 101-111: The append_raw_text method drops legitimate repeated
identical chunks because of the equality check; change append_raw_text(&mut
self, content: &str) to only early-return when content.is_empty(), removing the
content == self.raw_text check so consecutive identical delta chunks (e.g.,
repeated " ") are appended; keep the existing starts_with branch (self.raw_text
= content.to_string()) and the push_str fallback as-is to preserve incremental
merging behavior.
- Around line 151-175: emit_visible_text_delta currently uses
visible_text.strip_prefix(&self.accumulated_text) which fails when later chunks
retroactively hide earlier output; change it to compute the longest common
prefix (LCP) length between self.accumulated_text and visible_text, then treat
the delta as visible_text[lcp_len..]; if delta is empty return, otherwise call
ensure_message_started, update self.accumulated_text = visible_text.to_string(),
build the ResponseTextDeltaEvent (using next_seq(), message_item_id,
message_output_index, content_index 0, delta string, logprobs Some(vec![])) and
push the SSE event as before so retroactive visibility changes emit the correct
suffix.

---

Nitpick comments:
In `@lib/llm/src/protocols/openai/responses/stream_converter.rs`:
- Around line 1087-1099: Add a new test that simulates a multi-chunk scenario
for a partial "</think>" split across chunks to ensure the strip_prefix logic in
ResponseStreamConverter handles boundary cases: create a ResponseStreamConverter
via ResponseStreamConverter::new(...) and call emit_start_events(), then call
process_chunk twice (first with a chunk ending in a partial tag like "private
reasoning</thi" and second with the remainder "nk>Visible answer.") and assert
that the visible delta event ("response.output_text.delta") is emitted and
conv.accumulated_text equals "Visible answer."; this mirrors
test_dangling_think_close_does_not_leak_hidden_prefix but uses two chunks to
validate handling of split "</think>" across process_chunk calls.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 42ac10e6-1196-4cc2-9f80-95f13ec665d9

📥 Commits

Reviewing files that changed from the base of the PR and between 02a21ac and 19e39c6.

📒 Files selected for processing (2)
  • lib/llm/src/protocols/openai/responses/mod.rs
  • lib/llm/src/protocols/openai/responses/stream_converter.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/llm/src/protocols/openai/responses/mod.rs

Two bugs addressed:

1. strip_tag arm ordering: when </think> appeared before <think>
   (dangling close followed by a matched pair), the (Some(start), _)
   arm fired instead of the dangling-close arm, leaking the </think>
   tag and truncating everything after the <think> open. Fix: reorder
   match arms so dangling-close check runs before open-without-close.

2. emit_visible_text_delta prefix assumption: when retroactive stripping
   (e.g. </think> completing across chunks) invalidated already-emitted
   text, strip_prefix returned None and accumulated_text was never
   updated, permanently blocking all future deltas. Fix: fall back to
   longest common prefix computation so subsequent text is still emitted.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
@MatejKosec
Contributor

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Mar 29, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


Labels

fix frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/XXL


3 participants