
fix: improve GLM 4.7 responses handling for Codex#7666

Closed
ishandhanani wants to merge 9 commits into main from idhanani/dyn-glm47-responses-codex

Conversation

@ishandhanani
Contributor

ishandhanani commented Mar 27, 2026

Summary

  • add the GLM 4.7 parser and Responses API fixes needed for Codex on /v1/responses
  • keep this PR scoped to GLM/Responses behavior only

Issues Found

  • Malformed or nested GLM <tool_call> blocks could confuse parsing and cause Codex tool calls to be dropped or recovered inconsistently
  • Responses conversion could leak GLM reasoning / markup into visible assistant text instead of keeping it hidden or structured
  • Mixed tool-call streaming could emit the same tool call twice when a model produced both raw <tool_call> text and structured tool_calls deltas
  • GLM chat templating could break on tool arguments that contain an items key

Fixes

  • harden GLM 4.7 parser recovery for malformed nested tool-call blocks and valid-name prefixes
  • strip GLM reasoning / tool-call markup from visible Responses output and salvage reasoning into structured form
  • dedupe mixed raw/structured tool-call streaming in the Responses stream converter
  • normalize the template formatter so Python .items() patterns become valid Jinja |items
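The template-formatter normalization above could look roughly like this: a minimal sketch assuming the naive string replacement that the review comments later discuss, not the actual formatters.rs implementation:

```rust
// Sketch only: rewrite Python-style `.items()` method calls into the
// Jinja2 `|items` filter so a minijinja-style engine can evaluate the
// GLM chat template. The function name mirrors the PR's walkthrough.
fn rewrite_python_dict_items_calls(template: &str) -> String {
    template.replace(".items()", "|items")
}

fn main() {
    let tpl = "{% for k, v in tool.arguments.items() %}{{ k }}={{ v }}{% endfor %}";
    let rewritten = rewrite_python_dict_items_calls(tpl);
    assert_eq!(
        rewritten,
        "{% for k, v in tool.arguments|items %}{{ k }}={{ v }}{% endfor %}"
    );
}
```

As the review notes, this naive replacement also touches `.items()` inside string literals; a quote-aware regex would be more robust.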

Validation

  • cargo test -p dynamo-parsers glm47_parser -- --nocapture
  • cargo test -p dynamo-llm protocols::openai::responses:: -- --nocapture

Summary by CodeRabbit

  • New Features

    • Surface hidden reasoning/thinking content as distinct outputs.
    • Emit incremental tool-call events from raw streamed text when structured calls are absent.
  • Bug Fixes

    • Avoid duplicate tool-call emissions when structured and raw tool-call data interleave.
    • Strip dangling/embedded thinking markers from visible messages.
    • Recover and parse malformed or nested tool-call fragments robustly.
    • Improve template expression compatibility (better handling of dictionary .items usage).

ishandhanani requested a review from a team as a code owner March 27, 2026 13:45
github-actions bot added the fix label Mar 27, 2026
github-actions bot added the frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` label Mar 27, 2026
@ishandhanani
Contributor Author

@coderabbitai
Contributor

coderabbitai Bot commented Mar 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 986a883a-0e49-45e2-a56d-a4ee5d27b306

📥 Commits

Reviewing files that changed from the base of the PR and between 19e39c6 and 533f813.

📒 Files selected for processing (2)
  • lib/llm/src/protocols/openai/responses/mod.rs
  • lib/llm/src/protocols/openai/responses/stream_converter.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/llm/src/protocols/openai/responses/stream_converter.rs

Walkthrough

Adds a `.items()` → `|items` Jinja rewrite in template preprocessing; extracts and hides <think> blocks as structured Reasoning while preventing duplicate tool-call outputs; implements fallback parsing of raw <tool_call>...</tool_call> in streaming and coordinates with structured tool calls; improves XML tool-call parser recovery for malformed or nested blocks.

Changes

Cohort / File(s) Summary
Template Formatting
lib/llm/src/preprocessor/prompt/template/formatters.rs
Added private rewrite_python_dict_items_calls(template: &str) -> String to replace `.items()` with the `|items` Jinja filter during template preprocessing.
OpenAI Response Handling
lib/llm/src/protocols/openai/responses/mod.rs
Added extract_think_blocks() and extended strip_tool_call_text() to handle <think> and dangling </think>; chat_completion_to_response now derives Reasoning from think blocks when absent, avoids emitting whitespace-only Messages, and disables raw <tool_call> parsing when structured tool calls exist. Unit tests added/updated.
OpenAI Streaming Converter
lib/llm/src/protocols/openai/responses/stream_converter.rs
Introduced raw_text and flags to track structured vs fallback tool-call emission; refactored message emission helpers, emit only newly-visible text deltas after stripping markup, implement incremental fallback parsing of raw <tool_call> blocks, and coordinate with structured tool calls to prevent duplicates. Tests expanded and shifted to JSON fixtures.
Tool Call XML Parser
lib/parsers/src/tool_calling/xml/glm47_parser.rs
Changed normal_parts to own Strings; on parse failures attempt recovery by locating and reparsing nested <tool_call> blocks; added recover_nested_tool_calls() and recover_malformed_function_name_prefix() to salvage malformed function-name prefixes and reject angle-bracket names. Added tests for nested/malformed recovery.
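The `<think>` extraction described for mod.rs above could be sketched as follows. This is an illustration only; the actual extract_think_blocks in the PR may differ in signature and edge-case handling:

```rust
// Sketch: split model output into visible text and hidden reasoning by
// extracting <think>...</think> blocks. An unterminated <think> is
// treated as reasoning so it never leaks into visible text.
fn extract_think_blocks(text: &str) -> (String, Vec<String>) {
    let (mut visible, mut reasoning) = (String::new(), Vec::new());
    let mut rest = text;
    while let Some(start) = rest.find("<think>") {
        visible.push_str(&rest[..start]);
        let after = &rest[start + "<think>".len()..];
        match after.find("</think>") {
            Some(end) => {
                reasoning.push(after[..end].to_string());
                rest = &after[end + "</think>".len()..];
            }
            None => {
                reasoning.push(after.to_string());
                rest = "";
            }
        }
    }
    visible.push_str(rest);
    (visible, reasoning)
}

fn main() {
    let (vis, think) = extract_think_blocks("<think>plan steps</think>Final answer");
    assert_eq!(vis, "Final answer");
    assert_eq!(think, vec!["plan steps".to_string()]);
}
```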

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title directly addresses the main changes: improving GLM 4.7 responses handling for Codex, which aligns with the primary objectives of fixing parser robustness and Responses API behavior.
Description check ✅ Passed The description comprehensively covers all required sections: Summary, Issues Found, Fixes, and Validation. It matches the template structure (Overview, Details, Where to start) and provides clear, actionable information about the changes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/llm/src/protocols/openai/responses/mod.rs (1)

758-768: ⚠️ Potential issue | 🟠 Major

Skip the raw tool-call fallback when structured tool_calls already exist.

This path emits structured function calls first and then unconditionally re-parses content for <tool_call>...</tool_call>. On mixed outputs, /v1/responses will contain duplicated FunctionCall items even though the streaming converter now suppresses that case.

Minimal fix
-        if let Some(tool_calls) = choice.message.tool_calls {
+        let has_structured_tool_calls = choice
+            .message
+            .tool_calls
+            .as_ref()
+            .is_some_and(|tool_calls| !tool_calls.is_empty());
+        if let Some(tool_calls) = choice.message.tool_calls {
             for tc in &tool_calls {
                 output.push(OutputItem::FunctionCall(FunctionToolCall {
                     arguments: tc.function.arguments.clone(),
                     call_id: tc.id.clone(),
                     name: tc.function.name.clone(),
                     id: Some(format!("fc_{}", Uuid::new_v4().simple())),
                     status: Some(OutputStatus::Completed),
                 }));
             }
         }
@@
-            let parsed_calls = parse_tool_call_text(&content_text);
+            let parsed_calls = if has_structured_tool_calls {
+                Vec::new()
+            } else {
+                parse_tool_call_text(&content_text)
+            };

Also applies to: 814-833

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/protocols/openai/responses/mod.rs` around lines 758 - 768, The
code currently emits structured FunctionCall items from
choice.message.tool_calls and then always falls back to re-parsing the raw
content for "<tool_call>...</tool_call>", causing duplicate FunctionCall output;
update the logic in the block that pushes OutputItem::FunctionCall (where
choice.message.tool_calls is handled and FunctionToolCall is constructed) to
skip the raw-content parsing path when choice.message.tool_calls is Some and
non-empty (i.e., only run the "<tool_call>" content parser when tool_calls is
None or empty), and apply the same conditional fix to the similar parsing block
around the other occurrence (the block covering the 814-833 logic) so duplicates
are prevented.
🧹 Nitpick comments (1)
lib/llm/src/preprocessor/prompt/template/formatters.rs (1)

59-61: Naive string replacement may affect string literals.

The simple .replace() approach will also rewrite .items() inside string literals (e.g., {{ "Call .items() on dict" }}). This is unlikely in real templates, but for robustness you could consider a regex that excludes quoted contexts.

Given that this pattern mirrors the existing remove_known_non_jinja2_tags approach and false positives are rare in practice, this is acceptable as-is.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/preprocessor/prompt/template/formatters.rs` around lines 59 - 61,
The current rewrite_python_dict_items_calls function uses a naive
template.replace(".items()", "|items") which will incorrectly modify occurrences
inside string literals; update rewrite_python_dict_items_calls to perform a
regex-based replacement that only matches .items() outside of single- or
double-quoted strings (or reuse the same approach used by
remove_known_non_jinja2_tags) so template literals like {{ "Call .items() on
dict" }} are left untouched.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/llm/src/protocols/openai/responses/stream_converter.rs`:
- Around line 237-286: finish_message_if_started currently emits "text done" /
"part done" / "item done" events and then clears the message state, which causes
finished text-only items to be lost when emit_end_events later builds
Response.output; change the logic so finish_message_if_started still emits the
SSE events but also pushes the finished OutputItem (the same item created in the
item_done block) into a persistent completed-items buffer (e.g., a Vec field
like completed_items or completed_output_items) instead of discarding it, and do
not drop that buffer when clearing transient fields (message_started,
message_item_id, message_output_index, accumulated_text); then update
emit_end_events to construct Response.output from the completed-items buffer
plus any currently open message state so response.completed.response.output
contains all finished items and any in-progress content.
- Around line 151-175: emit_visible_text_delta currently assumes
self.accumulated_text is a strict prefix of visible_text; replace the
strip_prefix logic with computing the longest common prefix (LCP) between
self.accumulated_text and visible_text, return early if they are identical, then
set delta to visible_text[LCP_len..], update self.accumulated_text =
visible_text.to_string(), and emit that delta as before (preserving next_seq(),
message_item_id, etc.). Apply the same LCP-based fix to the analogous projection
logic noted around the 418-440 region so retroactive visibility changes (e.g.
previously hidden markup becoming visible) correctly produce the remaining
visible suffix instead of leaking the old prefix.
- Around line 101-110: The current equality check in append_raw_text drops
legitimate repeated chunks; remove the `content == self.raw_text` shortcut and
instead treat empty strings as the only no-op, then decide between a cumulative
snapshot vs a delta by comparing lengths: if content.starts_with(&self.raw_text)
and content.len() > self.raw_text.len() then replace self.raw_text with content
(treat as cumulative), otherwise push_str(content) (treat as appended/delta) so
identical successive chunks are preserved; update the function append_raw_text
and its use of the raw_text field accordingly.

In `@lib/parsers/src/tool_calling/xml/glm47_parser.rs`:
- Around line 100-106: The current branch that handles Some((recovered_text, mut
recovered_calls)) from recover_nested_tool_calls(block, config, tools)? discards
the prefix before the nested_start, losing malformed prefix text; fix by
prepending the original prefix slice (&block[..nested_start]) to recovered_text
before pushing into normal_parts and then append recovered_calls to calls (do
the same fix for the other identical block around the second occurrence handling
lines 163-167). Specifically modify the code handling the tuple returned by
recover_nested_tool_calls in glm47_parser.rs so that
normal_parts.push(format!("{}{}", &block[..nested_start], recovered_text)) or
equivalent string/Vec concatenation is used instead of just pushing
recovered_text, then calls.append(&mut recovered_calls).

---

Outside diff comments:
In `@lib/llm/src/protocols/openai/responses/mod.rs`:
- Around line 758-768: The code currently emits structured FunctionCall items
from choice.message.tool_calls and then always falls back to re-parsing the raw
content for "<tool_call>...</tool_call>", causing duplicate FunctionCall output;
update the logic in the block that pushes OutputItem::FunctionCall (where
choice.message.tool_calls is handled and FunctionToolCall is constructed) to
skip the raw-content parsing path when choice.message.tool_calls is Some and
non-empty (i.e., only run the "<tool_call>" content parser when tool_calls is
None or empty), and apply the same conditional fix to the similar parsing block
around the other occurrence (the block covering the 814-833 logic) so duplicates
are prevented.

---

Nitpick comments:
In `@lib/llm/src/preprocessor/prompt/template/formatters.rs`:
- Around line 59-61: The current rewrite_python_dict_items_calls function uses a
naive template.replace(".items()", "|items") which will incorrectly modify
occurrences inside string literals; update rewrite_python_dict_items_calls to
perform a regex-based replacement that only matches .items() outside of single-
or double-quoted strings (or reuse the same approach used by
remove_known_non_jinja2_tags) so template literals like {{ "Call .items() on
dict" }} are left untouched.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9ff97f06-2cdc-4092-82a6-80ced20b7679

📥 Commits

Reviewing files that changed from the base of the PR and between 310f8ca and 02a21ac.

📒 Files selected for processing (4)
  • lib/llm/src/preprocessor/prompt/template/formatters.rs
  • lib/llm/src/protocols/openai/responses/mod.rs
  • lib/llm/src/protocols/openai/responses/stream_converter.rs
  • lib/parsers/src/tool_calling/xml/glm47_parser.rs

Comment thread lib/llm/src/protocols/openai/responses/stream_converter.rs
Comment thread lib/parsers/src/tool_calling/xml/glm47_parser.rs
Comment on lines +237 to +286
fn finish_message_if_started(&mut self, events: &mut Vec<Result<Event, anyhow::Error>>) {
    if !self.message_started {
        return;
    }

    let text_done = ResponseStreamEvent::ResponseOutputTextDone(ResponseTextDoneEvent {
        sequence_number: self.next_seq(),
        item_id: self.message_item_id.clone(),
        output_index: self.message_output_index,
        content_index: 0,
        text: self.accumulated_text.clone(),
        logprobs: Some(vec![]),
    });
    events.push(make_sse_event(&text_done));

    let part_done =
        ResponseStreamEvent::ResponseContentPartDone(ResponseContentPartDoneEvent {
            sequence_number: self.next_seq(),
            item_id: self.message_item_id.clone(),
            output_index: self.message_output_index,
            content_index: 0,
            part: OutputContent::OutputText(OutputTextContent {
                text: self.accumulated_text.clone(),
                annotations: vec![],
                logprobs: Some(vec![]),
            }),
        });
    events.push(make_sse_event(&part_done));

    let item_done = ResponseStreamEvent::ResponseOutputItemDone(ResponseOutputItemDoneEvent {
        sequence_number: self.next_seq(),
        output_index: self.message_output_index,
        item: OutputItem::Message(OutputMessage {
            id: Some(self.message_item_id.clone()),
            content: vec![OutputMessageContent::OutputText(OutputTextContent {
                text: self.accumulated_text.clone(),
                annotations: vec![],
                logprobs: Some(vec![]),
            })],
            role: AssistantRole::Assistant,
            status: Some(OutputStatus::Completed),
        }),
    });
    events.push(make_sse_event(&item_done));

    self.message_started = false;
    self.message_item_id = format!("msg_{}", Uuid::new_v4().simple());
    self.message_output_index = 0;
    self.accumulated_text.clear();
}
Contributor


⚠️ Potential issue | 🔴 Critical

Completed text items are dropped from the terminal response.completed.

finish_message_if_started() emits the text/item done events and then clears the message state. emit_end_events() calls it before building Response.output, so text-only streams—and any text finished before a tool call—disappear from response.completed.response.output. Keep finished message items in a completed-items buffer and build the final response from that buffer plus any still-open state.

Also applies to: 589-649

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/protocols/openai/responses/stream_converter.rs` around lines 237
- 286, finish_message_if_started currently emits "text done" / "part done" /
"item done" events and then clears the message state, which causes finished
text-only items to be lost when emit_end_events later builds Response.output;
change the logic so finish_message_if_started still emits the SSE events but
also pushes the finished OutputItem (the same item created in the item_done
block) into a persistent completed-items buffer (e.g., a Vec field like
completed_items or completed_output_items) instead of discarding it, and do not
drop that buffer when clearing transient fields (message_started,
message_item_id, message_output_index, accumulated_text); then update
emit_end_events to construct Response.output from the completed-items buffer
plus any currently open message state so response.completed.response.output
contains all finished items and any in-progress content.

Comment on lines +100 to +106
if let Some((recovered_text, mut recovered_calls)) =
recover_nested_tool_calls(block, config, tools)?
{
if !recovered_text.is_empty() {
normal_parts.push(recovered_text);
}
calls.append(&mut recovered_calls);
Contributor


⚠️ Potential issue | 🟡 Minor

Nested recovery still drops the malformed prefix.

When recover_nested_tool_calls() succeeds, only &block[nested_start..] is re-parsed. Everything before that nested start is discarded instead of being preserved as normal text, so malformed wrappers can still lose visible model output even though this path is supposed to avoid silent drops. Prepend &block[..nested_start] to the recovered normal text before returning.

Also applies to: 163-167

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/parsers/src/tool_calling/xml/glm47_parser.rs` around lines 100 - 106, The
current branch that handles Some((recovered_text, mut recovered_calls)) from
recover_nested_tool_calls(block, config, tools)? discards the prefix before the
nested_start, losing malformed prefix text; fix by prepending the original
prefix slice (&block[..nested_start]) to recovered_text before pushing into
normal_parts and then append recovered_calls to calls (do the same fix for the
other identical block around the second occurrence handling lines 163-167).
Specifically modify the code handling the tuple returned by
recover_nested_tool_calls in glm47_parser.rs so that
normal_parts.push(format!("{}{}", &block[..nested_start], recovered_text)) or
equivalent string/Vec concatenation is used instead of just pushing
recovered_text, then calls.append(&mut recovered_calls).

…ing tool calls

Two bugs addressed:

1. Streaming: finish_message_if_started() clears message_started and
   accumulated_text, so the subsequent guard in emit_end_events() that
   builds Response.output always evaluated false. Text-only streams got
   an empty output array in the response.completed event. Fix: capture
   message state before calling finish.

2. Non-streaming: chat_completion_to_response() emitted structured
   tool_calls first, then unconditionally re-parsed content for raw
   <tool_call> blocks, producing duplicates when both were present.
   Fix: skip raw parsing when structured tool_calls already exist.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
@MatejKosec
Contributor

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Mar 29, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

coderabbitai bot left a comment


♻️ Duplicate comments (2)
lib/llm/src/protocols/openai/responses/stream_converter.rs (2)

101-111: ⚠️ Potential issue | 🟠 Major

append_raw_text() still drops legitimate repeated chunks.

The content == self.raw_text check on line 102 will incorrectly drop consecutive identical delta chunks (e.g., two " " chunks). After the first " " is appended, self.raw_text == " ", so the second " " chunk triggers the early return.

Consider removing the equality check and only return early for empty content:

Proposed fix
 fn append_raw_text(&mut self, content: &str) {
-    if content.is_empty() || content == self.raw_text {
+    if content.is_empty() {
         return;
     }

     if content.starts_with(&self.raw_text) {
         self.raw_text = content.to_string();
     } else {
         self.raw_text.push_str(content);
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/protocols/openai/responses/stream_converter.rs` around lines 101
- 111, The append_raw_text method drops legitimate repeated identical chunks
because of the equality check; change append_raw_text(&mut self, content: &str)
to only early-return when content.is_empty(), removing the content ==
self.raw_text check so consecutive identical delta chunks (e.g., repeated " ")
are appended; keep the existing starts_with branch (self.raw_text =
content.to_string()) and the push_str fallback as-is to preserve incremental
merging behavior.

151-175: ⚠️ Potential issue | 🟠 Major

Visible text projection can regress across chunk boundaries.

The strip_prefix approach assumes visible_text always starts with accumulated_text, but later bytes can retroactively hide earlier output. For example:

  1. Chunk 1: "<tool"strip_tool_call_text returns "<tool" (incomplete tag, visible) → accumulated_text = "<tool"
  2. Chunk 2: "<tool_call>{...}</tool_call>Done"strip_tool_call_text returns "Done"
  3. "Done".strip_prefix("<tool") fails → delta not emitted

Consider computing the longest common prefix (LCP) between accumulated_text and visible_text, then emitting visible_text[lcp_len..] as the delta. This handles retroactive visibility changes gracefully.
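The longest-common-prefix computation suggested here could be sketched as follows (illustrative only; the char-boundary back-off is an added assumption so slicing stays valid UTF-8):

```rust
// Sketch: length of the longest common prefix of two strings,
// clamped to a char boundary so `&b[n..]` is always a valid slice.
fn lcp_len(a: &str, b: &str) -> usize {
    let mut n = a
        .bytes()
        .zip(b.bytes())
        .take_while(|(x, y)| x == y)
        .count();
    while !b.is_char_boundary(n) {
        n -= 1;
    }
    n
}

fn main() {
    // Retroactive hiding: previously-visible "<tool" disappears, "Done" appears.
    let n = lcp_len("<tool", "Done");
    assert_eq!(&"Done"[n..], "Done");
    // Normal growth: only the new suffix is emitted as the delta.
    let n = lcp_len("Hello, wor", "Hello, world");
    assert_eq!(&"Hello, world"[n..], "ld");
}
```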

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/protocols/openai/responses/stream_converter.rs` around lines 151
- 175, emit_visible_text_delta currently uses
visible_text.strip_prefix(&self.accumulated_text) which fails when later chunks
retroactively hide earlier output; change it to compute the longest common
prefix (LCP) length between self.accumulated_text and visible_text, then treat
the delta as visible_text[lcp_len..]; if delta is empty return, otherwise call
ensure_message_started, update self.accumulated_text = visible_text.to_string(),
build the ResponseTextDeltaEvent (using next_seq(), message_item_id,
message_output_index, content_index 0, delta string, logprobs Some(vec![])) and
push the SSE event as before so retroactive visibility changes emit the correct
suffix.
🧹 Nitpick comments (1)
lib/llm/src/protocols/openai/responses/stream_converter.rs (1)

1087-1099: Consider adding a multi-chunk test for partial </think> markup.

This test covers the single-chunk case. Given the strip_prefix limitation noted earlier, a multi-chunk test where </think> arrives split across chunks (e.g., "</thi" then "nk>Visible") would help verify edge cases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/protocols/openai/responses/stream_converter.rs` around lines 1087
- 1099, Add a new test that simulates a multi-chunk scenario for a partial
"</think>" split across chunks to ensure the strip_prefix logic in
ResponseStreamConverter handles boundary cases: create a ResponseStreamConverter
via ResponseStreamConverter::new(...) and call emit_start_events(), then call
process_chunk twice (first with a chunk ending in a partial tag like "private
reasoning</thi" and second with the remainder "nk>Visible answer.") and assert
that the visible delta event ("response.output_text.delta") is emitted and
conv.accumulated_text equals "Visible answer."; this mirrors
test_dangling_think_close_does_not_leak_hidden_prefix but uses two chunks to
validate handling of split "</think>" across process_chunk calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@lib/llm/src/protocols/openai/responses/stream_converter.rs`:
- Around line 101-111: The append_raw_text method drops legitimate repeated
identical chunks because of the equality check; change append_raw_text(&mut
self, content: &str) to only early-return when content.is_empty(), removing the
content == self.raw_text check so consecutive identical delta chunks (e.g.,
repeated " ") are appended; keep the existing starts_with branch (self.raw_text
= content.to_string()) and the push_str fallback as-is to preserve incremental
merging behavior.
- Around line 151-175: emit_visible_text_delta currently uses
visible_text.strip_prefix(&self.accumulated_text) which fails when later chunks
retroactively hide earlier output; change it to compute the longest common
prefix (LCP) length between self.accumulated_text and visible_text, then treat
the delta as visible_text[lcp_len..]; if delta is empty return, otherwise call
ensure_message_started, update self.accumulated_text = visible_text.to_string(),
build the ResponseTextDeltaEvent (using next_seq(), message_item_id,
message_output_index, content_index 0, delta string, logprobs Some(vec![])) and
push the SSE event as before so retroactive visibility changes emit the correct
suffix.

---

Nitpick comments:
In `@lib/llm/src/protocols/openai/responses/stream_converter.rs`:
- Around line 1087-1099: Add a new test that simulates a multi-chunk scenario
for a partial "</think>" split across chunks to ensure the strip_prefix logic in
ResponseStreamConverter handles boundary cases: create a ResponseStreamConverter
via ResponseStreamConverter::new(...) and call emit_start_events(), then call
process_chunk twice (first with a chunk ending in a partial tag like "private
reasoning</thi" and second with the remainder "nk>Visible answer.") and assert
that the visible delta event ("response.output_text.delta") is emitted and
conv.accumulated_text equals "Visible answer."; this mirrors
test_dangling_think_close_does_not_leak_hidden_prefix but uses two chunks to
validate handling of split "</think>" across process_chunk calls.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 42ac10e6-1196-4cc2-9f80-95f13ec665d9

📥 Commits

Reviewing files that changed from the base of the PR and between 02a21ac and 19e39c6.

📒 Files selected for processing (2)
  • lib/llm/src/protocols/openai/responses/mod.rs
  • lib/llm/src/protocols/openai/responses/stream_converter.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/llm/src/protocols/openai/responses/mod.rs

Two bugs addressed:

1. strip_tag arm ordering: when </think> appeared before <think>
   (dangling close followed by a matched pair), the (Some(start), _)
   arm fired instead of the dangling-close arm, leaking the </think>
   tag and truncating everything after the <think> open. Fix: reorder
   match arms so dangling-close check runs before open-without-close.

2. emit_visible_text_delta prefix assumption: when retroactive stripping
   (e.g. </think> completing across chunks) invalidated already-emitted
   text, strip_prefix returned None and accumulated_text was never
   updated, permanently blocking all future deltas. Fix: fall back to
   longest common prefix computation so subsequent text is still emitted.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
@MatejKosec
Contributor

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Mar 29, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


Labels

fix frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/XXL


3 participants