fix: improve GLM 4.7 responses handling for Codex #7666
ishandhanani wants to merge 9 commits into main
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
Walkthrough: Adds a …

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/llm/src/protocols/openai/responses/mod.rs (1)
758-768: ⚠️ Potential issue | 🟠 Major

Skip the raw tool-call fallback when structured `tool_calls` already exist.

This path emits structured function calls first and then unconditionally re-parses `content` for `<tool_call>...</tool_call>`. On mixed outputs, `/v1/responses` will contain duplicated `FunctionCall` items even though the streaming converter now suppresses that case.

Minimal fix:

```diff
-        if let Some(tool_calls) = choice.message.tool_calls {
+        let has_structured_tool_calls = choice
+            .message
+            .tool_calls
+            .as_ref()
+            .is_some_and(|tool_calls| !tool_calls.is_empty());
+        if let Some(tool_calls) = choice.message.tool_calls {
             for tc in &tool_calls {
                 output.push(OutputItem::FunctionCall(FunctionToolCall {
                     arguments: tc.function.arguments.clone(),
                     call_id: tc.id.clone(),
                     name: tc.function.name.clone(),
                     id: Some(format!("fc_{}", Uuid::new_v4().simple())),
                     status: Some(OutputStatus::Completed),
                 }));
             }
         }
 @@
-        let parsed_calls = parse_tool_call_text(&content_text);
+        let parsed_calls = if has_structured_tool_calls {
+            Vec::new()
+        } else {
+            parse_tool_call_text(&content_text)
+        };
```

Also applies to: 814-833
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/llm/src/protocols/openai/responses/mod.rs` around lines 758 - 768, The code currently emits structured FunctionCall items from choice.message.tool_calls and then always falls back to re-parsing the raw content for "<tool_call>...</tool_call>", causing duplicate FunctionCall output; update the logic in the block that pushes OutputItem::FunctionCall (where choice.message.tool_calls is handled and FunctionToolCall is constructed) to skip the raw-content parsing path when choice.message.tool_calls is Some and non-empty (i.e., only run the "<tool_call>" content parser when tool_calls is None or empty), and apply the same conditional fix to the similar parsing block around the other occurrence (the block covering the 814-833 logic) so duplicates are prevented.
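For illustration, the dedup guard described above can be reduced to a standalone sketch. This is hypothetical code, not the repo's API: `ToolCall`, `parse_tool_call_text`, and `collect_calls` here are simplified stand-ins for the real types in `responses/mod.rs`.

```rust
// Simplified stand-in for the repo's tool-call type; for illustration only.
#[derive(Clone, Debug, PartialEq)]
struct ToolCall {
    name: String,
    arguments: String,
}

// Stand-in for the raw-content fallback parser: recovers a call from
// "<tool_call>" markup if present.
fn parse_tool_call_text(content: &str) -> Vec<ToolCall> {
    if content.contains("<tool_call>") {
        vec![ToolCall { name: "raw".into(), arguments: content.into() }]
    } else {
        Vec::new()
    }
}

// Only run the raw "<tool_call>" parser when no structured calls exist,
// which is what prevents duplicated FunctionCall items on mixed outputs.
fn collect_calls(structured: Option<Vec<ToolCall>>, content: &str) -> Vec<ToolCall> {
    let has_structured = structured.as_ref().is_some_and(|t| !t.is_empty());
    let mut out = structured.unwrap_or_default();
    if !has_structured {
        out.extend(parse_tool_call_text(content));
    }
    out
}

fn main() {
    let structured = Some(vec![ToolCall { name: "get_weather".into(), arguments: "{}".into() }]);
    // Mixed output: structured call plus raw markup - the fallback is skipped.
    assert_eq!(collect_calls(structured, "<tool_call>...</tool_call>").len(), 1);
    // No structured calls: the fallback still recovers the raw call.
    assert_eq!(collect_calls(None, "<tool_call>...</tool_call>").len(), 1);
    println!("ok");
}
```

Note that `Some(vec![])` also enables the fallback here, matching the review's "Some and non-empty" wording.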
🧹 Nitpick comments (1)
lib/llm/src/preprocessor/prompt/template/formatters.rs (1)
59-61: Naive string replacement may affect string literals.

The simple `.replace()` approach will also rewrite `.items()` inside string literals (e.g., `{{ "Call .items() on dict" }}`). This is unlikely in real templates, but for robustness you could consider a regex that excludes quoted contexts. Given that this pattern mirrors the existing `remove_known_non_jinja2_tags` approach and false positives are rare in practice, this is acceptable as-is.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/llm/src/preprocessor/prompt/template/formatters.rs` around lines 59 - 61, The current rewrite_python_dict_items_calls function uses a naive template.replace(".items()", "|items") which will incorrectly modify occurrences inside string literals; update rewrite_python_dict_items_calls to perform a regex-based replacement that only matches .items() outside of single- or double-quoted strings (or reuse the same approach used by remove_known_non_jinja2_tags) so template literals like {{ "Call .items() on dict" }} are left untouched.
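A quote-aware alternative can also be sketched without a regex crate, using a small byte scanner. This is hypothetical code, not the repo's `rewrite_python_dict_items_calls`; it assumes Jinja-style single- or double-quoted literals with no escape handling.

```rust
// Replace ".items()" with "|items" only outside single-/double-quoted string
// literals. Byte-based scan; the needle is pure ASCII, so splicing it cannot
// break UTF-8 in the surrounding text.
fn rewrite_items_outside_strings(template: &str) -> String {
    const NEEDLE: &[u8] = b".items()";
    let bytes = template.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    let mut quote: Option<u8> = None; // currently open quote character, if any
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        match quote {
            Some(q) if b == q => { quote = None; out.push(b); i += 1; } // close quote
            Some(_) => { out.push(b); i += 1; }                          // inside literal
            None if b == b'\'' || b == b'"' => { quote = Some(b); out.push(b); i += 1; }
            None if bytes[i..].starts_with(NEEDLE) => {
                out.extend_from_slice(b"|items");
                i += NEEDLE.len();
            }
            None => { out.push(b); i += 1; }
        }
    }
    // Safe: the input was valid UTF-8 and only ASCII spans were replaced.
    String::from_utf8(out).expect("valid UTF-8")
}

fn main() {
    assert_eq!(
        rewrite_items_outside_strings("{% for k, v in d.items() %}"),
        "{% for k, v in d|items %}"
    );
    // The literal inside quotes is left untouched.
    assert_eq!(
        rewrite_items_outside_strings(r#"{{ "Call .items() on dict" }}"#),
        r#"{{ "Call .items() on dict" }}"#
    );
    println!("ok");
}
```

One limitation worth noting: an unpaired apostrophe in plain text (e.g. `don't`) would be misread as an open quote, which is the same class of rare false positive the review deems acceptable.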
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/llm/src/protocols/openai/responses/stream_converter.rs`:
- Around line 237-286: finish_message_if_started currently emits "text done" /
"part done" / "item done" events and then clears the message state, which causes
finished text-only items to be lost when emit_end_events later builds
Response.output; change the logic so finish_message_if_started still emits the
SSE events but also pushes the finished OutputItem (the same item created in the
item_done block) into a persistent completed-items buffer (e.g., a Vec field
like completed_items or completed_output_items) instead of discarding it, and do
not drop that buffer when clearing transient fields (message_started,
message_item_id, message_output_index, accumulated_text); then update
emit_end_events to construct Response.output from the completed-items buffer
plus any currently open message state so response.completed.response.output
contains all finished items and any in-progress content.
- Around line 151-175: emit_visible_text_delta currently assumes
self.accumulated_text is a strict prefix of visible_text; replace the
strip_prefix logic with computing the longest common prefix (LCP) between
self.accumulated_text and visible_text, return early if they are identical, then
set delta to visible_text[LCP_len..], update self.accumulated_text =
visible_text.to_string(), and emit that delta as before (preserving next_seq(),
message_item_id, etc.). Apply the same LCP-based fix to the analogous projection
logic noted around the 418-440 region so retroactive visibility changes (e.g.
previously hidden markup becoming visible) correctly produce the remaining
visible suffix instead of leaking the old prefix.
- Around line 101-110: The current equality check in append_raw_text drops
legitimate repeated chunks; remove the `content == self.raw_text` shortcut and
instead treat empty strings as the only no-op, then decide between a cumulative
snapshot vs a delta by comparing lengths: if content.starts_with(&self.raw_text)
and content.len() > self.raw_text.len() then replace self.raw_text with content
(treat as cumulative), otherwise push_str(content) (treat as appended/delta) so
identical successive chunks are preserved; update the function append_raw_text
and its use of the raw_text field accordingly.
In `@lib/parsers/src/tool_calling/xml/glm47_parser.rs`:
- Around line 100-106: The current branch that handles Some((recovered_text, mut
recovered_calls)) from recover_nested_tool_calls(block, config, tools)? discards
the prefix before the nested_start, losing malformed prefix text; fix by
prepending the original prefix slice (&block[..nested_start]) to recovered_text
before pushing into normal_parts and then append recovered_calls to calls (do
the same fix for the other identical block around the second occurrence handling
lines 163-167). Specifically modify the code handling the tuple returned by
recover_nested_tool_calls in glm47_parser.rs so that
normal_parts.push(format!("{}{}", &block[..nested_start], recovered_text)) or
equivalent string/Vec concatenation is used instead of just pushing
recovered_text, then calls.append(&mut recovered_calls).
---
Outside diff comments:
In `@lib/llm/src/protocols/openai/responses/mod.rs`:
- Around line 758-768: The code currently emits structured FunctionCall items
from choice.message.tool_calls and then always falls back to re-parsing the raw
content for "<tool_call>...</tool_call>", causing duplicate FunctionCall output;
update the logic in the block that pushes OutputItem::FunctionCall (where
choice.message.tool_calls is handled and FunctionToolCall is constructed) to
skip the raw-content parsing path when choice.message.tool_calls is Some and
non-empty (i.e., only run the "<tool_call>" content parser when tool_calls is
None or empty), and apply the same conditional fix to the similar parsing block
around the other occurrence (the block covering the 814-833 logic) so duplicates
are prevented.
---
Nitpick comments:
In `@lib/llm/src/preprocessor/prompt/template/formatters.rs`:
- Around line 59-61: The current rewrite_python_dict_items_calls function uses a
naive template.replace(".items()", "|items") which will incorrectly modify
occurrences inside string literals; update rewrite_python_dict_items_calls to
perform a regex-based replacement that only matches .items() outside of single-
or double-quoted strings (or reuse the same approach used by
remove_known_non_jinja2_tags) so template literals like {{ "Call .items() on
dict" }} are left untouched.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9ff97f06-2cdc-4092-82a6-80ced20b7679
📒 Files selected for processing (4)
- lib/llm/src/preprocessor/prompt/template/formatters.rs
- lib/llm/src/protocols/openai/responses/mod.rs
- lib/llm/src/protocols/openai/responses/stream_converter.rs
- lib/parsers/src/tool_calling/xml/glm47_parser.rs
```rust
fn finish_message_if_started(&mut self, events: &mut Vec<Result<Event, anyhow::Error>>) {
    if !self.message_started {
        return;
    }

    let text_done = ResponseStreamEvent::ResponseOutputTextDone(ResponseTextDoneEvent {
        sequence_number: self.next_seq(),
        item_id: self.message_item_id.clone(),
        output_index: self.message_output_index,
        content_index: 0,
        text: self.accumulated_text.clone(),
        logprobs: Some(vec![]),
    });
    events.push(make_sse_event(&text_done));

    let part_done =
        ResponseStreamEvent::ResponseContentPartDone(ResponseContentPartDoneEvent {
            sequence_number: self.next_seq(),
            item_id: self.message_item_id.clone(),
            output_index: self.message_output_index,
            content_index: 0,
            part: OutputContent::OutputText(OutputTextContent {
                text: self.accumulated_text.clone(),
                annotations: vec![],
                logprobs: Some(vec![]),
            }),
        });
    events.push(make_sse_event(&part_done));

    let item_done = ResponseStreamEvent::ResponseOutputItemDone(ResponseOutputItemDoneEvent {
        sequence_number: self.next_seq(),
        output_index: self.message_output_index,
        item: OutputItem::Message(OutputMessage {
            id: Some(self.message_item_id.clone()),
            content: vec![OutputMessageContent::OutputText(OutputTextContent {
                text: self.accumulated_text.clone(),
                annotations: vec![],
                logprobs: Some(vec![]),
            })],
            role: AssistantRole::Assistant,
            status: Some(OutputStatus::Completed),
        }),
    });
    events.push(make_sse_event(&item_done));

    self.message_started = false;
    self.message_item_id = format!("msg_{}", Uuid::new_v4().simple());
    self.message_output_index = 0;
    self.accumulated_text.clear();
}
```
Completed text items are dropped from the terminal response.completed.
finish_message_if_started() emits the text/item done events and then clears the message state. emit_end_events() calls it before building Response.output, so text-only streams—and any text finished before a tool call—disappear from response.completed.response.output. Keep finished message items in a completed-items buffer and build the final response from that buffer plus any still-open state.
Also applies to: 589-649
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/llm/src/protocols/openai/responses/stream_converter.rs` around lines 237
- 286, finish_message_if_started currently emits "text done" / "part done" /
"item done" events and then clears the message state, which causes finished
text-only items to be lost when emit_end_events later builds Response.output;
change the logic so finish_message_if_started still emits the SSE events but
also pushes the finished OutputItem (the same item created in the item_done
block) into a persistent completed-items buffer (e.g., a Vec field like
completed_items or completed_output_items) instead of discarding it, and do not
drop that buffer when clearing transient fields (message_started,
message_item_id, message_output_index, accumulated_text); then update
emit_end_events to construct Response.output from the completed-items buffer
plus any currently open message state so response.completed.response.output
contains all finished items and any in-progress content.
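The buffering idea can be modeled in isolation. The struct and field names below (`Converter`, `completed_items`, `final_output`) are invented for this sketch; the real converter also emits the `*_done` SSE events and carries far more state.

```rust
// Minimal model of the fix: finished items are retained in a buffer that
// survives the per-message state reset, so the terminal output includes them.
#[derive(Debug, Clone, PartialEq)]
struct OutputItem {
    id: String,
    text: String,
}

#[derive(Default)]
struct Converter {
    message_started: bool,
    accumulated_text: String,
    completed_items: Vec<OutputItem>, // persists across finish_message_if_started
}

impl Converter {
    fn finish_message_if_started(&mut self, id: &str) {
        if !self.message_started {
            return;
        }
        // The real converter emits text/part/item done events here; the key
        // change is keeping the finished item instead of discarding it.
        self.completed_items.push(OutputItem {
            id: id.to_string(),
            text: self.accumulated_text.clone(),
        });
        self.message_started = false;
        self.accumulated_text.clear();
    }

    // Terminal response output: completed buffer plus any still-open message.
    fn final_output(&self) -> Vec<OutputItem> {
        let mut out = self.completed_items.clone();
        if self.message_started && !self.accumulated_text.is_empty() {
            out.push(OutputItem { id: "msg_open".into(), text: self.accumulated_text.clone() });
        }
        out
    }
}

fn main() {
    let mut conv = Converter {
        message_started: true,
        accumulated_text: "Hello".into(),
        ..Default::default()
    };
    conv.finish_message_if_started("msg_1");
    // Without the buffer, this text-only stream would yield an empty output array.
    assert_eq!(conv.final_output().len(), 1);
    assert_eq!(conv.final_output()[0].text, "Hello");
    println!("ok");
}
```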
```rust
if let Some((recovered_text, mut recovered_calls)) =
    recover_nested_tool_calls(block, config, tools)?
{
    if !recovered_text.is_empty() {
        normal_parts.push(recovered_text);
    }
    calls.append(&mut recovered_calls);
```
Nested recovery still drops the malformed prefix.
When recover_nested_tool_calls() succeeds, only &block[nested_start..] is re-parsed. Everything before that nested start is discarded instead of being preserved as normal text, so malformed wrappers can still lose visible model output even though this path is supposed to avoid silent drops. Prepend &block[..nested_start] to the recovered normal text before returning.
Also applies to: 163-167
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/parsers/src/tool_calling/xml/glm47_parser.rs` around lines 100 - 106, The
current branch that handles Some((recovered_text, mut recovered_calls)) from
recover_nested_tool_calls(block, config, tools)? discards the prefix before the
nested_start, losing malformed prefix text; fix by prepending the original
prefix slice (&block[..nested_start]) to recovered_text before pushing into
normal_parts and then append recovered_calls to calls (do the same fix for the
other identical block around the second occurrence handling lines 163-167).
Specifically modify the code handling the tuple returned by
recover_nested_tool_calls in glm47_parser.rs so that
normal_parts.push(format!("{}{}", &block[..nested_start], recovered_text)) or
equivalent string/Vec concatenation is used instead of just pushing
recovered_text, then calls.append(&mut recovered_calls).
…ing tool calls

Two bugs addressed:

1. Streaming: finish_message_if_started() clears message_started and accumulated_text, so the subsequent guard in emit_end_events() that builds Response.output always evaluated false. Text-only streams got an empty output array in the response.completed event. Fix: capture message state before calling finish.

2. Non-streaming: chat_completion_to_response() emitted structured tool_calls first, then unconditionally re-parsed content for raw <tool_call> blocks, producing duplicates when both were present. Fix: skip raw parsing when structured tool_calls already exist.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
@coderabbitai review

✅ Actions performed: Review triggered.
♻️ Duplicate comments (2)
lib/llm/src/protocols/openai/responses/stream_converter.rs (2)
101-111: ⚠️ Potential issue | 🟠 Major

`append_raw_text()` still drops legitimate repeated chunks.

The `content == self.raw_text` check on line 102 will incorrectly drop consecutive identical delta chunks (e.g., two `" "` chunks). After the first `" "` is appended, `self.raw_text == " "`, so the second `" "` chunk triggers the early return. Consider removing the equality check and only returning early for empty content.

Proposed fix:

```diff
 fn append_raw_text(&mut self, content: &str) {
-    if content.is_empty() || content == self.raw_text {
+    if content.is_empty() {
         return;
     }
     if content.starts_with(&self.raw_text) {
         self.raw_text = content.to_string();
     } else {
         self.raw_text.push_str(content);
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/llm/src/protocols/openai/responses/stream_converter.rs` around lines 101 - 111, The append_raw_text method drops legitimate repeated identical chunks because of the equality check; change append_raw_text(&mut self, content: &str) to only early-return when content.is_empty(), removing the content == self.raw_text check so consecutive identical delta chunks (e.g., repeated " ") are appended; keep the existing starts_with branch (self.raw_text = content.to_string()) and the push_str fallback as-is to preserve incremental merging behavior.
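A standalone version of the suggested behavior might look like the following. This is a hypothetical free function mirroring the prompt's length-comparison variant (which also guards the `starts_with` branch), not the repo's method.

```rust
// Only empty content is a no-op. A chunk that strictly extends the buffer is
// treated as a cumulative snapshot; anything else (including a chunk identical
// to the buffer, e.g. a repeated " " delta) is appended as a delta.
fn append_raw_text(raw_text: &mut String, content: &str) {
    if content.is_empty() {
        return;
    }
    if content.starts_with(raw_text.as_str()) && content.len() > raw_text.len() {
        *raw_text = content.to_string(); // cumulative snapshot replaces buffer
    } else {
        raw_text.push_str(content); // delta, even if identical to the buffer
    }
}

fn main() {
    let mut raw = String::new();
    append_raw_text(&mut raw, " ");
    append_raw_text(&mut raw, " "); // repeated chunk must not be dropped
    assert_eq!(raw, "  ");

    let mut raw = String::from("Hel");
    append_raw_text(&mut raw, "Hello"); // cumulative snapshot extends the buffer
    assert_eq!(raw, "Hello");
    println!("ok");
}
```

The `content.len() > raw_text.len()` guard is what distinguishes a genuine cumulative snapshot from a repeated identical chunk, since both satisfy `starts_with`.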
151-175: ⚠️ Potential issue | 🟠 Major

Visible text projection can regress across chunk boundaries.

The `strip_prefix` approach assumes `visible_text` always starts with `accumulated_text`, but later bytes can retroactively hide earlier output. For example:

- Chunk 1: `"<tool"` → `strip_tool_call_text` returns `"<tool"` (incomplete tag, visible) → `accumulated_text = "<tool"`
- Chunk 2: `"<tool_call>{...}</tool_call>Done"` → `strip_tool_call_text` returns `"Done"`; `"Done".strip_prefix("<tool")` fails → delta not emitted

Consider computing the longest common prefix (LCP) between `accumulated_text` and `visible_text`, then emitting `visible_text[lcp_len..]` as the delta. This handles retroactive visibility changes gracefully.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/llm/src/protocols/openai/responses/stream_converter.rs` around lines 151 - 175, emit_visible_text_delta currently uses visible_text.strip_prefix(&self.accumulated_text) which fails when later chunks retroactively hide earlier output; change it to compute the longest common prefix (LCP) length between self.accumulated_text and visible_text, then treat the delta as visible_text[lcp_len..]; if delta is empty return, otherwise call ensure_message_started, update self.accumulated_text = visible_text.to_string(), build the ResponseTextDeltaEvent (using next_seq(), message_item_id, message_output_index, content_index 0, delta string, logprobs Some(vec![])) and push the SSE event as before so retroactive visibility changes emit the correct suffix.
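The LCP fallback can be sketched as a pure helper. This is a hypothetical function for illustration; the real `emit_visible_text_delta` also updates converter state and emits the SSE event.

```rust
// Delta to emit given what was already sent (`accumulated`) and the current
// visible projection (`visible`). Works even when later chunks retroactively
// hide text that was previously visible, unlike a strict strip_prefix.
fn visible_delta(accumulated: &str, visible: &str) -> String {
    // Longest common prefix, counted in chars so we stay on UTF-8 boundaries.
    let lcp_chars = accumulated
        .chars()
        .zip(visible.chars())
        .take_while(|(a, b)| a == b)
        .count();
    // Convert the char count to a byte offset into `visible`.
    let byte_off = visible
        .char_indices()
        .nth(lcp_chars)
        .map_or(visible.len(), |(i, _)| i);
    visible[byte_off..].to_string()
}

fn main() {
    // Normal growth: emit only the new suffix.
    assert_eq!(visible_delta("Hello", "Hello world"), " world");
    // Retroactive hiding: "<tool" was visible, then the completed tag hid it.
    // strip_prefix would fail here and emit nothing; the LCP fallback recovers.
    assert_eq!(visible_delta("<tool", "Done"), "Done");
    // Identical projections produce no delta.
    assert_eq!(visible_delta("same", "same"), "");
    println!("ok");
}
```

In the `"<tool"` → `"Done"` case the LCP is empty, so the entire new projection is emitted and `accumulated_text` can then be reset to `"Done"`, unblocking future deltas.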
🧹 Nitpick comments (1)
lib/llm/src/protocols/openai/responses/stream_converter.rs (1)
1087-1099: Consider adding a multi-chunk test for partial `</think>` markup.

This test covers the single-chunk case. Given the `strip_prefix` limitation noted earlier, a multi-chunk test where `</think>` arrives split across chunks (e.g., `"</thi"` then `"nk>Visible"`) would help verify edge cases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/llm/src/protocols/openai/responses/stream_converter.rs` around lines 1087 - 1099, Add a new test that simulates a multi-chunk scenario for a partial "</think>" split across chunks to ensure the strip_prefix logic in ResponseStreamConverter handles boundary cases: create a ResponseStreamConverter via ResponseStreamConverter::new(...) and call emit_start_events(), then call process_chunk twice (first with a chunk ending in a partial tag like "private reasoning</thi" and second with the remainder "nk>Visible answer.") and assert that the visible delta event ("response.output_text.delta") is emitted and conv.accumulated_text equals "Visible answer."; this mirrors test_dangling_think_close_does_not_leak_hidden_prefix but uses two chunks to validate handling of split "</think>" across process_chunk calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@lib/llm/src/protocols/openai/responses/stream_converter.rs`:
- Around line 101-111: The append_raw_text method drops legitimate repeated
identical chunks because of the equality check; change append_raw_text(&mut
self, content: &str) to only early-return when content.is_empty(), removing the
content == self.raw_text check so consecutive identical delta chunks (e.g.,
repeated " ") are appended; keep the existing starts_with branch (self.raw_text
= content.to_string()) and the push_str fallback as-is to preserve incremental
merging behavior.
- Around line 151-175: emit_visible_text_delta currently uses
visible_text.strip_prefix(&self.accumulated_text) which fails when later chunks
retroactively hide earlier output; change it to compute the longest common
prefix (LCP) length between self.accumulated_text and visible_text, then treat
the delta as visible_text[lcp_len..]; if delta is empty return, otherwise call
ensure_message_started, update self.accumulated_text = visible_text.to_string(),
build the ResponseTextDeltaEvent (using next_seq(), message_item_id,
message_output_index, content_index 0, delta string, logprobs Some(vec![])) and
push the SSE event as before so retroactive visibility changes emit the correct
suffix.
---
Nitpick comments:
In `@lib/llm/src/protocols/openai/responses/stream_converter.rs`:
- Around line 1087-1099: Add a new test that simulates a multi-chunk scenario
for a partial "</think>" split across chunks to ensure the strip_prefix logic in
ResponseStreamConverter handles boundary cases: create a ResponseStreamConverter
via ResponseStreamConverter::new(...) and call emit_start_events(), then call
process_chunk twice (first with a chunk ending in a partial tag like "private
reasoning</thi" and second with the remainder "nk>Visible answer.") and assert
that the visible delta event ("response.output_text.delta") is emitted and
conv.accumulated_text equals "Visible answer."; this mirrors
test_dangling_think_close_does_not_leak_hidden_prefix but uses two chunks to
validate handling of split "</think>" across process_chunk calls.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 42ac10e6-1196-4cc2-9f80-95f13ec665d9
📒 Files selected for processing (2)
- lib/llm/src/protocols/openai/responses/mod.rs
- lib/llm/src/protocols/openai/responses/stream_converter.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/llm/src/protocols/openai/responses/mod.rs
Two bugs addressed:

1. strip_tag arm ordering: when </think> appeared before <think> (dangling close followed by a matched pair), the (Some(start), _) arm fired instead of the dangling-close arm, leaking the </think> tag and truncating everything after the <think> open. Fix: reorder match arms so the dangling-close check runs before open-without-close.

2. emit_visible_text_delta prefix assumption: when retroactive stripping (e.g. </think> completing across chunks) invalidated already-emitted text, strip_prefix returned None and accumulated_text was never updated, permanently blocking all future deltas. Fix: fall back to longest-common-prefix computation so subsequent text is still emitted.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
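The arm-ordering fix in point 1 can be illustrated with a self-contained reconstruction. This is not the repo's `strip_tag`; the tag names and the recursive strategy are assumptions made for the sketch.

```rust
// Strip <think>...</think> reasoning spans from model text. The dangling-close
// arm is checked *before* the open-without-close arm: if "</think>" precedes
// any "<think>", everything up to and including that close is hidden reasoning.
fn strip_think(text: &str) -> String {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";
    match (text.find(OPEN), text.find(CLOSE)) {
        // Dangling close first - the ordering the commit message fixes.
        (open, Some(close)) if open.map_or(true, |o| close < o) => {
            strip_think(&text[close + CLOSE.len()..])
        }
        // Matched pair: drop the span, keep stripping the remainder.
        (Some(open), Some(close)) => {
            let mut out = text[..open].to_string();
            out.push_str(&strip_think(&text[close + CLOSE.len()..]));
            out
        }
        // Open without a close yet: hide everything from the open tag onward.
        (Some(open), None) => text[..open].to_string(),
        // No markup at all.
        _ => text.to_string(),
    }
}

fn main() {
    // Dangling close followed by a matched pair: with the buggy arm order,
    // "</think>" leaked and everything after "<think>" was truncated.
    assert_eq!(strip_think("hidden</think>A<think>more hidden</think>B"), "AB");
    assert_eq!(strip_think("reasoning</think>Visible"), "Visible");
    assert_eq!(strip_think("A<think>unfinished"), "A");
    println!("ok");
}
```

With the buggy order, the `(Some(open), _)` arm would match the mixed input first and truncate at the later `<think>`, leaving the earlier `</think>` in the visible text.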
@coderabbitai review

✅ Actions performed: Review triggered.
Summary

… /v1/responses

Issues Found

- Raw <tool_call> blocks could confuse parsing and cause Codex tool calls to be dropped or recovered inconsistently
- Mixed raw <tool_call> text and structured tool_calls deltas …
- … items key …
- .items() patterns become valid Jinja |items

Validation

Summary by CodeRabbit

New Features

Bug Fixes