Problem
Users experience thinking collapse — a family of related defects where the model's reasoning (thinking) blocks either (a) freeze with a live spinner that never resolves, (b) get silently truncated to ≤4 lines with no clear expand affordance during streaming, or (c) disappear entirely from the API message history, causing HTTP 400 errors on the next turn.
Four distinct root causes have been identified through code analysis.
Root Cause 1: Thinking block stuck in streaming: true when ThinkingComplete is missed
Location: crates/tui/src/tui/ui.rs — apply_engine_error_to_app
When a network interruption or mid-stream disconnect occurs, the ThinkingComplete event is never delivered to the UI event loop. The HistoryCell::Thinking { streaming: true, duration_secs: None } stays in the active_cell forever. The spinner never stops. The block never gets a duration stamp and stays ThinkingVisualState::Live with status: "live".
The test flush_active_cell_finalizes_unclosed_thinking_block guards against this case, but only on the flush_active_cell path. The error-recovery path through apply_engine_error_to_app sets streaming_thinking_active_entry to None without finalizing the cell:
// ui.rs — apply_engine_error_to_app
app.streaming_state.reset();
app.streaming_message_index = None;
app.streaming_thinking_active_entry = None; // clears pointer but never finalizes the cell
Impact: After any API error or stream drop during a thinking phase, the thinking cell renders with a permanent live spinner.
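A minimal sketch of one way to close this gap, assuming simplified stand-ins for the real types: the error path finalizes the active thinking cell before it drops the pointer. The `interrupted` field, the `history` vector, the `elapsed_secs` parameter, and the body of `finalize_streaming_thinking_active_entry` are assumptions for illustration and will not match the actual definitions in ui.rs.

```rust
// Simplified stand-ins for the real TUI types. Anything not quoted in this
// issue (the `interrupted` flag, the `history` Vec) is hypothetical.
#[derive(Debug, PartialEq)]
enum HistoryCell {
    Thinking {
        streaming: bool,
        duration_secs: Option<f32>,
        interrupted: bool, // hypothetical marker for an interrupted block
    },
}

struct App {
    history: Vec<HistoryCell>,
    streaming_thinking_active_entry: Option<usize>,
}

/// Stamp the active thinking cell as finished and clear the pointer.
/// Hypothetical body; the real helper may take different arguments.
fn finalize_streaming_thinking_active_entry(app: &mut App, elapsed_secs: f32, was_interrupted: bool) {
    // `take()` clears the pointer and hands back the index in one step.
    if let Some(idx) = app.streaming_thinking_active_entry.take() {
        if let Some(HistoryCell::Thinking { streaming, duration_secs, interrupted }) =
            app.history.get_mut(idx)
        {
            *streaming = false;
            *duration_secs = Some(elapsed_secs);
            *interrupted = was_interrupted;
        }
    }
}

/// Error path: finalize first, so the spinner cannot outlive the stream.
fn apply_engine_error_to_app(app: &mut App, elapsed_secs: f32) {
    finalize_streaming_thinking_active_entry(app, elapsed_secs, true);
    // The existing resets (streaming_state, streaming_message_index) would follow here.
}
```

Because the pointer is cleared inside the finalizer, a caller cannot clear it while leaving the cell in the streaming state.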
Root Cause 2: streaming_state.reset() on ThinkingStarted drops buffered tail content
Location: crates/tui/src/tui/ui.rs — EngineEvent::ThinkingStarted handler
EngineEvent::ThinkingStarted { .. } => {
    app.reasoning_buffer.clear();
    app.reasoning_header = None;
    app.thinking_started_at = Some(Instant::now());
    app.streaming_state.reset(); // unconditional reset
    app.streaming_state.start_thinking(0, None);
    let _ = ensure_streaming_thinking_active_entry(app);
}
streaming_state.reset() is called unconditionally. In a multi-block turn (V4 interleaved thinking), if a second ThinkingStarted arrives while the first block is still streaming, the uncommitted tail text in streaming_state for the first block is silently discarded.
Impact: In multi-round tool-call turns with interleaved thinking, the last few tokens of each thinking block may be silently truncated.
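A possible fix, sketched with hypothetical stand-in types: commit the uncommitted tail into the block that is still active before resetting the buffer. `StreamingState` here is a toy model, and `pending_tail`, `take_tail`, `thinking_cells`, and `active_thinking` are invented names, not the real API.

```rust
/// Toy model of the streaming buffer. The real `streaming_state` is richer;
/// `pending_tail` and `take_tail` are hypothetical names.
#[derive(Default)]
struct StreamingState {
    pending_tail: String,
}

impl StreamingState {
    /// Buffer a streamed delta that has not been committed to a cell yet.
    fn push(&mut self, delta: &str) {
        self.pending_tail.push_str(delta);
    }

    /// Hand back the uncommitted text, leaving the buffer empty.
    fn take_tail(&mut self) -> String {
        std::mem::take(&mut self.pending_tail)
    }

    fn reset(&mut self) {
        self.pending_tail.clear();
    }
}

#[derive(Default)]
struct App {
    streaming_state: StreamingState,
    thinking_cells: Vec<String>,    // committed text, one entry per thinking block
    active_thinking: Option<usize>, // index of the block that is still streaming
}

fn on_thinking_started(app: &mut App) {
    // Commit the previous block's tail before the reset can wipe it.
    if let Some(idx) = app.active_thinking {
        let tail = app.streaming_state.take_tail();
        app.thinking_cells[idx].push_str(&tail);
    }
    app.streaming_state.reset();
    app.thinking_cells.push(String::new());
    app.active_thinking = Some(app.thinking_cells.len() - 1);
}
```

The reset stays unconditional in this sketch; only the ordering changes, so a first ThinkingStarted with no active block behaves as before.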
Root Cause 3: last_reasoning race between ThinkingComplete and MessageComplete
Location: crates/tui/src/tui/ui.rs — event dispatch loop
ThinkingComplete sets app.last_reasoning. MessageComplete reads and clears app.last_reasoning to build the ContentBlock::Thinking for api_messages.
If MessageComplete is processed before ThinkingComplete (possible when the engine channel drains events in burst order), last_reasoning is None at MessageComplete time, and the thinking block is not included in the API message history:
// EngineEvent::MessageComplete
let thinking = app.last_reasoning.take(); // could be None if ThinkingComplete hasn't arrived yet
if let Some(thinking) = thinking {
    blocks.push(ContentBlock::Thinking { thinking });
}
Impact: Thinking content is stripped from api_messages. On the next turn, DeepSeek V4 returns HTTP 400 (missing reasoning_content), or the model loses continuity of its own reasoning chain.
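One way to make the handlers insensitive to ordering, shown as a hedged sketch: MessageComplete finalizes any thinking block that is still open before it reads last_reasoning. The `thinking_active` flag stands in for `streaming_thinking_active_entry.is_some()`, and the `ContentBlock::Text` variant and the function names are assumptions for illustration.

```rust
#[derive(Debug, PartialEq)]
enum ContentBlock {
    Thinking { thinking: String },
    Text { text: String }, // hypothetical variant, used here to carry the reply
}

#[derive(Default)]
struct App {
    reasoning_buffer: String,       // thinking text streamed so far
    thinking_active: bool,          // stand-in for streaming_thinking_active_entry.is_some()
    last_reasoning: Option<String>,
}

/// Finalize the open thinking block, if any. Calling this twice is harmless:
/// the second call finds no active block and does nothing.
fn on_thinking_complete(app: &mut App) {
    if app.thinking_active {
        app.last_reasoning = Some(std::mem::take(&mut app.reasoning_buffer));
        app.thinking_active = false;
    }
}

fn on_message_complete(app: &mut App, text: &str) -> Vec<ContentBlock> {
    // Drain a pending thinking finalization inline, so a late or missing
    // ThinkingComplete cannot strip the block from the history.
    on_thinking_complete(app);

    let mut blocks = Vec::new();
    if let Some(thinking) = app.last_reasoning.take() {
        blocks.push(ContentBlock::Thinking { thinking });
    }
    blocks.push(ContentBlock::Text { text: text.to_string() });
    blocks
}
```

If ThinkingComplete arrives after MessageComplete in this model, it is a no-op, so stale reasoning does not leak into the next turn.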
Root Cause 4 (UX): Hard truncation to 4 rendered lines with no streaming expand affordance
Location: crates/tui/src/tui/history.rs — THINKING_SUMMARY_LINE_LIMIT = 4
Collapsed thinking blocks are hard-truncated to 4 rendered lines:
const THINKING_SUMMARY_LINE_LIMIT: usize = 4;

if collapsed && rendered.len() > THINKING_SUMMARY_LINE_LIMIT {
    rendered.truncate(THINKING_SUMMARY_LINE_LIMIT);
    truncated = true;
}
The affordance "thinking collapsed; press Ctrl+O for full text" is only shown when !streaming && (truncated || ...). During live streaming this line is suppressed entirely — users watching a long thinking block receive no indication that content is being hidden.
Impact: Users see partial thinking output during streaming with no signal that more exists or how to see it.
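A rough sketch of a streaming-aware collapse, assuming the renderer works on a vector of already-rendered lines: a hint line is appended whenever lines are hidden, and its wording depends on whether the block is still streaming. The function name and signature are hypothetical, and the sketch covers only the truncated case of the existing condition.

```rust
const THINKING_SUMMARY_LINE_LIMIT: usize = 4;

/// Collapse `rendered` to the summary limit and append a hint line whenever
/// lines were hidden, whether or not the block is still streaming.
/// Hypothetical helper; the real rendering code in history.rs differs.
fn collapse_thinking(mut rendered: Vec<String>, collapsed: bool, streaming: bool) -> Vec<String> {
    if !collapsed || rendered.len() <= THINKING_SUMMARY_LINE_LIMIT {
        return rendered; // nothing hidden, so no hint is needed
    }

    let hidden = rendered.len() - THINKING_SUMMARY_LINE_LIMIT;
    rendered.truncate(THINKING_SUMMARY_LINE_LIMIT);

    let hint = if streaming {
        // Live affordance: the count grows as more lines stream in.
        let noun = if hidden == 1 { "line" } else { "lines" };
        format!("… ({} more {})", hidden, noun)
    } else {
        "thinking collapsed; press Ctrl+O for full text".to_string()
    };
    rendered.push(hint);
    rendered
}
```

Recomputing the hint on every render keeps the hidden-line count current while the block streams.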
Acceptance Criteria
- RC1: When a stream error fires while a thinking block is active (streaming_thinking_active_entry.is_some()), apply_engine_error_to_app must call finalize_streaming_thinking_active_entry (or equivalent) before clearing the pointer, so the block is stamped as interrupted rather than left frozen with a live spinner.
- RC2: The ThinkingStarted handler must flush any uncommitted tail text from streaming_state into the current active thinking cell before calling streaming_state.reset(), or skip the reset entirely when a thinking entry is already active.
- RC3: The event dispatch loop must guarantee that ThinkingComplete is always processed before MessageComplete for the same turn. Acceptable approaches: (a) enforce event emission order in the engine, (b) have MessageComplete inline-drain a pending thinking finalization when streaming_thinking_active_entry is still set, or (c) unify into a single ThinkingAndMessageComplete event.
- RC4 (UX): During streaming of a thinking block that exceeds THINKING_SUMMARY_LINE_LIMIT, display a live affordance (e.g. "… (N more lines)") so the user knows the block is being truncated and can navigate to the full text.
Reproduction Steps (RC1, easiest to reproduce)
- Start a session with deepseek-v4-pro (thinking enabled).
- Submit a complex multi-step prompt that triggers a long reasoning block.
- While the thinking spinner is active (… thinking · live), press Esc to cancel.
- Observe: the thinking cell retains the live spinner and status: "live" permanently instead of transitioning to interrupted/partial.
Affected Files
crates/tui/src/tui/ui.rs: RC1, RC2, RC3
crates/tui/src/tui/history.rs: RC4 (THINKING_SUMMARY_LINE_LIMIT)
Models Affected
Any V4 thinking-mode model: deepseek-v4-pro, deepseek-v4-flash with thinking enabled.