fix(adapters): make the apply_patch diff UI work on chat-completions providers #236

Open
Cmochance wants to merge 3 commits into main from worktree-fix+apply-patch-custom-tool-call-wire
@Cmochance Cmochance commented May 20, 2026

Summary

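Fixes the apply_patch tool being consistently aborted when Codex Desktop is connected to chat-completions providers such as DeepSeek / Kimi / MiMo (as seen by the user: the file-edit diff UI is completely broken).

The root cause is a wire-shape mismatch: Codex CLI registers apply_patch as a freeform tool (the enum has only a single Freeform variant), its router dispatches on the wire item type, and the handler strictly requires ToolPayload::Custom { input }. This repo's adapter, however, rendered every chat tool_calls[] entry as a function_call wire item on the response side, so Codex CLI aborted immediately.

The fix adds a bidirectional bridge in the adapter (request + response + multi-turn replay). It preserves the custom_tool_call wire shape that upstream Codex CLI expects, while showing DeepSeek a chat-friendly description of the V4A format. See #235 and the commit body for the detailed root cause and approach.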

Test plan

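  • Add 8 regression tests: custom_tool_call wire emit / JSON or bare V4A fallback / interrupted incomplete / streaming-envelope consistency / multi-turn replay (2 item types) / V4A description injection
  • cargo test --workspace: all 506 adapter unit tests + 25 integration tests pass (the concurrency flake gemini_oauth::cancel_slot_epoch_* is unrelated to this PR and passes when run serially)
  • cargo fmt --all -- --check clean
  • Real-device verification: build the .app locally, run one file-edit task with a DeepSeek configuration, and confirm that the +/- diff card appears and the patch applies successfully (next step, to run right after push)
  • Does not affect the official Codex / GPT login path (that path uses native Responses and does not go through the chat adapter conversion)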


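## Symptom

When users connect Codex Desktop through App Transfer + DeepSeek (or Kimi / MiMo),
every API call returns 200, yet apply_patch tool calls are consistently aborted, the
Codex Desktop frontend shows no +/- diff card, and file editing is completely broken.
shell_command / fetch work normally.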

## Root cause (verified against upstream source at openai/codex @ 000bf5c)

Codex CLI registers apply_patch as a freeform tool:

- `codex-rs/protocol/src/openai_models.rs:202-206`: the `ApplyPatchToolType` enum
  currently has **only the `Freeform`** variant (community proposal #14046 to add a
  Function variant has not been merged)
- `codex-rs/core/src/tools/handlers/apply_patch_spec.rs`: the wire shape is
  `ToolSpec::Freeform { format: { type:"grammar", syntax:"lark" } }`
- `codex-rs/core/src/tools/router.rs:90-134`: the response side routes on the wire item type:
  `ResponseItem::FunctionCall` → `ToolPayload::Function { arguments }`,
  `ResponseItem::CustomToolCall` → `ToolPayload::Custom { input }`
- `codex-rs/core/src/tools/handlers/apply_patch.rs:324`: the apply_patch handler
  strictly requires `ToolPayload::Custom`; on receiving Function it returns
  `"apply_patch handler received unsupported payload"` → abort
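
The routing mismatch described in the list above can be sketched in a few lines. This is a simplified model, not the upstream code: the `ResponseItem` and `ToolPayload` names come from the list, but their fields are reduced to plain strings, and `route` and `handle_apply_patch` are hypothetical stand-ins for the router and handler.

```rust
/// Simplified stand-ins for the upstream wire items (fields reduced to strings).
#[derive(Debug, Clone, PartialEq)]
pub enum ResponseItem {
    FunctionCall { name: String, arguments: String },
    CustomToolCall { name: String, input: String },
}

/// Simplified stand-in for the payload handed to a tool handler.
#[derive(Debug, Clone, PartialEq)]
pub enum ToolPayload {
    Function { arguments: String },
    Custom { input: String },
}

/// Hypothetical router: the payload variant is chosen purely from the wire
/// item type, never from the tool name.
pub fn route(item: ResponseItem) -> (String, ToolPayload) {
    match item {
        ResponseItem::FunctionCall { name, arguments } => {
            (name, ToolPayload::Function { arguments })
        }
        ResponseItem::CustomToolCall { name, input } => {
            (name, ToolPayload::Custom { input })
        }
    }
}

/// Hypothetical freeform handler: only `Custom` is accepted, so a
/// `function_call` wire item can never reach the patch logic.
pub fn handle_apply_patch(payload: ToolPayload) -> Result<String, String> {
    match payload {
        ToolPayload::Custom { input } => Ok(input),
        _ => Err("apply_patch handler received unsupported payload".to_string()),
    }
}
```

Under this model, an adapter that always emits `FunctionCall` items guarantees the error branch for apply_patch, regardless of how well-formed the patch text is.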

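This repo's adapter rendered every `tool_calls[]` entry from DeepSeek and other chat
upstreams as a `function_call` wire item on the response side, so the Codex CLI router
hit the mismatch immediately and aborted. In addition, when the request side lowers the
custom tool to a function, the upstream "do not wrap the patch in JSON" description
actively misleads the model on the chat path, and no V4A format sample was provided.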

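## Fix (Option B: bidirectional bridging in the adapter)

### Request side `responses/request/tools.rs`

Special-case `name == "apply_patch"` when lowering custom → function:
- Replace the outer description with V4A guidance that is accurate for the chat path
  (`*** Begin Patch`, file operation headers, hunk markers, relative paths, and newlines
  written as `\n` escapes inside the JSON string)
- The input parameter description mirrors the key V4A constraints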
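
As a rough illustration of the request-side special case, the sketch below lowers a custom tool to a function tool and swaps in chat-specific descriptions only for apply_patch. All type names, `lower_custom_tool`, and both constants are hypothetical, and the description strings are short placeholders rather than the text this PR ships.

```rust
/// Placeholder guidance strings (assumptions, not the PR's actual wording).
pub const CHAT_TOOL_DESCRIPTION: &str =
    "Put the whole V4A patch, starting with `*** Begin Patch`, in the `input` string.";
pub const CHAT_INPUT_DESCRIPTION: &str =
    "V4A patch text; use relative paths and \\n escapes for newlines.";

/// Hypothetical shape of a Responses custom (freeform) tool.
pub struct CustomTool {
    pub name: String,
    pub description: String,
}

/// Hypothetical shape of the lowered chat function tool with one `input` parameter.
pub struct FunctionTool {
    pub name: String,
    pub description: String,
    pub input_description: String,
}

/// Lower custom -> function; only apply_patch gets its descriptions replaced.
pub fn lower_custom_tool(tool: &CustomTool) -> FunctionTool {
    if tool.name == "apply_patch" {
        FunctionTool {
            name: tool.name.clone(),
            description: CHAT_TOOL_DESCRIPTION.to_string(),
            input_description: CHAT_INPUT_DESCRIPTION.to_string(),
        }
    } else {
        // Other custom tools keep the description they were registered with.
        FunctionTool {
            name: tool.name.clone(),
            description: tool.description.clone(),
            input_description: "Raw tool input.".to_string(),
        }
    }
}
```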

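### Response side `responses/converter.rs`

Special-case `name == "apply_patch"` and emit the Responses `custom_tool_call` wire shape
instead of `function_call`:
- `output_item.added` uses `type:"custom_tool_call"` (empty `input`)
- Intermediate args deltas are **not** emitted (avoids streaming input extraction from an
  accumulating JSON string)
- On close, emit `response.custom_tool_call_input.delta` + `.done` + `output_item.done`
  (`type:"custom_tool_call"`) in one go
- Input extraction: decode it from the `{"input":"<V4A>"}` JSON; if the args are not JSON
  or lack the input field, pass the whole string through unchanged (so Codex CLI's
  parse_patch produces a readable error rather than a silent abort)
- The final envelope `output[]` uses the same input string (cached on PendingToolCall to
  prevent drift between close and envelope build)
- When interrupted (no finish_reason and not terminated by [DONE]), emit
  `status:"incomplete"` and **skip** `input.done`, so that a strict client does not
  execute a partial patch when the stream is cut off midway (a safety guard for a
  destructive tool)
- `call_id` is frozen once `output_item.added` has been emitted and is no longer
  overwritten by later chunks (avoids exposing two different call_ids for the same item)
- Add tracing telemetry: positive shim trigger (info), late-arriving name (warn),
  empty args (warn), JSON parse failure split (debug for bare V4A / warn for genuinely
  malformed)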
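
Two of the rules above, the `call_id` freeze and the skipped `input.done` on interruption, are easy to misread, so here is a minimal sketch of both. `PendingCall`, its fields, and `close_events` are hypothetical names. Whether the real converter still emits the input delta for an interrupted call is not stated above, so the sketch assumes it does.

```rust
/// Hypothetical per-call state; field names are illustrative only.
#[derive(Debug, Default)]
pub struct PendingCall {
    pub call_id: String,
    pub added_emitted: bool,
}

impl PendingCall {
    /// Accept a call_id from an upstream chunk only until `output_item.added`
    /// has been emitted; afterwards the id is frozen.
    pub fn observe_call_id(&mut self, id: &str) {
        if !self.added_emitted && !id.is_empty() {
            self.call_id = id.to_string();
        }
    }

    /// Record that `output_item.added` went out with the current call_id.
    pub fn mark_added_emitted(&mut self) {
        self.added_emitted = true;
    }
}

/// Event names emitted at close, plus the item status.
/// Assumption: the delta is still emitted when interrupted; only `.done` is skipped.
pub fn close_events(interrupted: bool) -> (Vec<&'static str>, &'static str) {
    let mut events = vec!["response.custom_tool_call_input.delta"];
    if !interrupted {
        events.push("response.custom_tool_call_input.done");
    }
    events.push("response.output_item.done");
    let status = if interrupted { "incomplete" } else { "completed" };
    (events, status)
}
```

A client that executes a patch only after seeing `input.done` with a `completed` item would therefore never run a truncated patch under this model.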

### Request-side multi-turn replay `responses/request.rs` (BLOCKER)

On turn N+1, Codex CLI replays the previous turn's `ResponseItem::CustomToolCall` /
`CustomToolCallOutput` to us through `input[]`. The original `input_item_to_messages`
only handled `function_call` / `function_call_output`, so these two item types silently
fell into the `_ =>` fallback and were dropped, losing the multi-turn context. This commit
adds two branches:

- `custom_tool_call` → `role:assistant` + `tool_calls[]` (function-call shape, with
  arguments wrapped as a `{"input":"<V4A>"}` JSON string, matching the first-turn lowering)
- `custom_tool_call_output` → `role:tool` + `tool_call_id` + content
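
The two replay branches can be sketched with the standard library alone. `InputItem`, `ChatMessage`, `replay`, and the hand-rolled `json_string` escaper are all hypothetical simplifications. The actual adapter presumably builds these messages with a JSON library rather than by string formatting.

```rust
/// Hypothetical, reduced view of the replayed Responses input items.
#[derive(Debug, PartialEq)]
pub enum InputItem {
    CustomToolCall { call_id: String, name: String, input: String },
    CustomToolCallOutput { call_id: String, output: String },
}

/// Hypothetical, reduced view of the chat messages produced for the upstream.
#[derive(Debug, PartialEq)]
pub enum ChatMessage {
    AssistantToolCall { tool_call_id: String, name: String, arguments: String },
    Tool { tool_call_id: String, content: String },
}

/// Minimal JSON string escaper so the sketch needs no external crate.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Map a replayed item to the chat-side message, wrapping the raw patch text
/// as `{"input": "..."}` so it matches the first-turn function-call shape.
pub fn replay(item: InputItem) -> ChatMessage {
    match item {
        InputItem::CustomToolCall { call_id, name, input } => ChatMessage::AssistantToolCall {
            tool_call_id: call_id,
            name,
            arguments: format!("{{\"input\":{}}}", json_string(&input)),
        },
        InputItem::CustomToolCallOutput { call_id, output } => ChatMessage::Tool {
            tool_call_id: call_id,
            content: output,
        },
    }
}
```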

## Tests

Adds 8 regression tests (6 response side + 2 request side):
- chat tool_calls(apply_patch) → custom_tool_call wire
- JSON args / bare V4A fallback / missing input field
- interrupted stream → status=incomplete + skip input.done
- streaming output_item.done.input == envelope.output[].input
- custom_tool_call input → assistant.tool_calls (multi-turn replay)
- custom_tool_call_output → role:tool (multi-turn replay)
- request-side V4A description injection (apply_patch vs. an ordinary custom tool)

`cargo test --workspace`: the full suite passes (506 adapter unit + 12+10+3 integration,
same as the original repo; the only intermittent concurrency flake,
`gemini_oauth::cancel_slot_epoch_*`, is unrelated to this commit and passes when run
serially).

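## Notes

Does not affect the official Codex / GPT login path (that path uses the native Responses
API and does not go through the chat adapter conversion). This fix strictly targets the
direction that converts chat completions providers to Responses.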

Refs #235

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 potential issue.


Comment on lines +1054 to +1061

    return json!({
        "type": "custom_tool_call",
        "id": pending.fc_id,
        "call_id": pending.call_id,
        "name": pending.name,
        "input": input,
        "status": "completed",
    });

🔴 tool_call_item_completed always emits status: "completed" for interrupted apply_patch, contradicting streaming event's status: "incomplete"

When an apply_patch tool call is interrupted (stream EOF without finish_reason or [DONE]), close_tool_call correctly emits response.output_item.done with "status": "incomplete" (converter.rs:745). However, tool_call_item_completed (converter.rs:1060) always hardcodes "status": "completed" for apply_patch items regardless of interruption state. This function is called by emit_close to build the response.completed envelope's output[] array.

The code comment at lines 1042-1043 explicitly states the invariant: "the final state of envelope.output[] must match the item in the streaming response.output_item.done", yet the implementation violates this for interrupted calls. Since apply_patch is described as "destructive" and the entire interrupted-handling logic exists to prevent partial patch execution, a strict client that reads the envelope's item status (rather than the earlier streaming event) could incorrectly treat a truncated patch as complete and attempt execution.

The existing test apply_patch_interrupted_stream_emits_incomplete_status_skips_input_done checks the response-level status but does not assert on completed.1["response"]["output"][0]["status"], so this inconsistency is not caught.

Prompt for agents
The function `tool_call_item_completed` at converter.rs:1040 needs to know whether the tool call was interrupted so it can return status "incomplete" instead of always "completed". The simplest fix is to add a boolean field (e.g., `interrupted: bool`) to the `PendingToolCall` struct that is set to true in `close_tool_call` when `interrupted && is_apply_patch` (right before `pending.closed = true` at line 760). Then in `tool_call_item_completed`, when `pending.is_apply_patch` is true, use `if pending.interrupted { "incomplete" } else { "completed" }` for the status field. The test `apply_patch_interrupted_stream_emits_incomplete_status_skips_input_done` should also be extended to assert `completed.1["response"]["output"][0]["status"] == "incomplete"`.
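
A minimal sketch of the fix suggested above, assuming a reduced `PendingToolCall` with only the fields the prompt mentions. The real struct has more fields, and `item_status` is a hypothetical helper standing in for the status expression inside `tool_call_item_completed`.

```rust
/// Reduced stand-in for the converter's per-call state.
#[derive(Debug, Default)]
pub struct PendingToolCall {
    pub is_apply_patch: bool,
    pub interrupted: bool,
    pub closed: bool,
}

/// Status used for both the streaming `output_item.done` item and the
/// envelope `output[]` item, so the two cannot disagree.
pub fn item_status(pending: &PendingToolCall) -> &'static str {
    if pending.is_apply_patch && pending.interrupted {
        "incomplete"
    } else {
        "completed"
    }
}

/// Record the interruption before marking the call closed, then return the
/// status that the streaming event would carry.
pub fn close_tool_call(pending: &mut PendingToolCall, interrupted: bool) -> &'static str {
    if interrupted && pending.is_apply_patch {
        pending.interrupted = true;
    }
    pending.closed = true;
    item_status(pending)
}
```

Because the envelope builder would call the same `item_status` helper later, the status it reports is derived from the flag stored at close time rather than from a hardcoded literal.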

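Real-device verification (the user prompt asked the model to insert a markdown passage
between lines 5 and 10 of the README) showed that the wire bridge works (25+ shim
triggers, zero aborts), but the model stumbled over the V4A hunk header for 20 minutes
straight across 25+ retries and only finished after falling back to sed.

Root cause: under the constrained decoding space of the freeform/lark grammar, the
semantics of the space-prefixed lines after `@@ <context> @@` cannot go wrong (the model
can only produce grammatically valid sequences). After switching to the chat
function-call path, the hard lark constraint is gone, and the description only mentioned
the ` `/`+`/`-` prefixes; **it never said that space lines correspond to the lines
*after* the anchor**. DeepSeek repeatedly duplicated the anchor as a space line,
parse_patch could not find such a doubled line in the file, and rejected the patch.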

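Fix: add an explicit "CRITICAL HUNK SEMANTICS" section to
`APPLY_PATCH_TOOL_DESCRIPTION_FOR_CHAT`, plus one minimal executable V4A example (rename
a let binding) showing that the anchor appears only inside `@@ ... @@` and is not
repeated as a space line. `APPLY_PATCH_INPUT_DESCRIPTION_FOR_CHAT` (parameter level) also
gets a compact version of the same rule, in case a provider truncates the tool-level
description under long history.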
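
To make the anchor rule concrete, the sketch below contrasts a patch that follows it with one that shows the failure mode, and adds a small checker for the latter. The patch strings use the `@@ <anchor> @@` header form described in this commit and are illustrative only. They have not been run through Codex CLI's parse_patch, and `hunk_anchor` and `repeats_anchor` are hypothetical helpers that are not part of the adapter.

```rust
/// Follows the rule: the anchor appears only in the hunk header.
pub const GOOD_PATCH: &str = concat!(
    "*** Begin Patch\n",
    "*** Update File: src/lib.rs\n",
    "@@ fn total() -> u32 { @@\n",
    "-    let x = 1;\n",
    "+    let count = 1;\n",
    "*** End Patch\n",
);

/// Failure mode: the anchor is repeated as the first space-prefixed line.
pub const BAD_PATCH: &str = concat!(
    "*** Begin Patch\n",
    "*** Update File: src/lib.rs\n",
    "@@ fn total() -> u32 { @@\n",
    " fn total() -> u32 {\n",
    "-    let x = 1;\n",
    "+    let count = 1;\n",
    "*** End Patch\n",
);

/// Returns the anchor text of a hunk header line, if the line is one.
pub fn hunk_anchor(line: &str) -> Option<&str> {
    let rest = line.strip_prefix("@@")?;
    let rest = rest.strip_suffix("@@").unwrap_or(rest);
    let anchor = rest.trim();
    if anchor.is_empty() { None } else { Some(anchor) }
}

/// True if any hunk's first space-prefixed line repeats its own anchor.
pub fn repeats_anchor(patch: &str) -> bool {
    let mut lines = patch.lines().peekable();
    while let Some(line) = lines.next() {
        if let Some(anchor) = hunk_anchor(line) {
            if let Some(next) = lines.peek() {
                if let Some(context) = next.strip_prefix(' ') {
                    if context.trim() == anchor {
                        return true;
                    }
                }
            }
        }
    }
    false
}
```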

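Tests: `tools_custom_apply_patch_injects_v4a_format_hint` gains 4 assertions that lock in
the anchor-semantics explanation, the minimal example, and the compact parameter-level
version, so the description cannot be deleted by accident in a future refactor.
`cargo test --workspace` passes in full.

Note: `parse_patch` failures on the Codex CLI side do not pass through the proxy log.
That error is emitted to the model as a tool error inside the client tool runtime path,
so the monitoring in place before this PR could not see it. This follow-up relies
entirely on feedback from the user's manual real-device testing (thank you).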

Refs #235
devin-ai-integration[bot]

This comment was marked as resolved.

…ription (#235)

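DeepSeek stability testing (all 10 Levels passed; see the PR comments on #235) showed the
model working out four non-trivial apply_patch behaviors on the chat path by itself,
spending an average of 1-3 minutes per task on detours:

1. `@@ <non-empty text> @@` anchors often fail to match on the chat path → the model
   plants a blank line with `printf '\n' >> file` as an anchor → patches → cleans up the
   extra blank line afterwards
2. `*** Add File: foo` conflicts with `*** Update File: foo` in the same patch
   (the Update reads before the new file is written to disk) → the model switches to a
   pre-created anchor file
3. A completely empty target file cannot be hit directly with `*** Update File:` → a line
   must first be seeded via shell
4. In a multi-line file, bare `+` lines after the anchor "append" rather than "replace" →
   replacement needs paired `-` + `+` lines

These are all actual behaviors of parse_patch on the Codex CLI side, which the adapter
cannot fix at the wire layer. It can, however, tell the model about these workarounds up
front on the request side, raising the first-attempt success rate and reducing wasted
tokens.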

Implementation:

- `crates/adapters/src/responses/request.rs`: add the `tools_register_apply_patch()`
  check, the `APPLY_PATCH_CHAT_PATH_SYSTEM_GUIDANCE` text, and the
  `apply_patch_chat_guidance_message()` constructor. `build_messages_from_input` injects
  the guidance immediately after the Codex CLI instructions system message, **only when**
  the current turn's tools array actually registers apply_patch
  (type:custom + name:apply_patch). Turns without apply_patch pay zero overhead.
- `crates/adapters/src/responses/request/tools.rs`: append 4 compact workarounds to the
  end of `APPLY_PATCH_TOOL_DESCRIPTION_FOR_CHAT`; `APPLY_PATCH_INPUT_DESCRIPTION_FOR_CHAT`
  (parameter level) gets an even more compact backup. Three layers of redundancy
  (system / tool desc / param desc) keep the model from losing all guidance when an
  upstream provider truncates or weakens one layer.
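
The conditional injection can be sketched as follows. `tools_register_apply_patch` is named above, but its signature here is assumed, and `Tool`, `Message`, `GUIDANCE_MARKER`, and `build_messages` are hypothetical reductions of the real request types and of `build_messages_from_input`.

```rust
/// Hypothetical, reduced tool declaration from the Responses request body.
#[derive(Debug, Clone, PartialEq)]
pub struct Tool {
    pub kind: String, // the `type` field, e.g. "custom" or "function"
    pub name: String,
}

/// Hypothetical, reduced chat message.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Placeholder standing in for the real guidance text.
pub const GUIDANCE_MARKER: &str = "[apply_patch chat-path guidance]";

/// Detect apply_patch on the original (pre-lowering) tool declarations.
pub fn tools_register_apply_patch(tools: &[Tool]) -> bool {
    tools.iter().any(|t| t.kind == "custom" && t.name == "apply_patch")
}

/// Build the message list: instructions first, then the guidance as its own
/// system message (only when apply_patch is registered), then the user turn.
pub fn build_messages(instructions: &str, tools: &[Tool], user: &str) -> Vec<Message> {
    let mut out = vec![Message { role: "system".into(), content: instructions.into() }];
    if tools_register_apply_patch(tools) {
        out.push(Message { role: "system".into(), content: GUIDANCE_MARKER.into() });
    }
    out.push(Message { role: "user".into(), content: user.into() });
    out
}
```

Because the function is pure and rebuilds the list from the request body each time, converting the same body repeatedly yields exactly one guidance message per conversion.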

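Design trade-offs:

- Inject a separate system message (not merged into the original Codex instructions) to
  keep responsibilities separate and make later adjustment or replacement easy
- The text is in English, matching the existing description style and aligning better
  with the downstream models' vocab
- The detection condition is `type:"custom" && name:"apply_patch"` rather than the
  lowered `type:"function"`, because `build_messages_from_input` receives the original
  Responses body (`convert_responses_tool_to_chat_tool` is called on the `tools` field
  conversion path, which runs parallel to the `messages` field construction path)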

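Tests: 3 new unit tests cover:
- Injection when apply_patch is registered (Codex instructions are not overwritten, all
  4 workarounds are present, the marker exists)
- No injection when apply_patch is not registered (the system message count does not
  grow, no guidance marker)
- Converting the same body 3 times in a row still yields a guidance count of 1 each time
  (guards against accumulation in post-processing such as
  merge_consecutive_system_messages)

`cargo test --workspace`: 509 adapter unit + 25 integration tests all pass (506→509).
`cargo fmt --all -- --check` clean.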

Refs #235