fix(adapters): improve apply_patch chat-path prompt quality (refs #235, stacked on #236) #238

Closed
Cmochance wants to merge 4 commits into main from worktree-feat+apply-patch-chat-path-prompt-quality
@Cmochance Cmochance commented May 21, 2026

Scope

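Stacked on top of #236 (the wire-layer fix). This PR handles the other half of issue #235: the quality of the V4A patch content that models generate on the chat path.

A real-device capture taken after PR #236 (28 turns / 26 MB / DeepSeek + Kimi) shows that only 1 of 7 apply_patch tool_calls applied successfully. The 6 failures fall into three categories:

| Failure type | Turns | Codex Desktop error |
| --- | --- | --- |
| Model emits raw Python code instead of V4A | 3 turns (0016/0019/0022) | invalid hunk at line 3, 'def main():' is not a valid hunk header |
| V4A format is correct but `-` lines are not byte-exact | 1 turn (0010) | Failed to find expected lines in <file> |
| Model picks exec_command instead of apply_patch | 3 turns (0000/0002/0026) | (no diff card in the UI; the model edits files with sed/echo) |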
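
To make the first failure type concrete, here is a minimal sketch of a pre-flight check that separates a V4A envelope from raw source code. The function name `looks_like_v4a_envelope` is hypothetical and is not part of the adapter; it assumes only that a patch starts with `*** Begin Patch` and ends with `*** End Patch`.

```rust
/// Hypothetical helper (not from the adapter): returns true when `input`
/// is wrapped in the V4A envelope markers, false for raw source code.
pub fn looks_like_v4a_envelope(input: &str) -> bool {
    // Ignore blank lines and trailing whitespace around the envelope.
    let non_empty: Vec<&str> = input
        .lines()
        .map(str::trim_end)
        .filter(|l| !l.is_empty())
        .collect();
    match (non_empty.first(), non_empty.last()) {
        (Some(&first), Some(&last)) => {
            non_empty.len() >= 2 && first == "*** Begin Patch" && last == "*** End Patch"
        }
        _ => false,
    }
}
```

A check like this only detects the anti-pattern; it says nothing about whether the hunks inside the envelope are valid.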

Changes

This PR targets model generation quality; the wire layer is unchanged.

  • New crates/adapters/src/responses/apply_patch_v4a_reference.md: a verbatim mirror of upstream Codex CLI codex-rs/core/prompt_with_apply_patch_instructions.md L277-L351 @ commit 0b4f86095c8005d8f74e9c62b971d72c1670aa88 (Apache-2.0, Copyright 2025 OpenAI). An adapter note at the top explicitly overrides the phrase "shell command" with "function-call tool"; the rest of the original text is unmodified.
  • APPLY_PATCH_CHAT_PATH_SYSTEM_GUIDANCE (request.rs): rewritten into three sections: (1) top-level Tool selection guidance (to counter the exec_command preference); (2) the V4A walkthrough above, embedded via include_str! (envelope / @@ anchors / 3-line context / EBNF / multi-operation example); (3) five gotchas observed on the chat path (byte-exact matching / empty-line anchor only when a blank line exists / Add+Update in the same patch / empty files / pure + lines do not replace).
  • APPLY_PATCH_TOOL_DESCRIPTION_FOR_CHAT (tools.rs): extended with the top-level Tool selection guidance, the BYTE-EXACT matching rule, 3 positive examples, and an anti-pattern reminder ("NEVER pass raw source code").
  • APPLY_PATCH_INPUT_DESCRIPTION_FOR_CHAT (parameter-level mirror): some providers down-weight the tool-level description in long histories, so the parameter level carries a compact byte-exact + anti-pattern version to stay visible.
  • License compliance: adds a NOTICE file; the acknowledgement sections of ACKNOWLEDGEMENTS.md / README.md / README.en.md gain the upstream attribution (full 40-char SHA + L277-L351 + Apache-2.0).
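
The byte-exact gotcha can be illustrated with a strict matcher. This is a sketch under the assumption that the expected lines must appear in the file as a contiguous, byte-for-byte identical run; `find_exact` is a hypothetical name, and the real Codex-side matcher may behave differently.

```rust
/// Hypothetical strict matcher: returns the index of the first file line
/// where `expected` occurs as a contiguous, byte-identical run.
pub fn find_exact(file_lines: &[&str], expected: &[&str]) -> Option<usize> {
    // `windows(0)` would panic, and an over-long pattern can never match.
    if expected.is_empty() || expected.len() > file_lines.len() {
        return None;
    }
    file_lines
        .windows(expected.len())
        .position(|window| window == expected)
}
```

Under this model, a `-` line that drops leading indentation or adds a trailing space does not match, which is the shape of the "Failed to find expected lines" failure.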

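Research basis

Before pushing, a general-purpose subagent was launched to check upstream (file:line citations required, speculative wording forbidden). Key findings:

  • Upstream Codex CLI has no chat-completions function-call path for apply_patch (codex-rs/protocol/src/openai_models.rs:206-210: ApplyPatchToolType has only Freeform). This PR is a genuinely unprecedented path and can only borrow the prompt walkthrough from the upstream shell path.
  • Anthropic's official guidance states that the tool description is "by far the most important factor" and recommends a detailed description plus input_examples (positive examples). This PR adds 3 positive examples.
  • litellm is a pure passthrough for Anthropic→OpenAI (transformation.py:685-687), which supports doing the prompt enhancement at the adapter layer; litellm itself does not treat this as a protocol-layer problem either.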

Verification

  • cargo test -p codex-app-transfer-adapters --lib: 509 passed, 0 failed
  • The pre-push 3-agent review (code-reviewer / comment-analyzer / docs-sync-after-change) ran in full; 1 BLOCKER (NOTICE file missing) and 2 IMPORTANT findings have been fixed

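Regression baseline + round 2

The real-device capture from issue #235 (round 1, 26 MB of data) is kept as the baseline. After this PR is pushed, a round 2 real-device test (rerunning the same prompts) will compare success_signals / failure_signals; the concrete numbers will be added in a comment on this PR.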

License

Apache-2.0 → MIT verbatim borrow. Attribution appears in 5 places: NOTICE / ACKNOWLEDGEMENTS.md / the acknowledgement sections of README{,.en}.md / the adapter note at the top of reference.md (for the SHA sync policy, see the doc comment on request.rs::APPLY_PATCH_CHAT_PATH_SYSTEM_GUIDANCE).

Refs #235



Cmochance added 4 commits May 20, 2026 19:56
)

## Symptom

When users connect Codex Desktop through App Transfer + DeepSeek (or Kimi / MiMo), every API
call returns 200, but apply_patch tool calls are consistently aborted, the Codex Desktop
frontend shows no +/- diff card, and file editing is completely broken. shell_command / fetch
work normally.

## Root cause (verified against upstream source at openai/codex @ 000bf5c)

Codex CLI registers apply_patch as a freeform tool:

- `codex-rs/protocol/src/openai_models.rs:202-206`: the `ApplyPatchToolType` enum
  currently has **only the `Freeform`** variant (community proposal #14046 to add a Function variant is unmerged)
- `codex-rs/core/src/tools/handlers/apply_patch_spec.rs`: the wire shape is
  `ToolSpec::Freeform { format: { type:"grammar", syntax:"lark" } }`
- `codex-rs/core/src/tools/router.rs:90-134`: the response side routes by wire item type:
  `ResponseItem::FunctionCall` → `ToolPayload::Function { arguments }`,
  `ResponseItem::CustomToolCall` → `ToolPayload::Custom { input }`
- `codex-rs/core/src/tools/handlers/apply_patch.rs:324`: the apply_patch handler
  strictly requires `ToolPayload::Custom`; given a Function payload it returns
  `"apply_patch handler received unsupported payload"` → abort

On the response side, this repo's adapter renders `tool_calls[]` from chat upstreams such as
DeepSeek uniformly as `function_call` wire items, so the Codex CLI router mismatches
immediately → abort. Meanwhile, when the request side lowers the custom tool to a function,
the upstream description "do not wrap the patch in JSON" actually misleads the model on the
chat path, and there is no V4A format sample.
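
The routing mismatch can be sketched with a simplified model. The types below reuse the names quoted above (`ToolPayload`, `FunctionCall`, `CustomToolCall`), but the definitions, `WireItem`, `route`, and `handle_apply_patch` are illustrative assumptions rather than the upstream code.

```rust
/// Simplified stand-in for the wire item types named above.
#[derive(Debug, PartialEq)]
pub enum WireItem {
    FunctionCall { arguments: String },
    CustomToolCall { input: String },
}

/// Simplified stand-in for the payload the router hands to a handler.
#[derive(Debug, PartialEq)]
pub enum ToolPayload {
    Function { arguments: String },
    Custom { input: String },
}

/// Routing depends only on the wire item type, not on the tool name.
pub fn route(item: WireItem) -> ToolPayload {
    match item {
        WireItem::FunctionCall { arguments } => ToolPayload::Function { arguments },
        WireItem::CustomToolCall { input } => ToolPayload::Custom { input },
    }
}

/// The apply_patch handler accepts only the Custom payload.
pub fn handle_apply_patch(payload: ToolPayload) -> Result<String, String> {
    match payload {
        ToolPayload::Custom { input } => Ok(input),
        ToolPayload::Function { .. } => {
            Err("apply_patch handler received unsupported payload".to_string())
        }
    }
}
```

In this model, an apply_patch call rendered as `function_call` always reaches the error branch, regardless of how well-formed the patch text is.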

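## Fix (option B: bidirectional bridging in the adapter)

### Request side `responses/request/tools.rs`

Special-case `name == "apply_patch"` when lowering custom → function:
- Replace the outer description with V4A guidance that is accurate for the chat path
  (`*** Begin Patch`, file operation headers, hunk markers, relative paths, newlines written
  as `\n` escapes inside the JSON string)
- Mirror the key V4A constraints in the input parameter description

### Response side `responses/converter.rs`

Special-case `name == "apply_patch"` to emit the Responses `custom_tool_call` wire item
instead of `function_call`:
- `output_item.added` uses `type:"custom_tool_call"` (empty `input`)
- Intermediate args deltas are **not** emitted (avoids streaming input extraction from an
  accumulating JSON string)
- On close, emit `response.custom_tool_call_input.delta` + `.done` + `output_item.done`
  (`type:"custom_tool_call"`) in one shot
- Input extraction: parse `{"input":"<V4A>"}` as JSON; when the args are not JSON or lack the
  input field, pass the whole string through unchanged (so Codex CLI's parse_patch reports a
  readable error instead of aborting silently)
- The final `output[]` in the envelope uses the same input string (cached on PendingToolCall
  to prevent drift between close and envelope build)
- When interrupted (no finish_reason and not terminated by [DONE]), emit
  `status:"incomplete"` and **skip** `input.done`, so a strict client does not execute a
  partial patch when the stream is cut off midway (a safety guard for a destructive tool)
- `call_id` is frozen once `output_item.added` has been emitted and is no longer overwritten
  by later chunks (avoids exposing two different call_ids for the same item)
- Add tracing telemetry: positive shim hit (info), late-arriving name (warn), empty args
  (warn), JSON parse failure split (debug for bare V4A / warn for truly malformed)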
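
The close-time behaviour above can be summarised as a small decision function. This is only a sketch: `CloseOutcome` and `close_apply_patch` are hypothetical names, and the assumption that the interrupted branch also skips the input delta is mine, since the text only states that `input.done` is skipped.

```rust
/// Hypothetical summary of what gets emitted when an apply_patch call closes.
#[derive(Debug, PartialEq)]
pub struct CloseOutcome {
    pub status: &'static str,
    pub events: Vec<&'static str>,
}

pub fn close_apply_patch(interrupted: bool) -> CloseOutcome {
    if interrupted {
        // Assumption: no input delta is sent either, so a client never
        // receives a partial patch it could act on.
        CloseOutcome {
            status: "incomplete",
            events: vec!["response.output_item.done"],
        }
    } else {
        CloseOutcome {
            status: "completed",
            events: vec![
                "response.custom_tool_call_input.delta",
                "response.custom_tool_call_input.done",
                "response.output_item.done",
            ],
        }
    }
}
```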

### Request-side multi-turn replay `responses/request.rs` (BLOCKER)

On turn N+1, Codex CLI replays the previous turn's `ResponseItem::CustomToolCall` /
`CustomToolCallOutput` to us through `input[]`. The original `input_item_to_messages`
handled only `function_call` / `function_call_output`, so these two types silently fell into
the `_ =>` catch-all and were dropped → multi-turn context was lost. This commit adds two
branches:

- `custom_tool_call` → `role:assistant` + `tool_calls[]` (function-call shape, with
  arguments wrapped as a `{"input":"<V4A>"}` JSON string, matching the first-turn lowering shape)
- `custom_tool_call_output` → `role:tool` + `tool_call_id` + content
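
The `{"input":"<V4A>"}` wrapping is where newline escaping matters. Below is a std-only sketch of that step; `json_escape` and `wrap_as_arguments` are hypothetical helpers written for illustration, and a real implementation would more likely rely on a JSON library.

```rust
/// Hypothetical helper: escapes a string for use inside a JSON string literal.
pub fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Wraps raw V4A patch text as the `arguments` string of a function call.
pub fn wrap_as_arguments(v4a: &str) -> String {
    format!("{{\"input\":\"{}\"}}", json_escape(v4a))
}
```

The replayed assistant message then carries the same shape the model produced on the first turn, so the history stays consistent from the model's point of view.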

## Tests

Adds 8 regression tests (6 response side + 2 request side):
- chat tool_calls(apply_patch) → custom_tool_call wire
- JSON args / bare V4A fallback / missing input field
- interrupted stream → status=incomplete + skip input.done
- streaming output_item.done.input == envelope.output[].input
- custom_tool_call input → assistant.tool_calls (multi-turn replay)
- custom_tool_call_output → role:tool (multi-turn replay)
- request-side V4A description injection (apply_patch vs ordinary custom tools)

`cargo test --workspace`: the full suite passes (506 adapter unit + 12+10+3 integration, same
as the original repo; the only occasional concurrency flake, `gemini_oauth::cancel_slot_epoch_*`,
is unrelated to this commit and passes when run serially).

## Notes

The official Codex / GPT login path is unaffected (it uses the native Responses API and does
not go through the chat adapter conversion). This fix strictly targets the chat completions
provider → Responses direction.

Refs #235
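Real-device verification (the user prompt asked the model to insert a markdown section
between lines 5-10 of the README) found that the wire bridging works (25+ shim hits, zero
aborts), but the model stumbled on the V4A hunk header for 20 straight minutes and 25+
retries, and finished only after falling back to sed.

Root cause: the semantics of the space-prefixed lines after `@@ <context> @@` cannot go wrong
in the decoding space constrained by the freeform/lark grammar (the model can only produce
grammatically valid sequences). After switching to the chat function-call path, the lark
constraint disappears and the description mentions only the ` `/`+`/`-` prefixes; it **never
says that space lines correspond to the lines *after* the anchor**. DeepSeek repeatedly
duplicated the anchor as a space line; parse_patch could not find such a doubled line in the
file → rejected.

Fix: add an explicit "CRITICAL HUNK SEMANTICS" section to
`APPLY_PATCH_TOOL_DESCRIPTION_FOR_CHAT` plus a minimal executable V4A example (rename a let
binding) showing that the anchor appears only inside `@@ ... @@` and is not repeated as a
space line. `APPLY_PATCH_INPUT_DESCRIPTION_FOR_CHAT` (parameter level) gets a compact version
of the same rule, in case a provider truncates the tool-level description in long histories.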
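
The failure pattern described here (the anchor repeated as the first space-prefixed line) can be detected with a small heuristic. This is a sketch only: `anchor_repeated_as_context` is a hypothetical name, the check compares text after trimming whitespace, and it accepts a header with or without a trailing `@@`.

```rust
/// Hypothetical lint: true when the first context line of a hunk merely
/// repeats the anchor text from the `@@` header.
pub fn anchor_repeated_as_context(hunk: &[&str]) -> bool {
    let Some((header, body)) = hunk.split_first() else {
        return false;
    };
    let Some(rest) = header.strip_prefix("@@") else {
        return false;
    };
    // Tolerate an optional trailing `@@` after the anchor text.
    let rest = rest.trim();
    let anchor = rest.strip_suffix("@@").unwrap_or(rest).trim();
    if anchor.is_empty() {
        return false;
    }
    match body.first() {
        Some(line) => line
            .strip_prefix(' ')
            .map_or(false, |ctx| ctx.trim() == anchor),
        None => false,
    }
}
```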

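Tests: `tools_custom_apply_patch_injects_v4a_format_hint` gains 4 assertions that lock in the
anchor-semantics explanation, the minimal example, and the compact parameter-level version, so
the description is not accidentally deleted in a future refactor. `cargo test --workspace`
passes in full.

Note: a `parse_patch` failure on the Codex CLI side never passes through the proxy log. The
error is emitted to the model as a tool error inside the client tool runtime path, so the
monitoring in place before this PR could not see it. This follow-up relies entirely on
hands-on real-device feedback from the user (thanks).

Refs #235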
…ription (#235)

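The DeepSeek stability test (all 10 Levels passed; see the PR comments on #235) showed the
model discovering on its own 4 non-trivial apply_patch behaviours on the chat path, spending
an average of 1-3 minutes of detours per task:

1. `@@ <non-empty text> @@` anchors often fail to match on the chat path → the model uses
   `printf '\n' >> file` to plant an empty line as the anchor → patches → cleans up the extra
   empty line afterwards
2. `*** Add File: foo` + `*** Update File: foo` in the same patch conflict
   (the Update reads the file before the new file has been written to disk) → the model
   switches to a pre-created anchor file
3. A completely empty target file cannot be hit with `*** Update File:` directly → a shell
   command must seed one line first
4. In a multi-line file, pure `+` lines after the anchor "append" rather than "replace" →
   replacing requires paired `-` + `+` lines

These are all actual behaviours of parse_patch on the Codex CLI side, which the adapter cannot
fix at the wire layer. It can, however, tell the model about these workarounds up front on the
request side, raising the first-attempt success rate and reducing wasted tokens.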
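
Behaviour 4 (pure `+` lines append, `-` + `+` pairs replace) can be seen in a toy hunk applier. This is a deliberately simplified model, not the Codex CLI implementation: `apply_after_anchor` is a hypothetical function that takes an already-located anchor index and applies the hunk lines immediately after it.

```rust
/// Toy applier: applies ` `/`-`/`+` hunk lines right after `anchor_idx`.
pub fn apply_after_anchor(
    file: &[&str],
    anchor_idx: usize,
    hunk: &[&str],
) -> Result<Vec<String>, String> {
    if anchor_idx >= file.len() {
        return Err("anchor out of range".to_string());
    }
    // Everything up to and including the anchor is kept unchanged.
    let mut out: Vec<String> = file[..=anchor_idx].iter().map(|l| l.to_string()).collect();
    let mut cursor = anchor_idx + 1;
    for line in hunk {
        if let Some(added) = line.strip_prefix('+') {
            // Insertion only: the file cursor does not advance.
            out.push(added.to_string());
        } else if let Some(removed) = line.strip_prefix('-') {
            if file.get(cursor) != Some(&removed) {
                return Err(format!("expected line not found: {removed}"));
            }
            cursor += 1; // consume the old line without copying it
        } else if let Some(kept) = line.strip_prefix(' ') {
            if file.get(cursor) != Some(&kept) {
                return Err(format!("context line not found: {kept}"));
            }
            out.push(kept.to_string());
            cursor += 1;
        } else {
            return Err(format!("invalid hunk line: {line}"));
        }
    }
    // Lines the hunk never touched are carried over as-is.
    out.extend(file[cursor..].iter().map(|l| l.to_string()));
    Ok(out)
}
```

With the file `["# Title", "old line", "tail"]` and the anchor at index 0, the hunk `["+new line"]` leaves `old line` in place, while `["-old line", "+new line"]` replaces it.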

Implementation:

- `crates/adapters/src/responses/request.rs`: adds the `tools_register_apply_patch()` check,
  the `APPLY_PATCH_CHAT_PATH_SYSTEM_GUIDANCE` text, and the
  `apply_patch_chat_guidance_message()` constructor; `build_messages_from_input` appends the
  injection right after the Codex CLI instructions system message, **only when** the current
  turn's tools array actually registers apply_patch (type:custom + name:apply_patch). Turns
  without apply_patch incur zero overhead.
- `crates/adapters/src/responses/request/tools.rs`: appends 4 compact workarounds to the end
  of `APPLY_PATCH_TOOL_DESCRIPTION_FOR_CHAT`; `APPLY_PATCH_INPUT_DESCRIPTION_FOR_CHAT`
  (parameter level) gets an even more compact backup. Three layers of redundancy (system /
  tool desc / param desc) keep the model from losing all guidance when an upstream provider
  truncates or down-weights any one layer.

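Design trade-offs:

- Inject a separate system message (not merged into the original Codex instructions) to keep
  responsibilities separate and make later adjustment or replacement easy
- The text is in English, matching the existing description style and aligning better with
  the downstream models' vocab
- The detection condition uses `type:"custom" && name:"apply_patch"` rather than the lowered
  `type:"function"`, because `build_messages_from_input` sees the original Responses body
  (`convert_responses_tool_to_chat_tool` is called on the `tools` field conversion path,
  which runs parallel to the `messages` field construction path)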
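
The conditional injection can be sketched as follows. Only the name `tools_register_apply_patch` comes from the commit message; its signature, the `Tool` and `Message` types, `GUIDANCE_MARKER`, `build_messages`, and the guidance text are all assumptions made for illustration.

```rust
/// Simplified stand-ins for a Responses tool entry and a chat message.
#[derive(Debug, Clone, PartialEq)]
pub struct Tool {
    pub kind: &'static str,
    pub name: &'static str,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: &'static str,
    pub content: String,
}

/// Hypothetical marker used to recognise the injected guidance.
pub const GUIDANCE_MARKER: &str = "[apply_patch chat-path guidance]";

/// Detection runs on the original (un-lowered) tool list: custom + apply_patch.
pub fn tools_register_apply_patch(tools: &[Tool]) -> bool {
    tools.iter().any(|t| t.kind == "custom" && t.name == "apply_patch")
}

/// Builds the system messages; the guidance is a separate message that
/// follows the instructions and is added at most once per conversion.
pub fn build_messages(instructions: &str, tools: &[Tool]) -> Vec<Message> {
    let mut messages = vec![Message {
        role: "system",
        content: instructions.to_string(),
    }];
    if tools_register_apply_patch(tools) {
        messages.push(Message {
            role: "system",
            content: format!("{} Prefer apply_patch for file edits.", GUIDANCE_MARKER),
        });
    }
    messages
}
```

Because the messages are rebuilt from the request body on every call, converting the same body repeatedly yields exactly one guidance message each time in this model.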

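Tests: 3 new unit tests cover:
- Injection when apply_patch is registered (Codex instructions are not overwritten, all 4
  workarounds are present, the marker exists)
- No injection when apply_patch is not registered (system message count does not grow, no
  guidance marker)
- Converting the same body 3 times in a row still yields a guidance count of 1 each time
  (guards against accumulation in post-processing such as merge_consecutive_system_messages)

`cargo test --workspace`: 509 adapter unit + 25 integration tests all pass (506→509).
`cargo fmt --all -- --check` clean.

Refs #235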
… + 3 examples + byte-exact rule

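PR #236 fixed the wire layer (custom_tool_call SSE bridging + multi-turn history replay + the first version of system / description injection),
but the real-device capture for issue #235 (28 turns / 26 MB / DeepSeek + Kimi) shows that 6 of 7 apply_patch tool_calls
are still rejected by the validator on the Codex Desktop side because of the quality of the model-generated V4A patch content:
  - 3 turns: the model emits raw Python code; Codex Desktop reports `invalid hunk at line 3, 'def main():' is not a valid hunk header`
  - 1 turn: the V4A format is correct but the `-` lines do not byte-exactly match the file; reports `Failed to find expected lines`
  - 3 turns: the model picks `exec_command` instead of `apply_patch`

This PR targets model generation quality; the wire layer is unchanged.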

Main changes
- `crates/adapters/src/responses/apply_patch_v4a_reference.md` (new): a verbatim mirror of upstream Codex CLI
  `codex-rs/core/prompt_with_apply_patch_instructions.md` L277-L351 @ commit
  `0b4f86095c8005d8f74e9c62b971d72c1670aa88` (Apache-2.0, Copyright 2025 OpenAI). An adapter note at the top
  explicitly overrides the phrase "shell command" with "function-call tool"; the rest of the original text is unmodified.
- `crates/adapters/src/responses/request.rs::APPLY_PATCH_CHAT_PATH_SYSTEM_GUIDANCE`: rewritten into a three-section structure:
  (1) top-level Tool selection guidance (to counter the exec_command preference); (2) the V4A walkthrough above, embedded via include_str!;
  (3) five gotchas observed on the chat path (byte-exact matching / empty-line anchor only when a blank line exists / Add+Update in the
  same patch / empty files / pure `+` lines do not replace).
- `crates/adapters/src/responses/request/tools.rs::APPLY_PATCH_TOOL_DESCRIPTION_FOR_CHAT`: extended with
  the top-level Tool selection guidance, the BYTE-EXACT matching rule, 3 positive examples (modify line / Add File /
  multi-hunk Update), and an anti-pattern reminder ("NEVER pass raw source code").
- `APPLY_PATCH_INPUT_DESCRIPTION_FOR_CHAT` (parameter-level mirror): some providers down-weight the tool-level
  description in long histories, so the parameter level carries a compact byte-exact + anti-pattern version to stay visible.
- License compliance: adds a NOTICE file; the acknowledgement sections of ACKNOWLEDGEMENTS.md / README.md / README.en.md gain the upstream
  attribution (full 40-char SHA + L277-L351 + Apache-2.0).
- docs/CHANGELOG.md + docs/investigation/protocol-conversion-3way-comparison.md are updated to match this change.
- 5 new / extended unit test assertions: Tool selection / evidence of the verbatim V4A reference (the `Patch := Begin
  { FileOp } End` EBNF block + the `@@ class BaseClass` double-anchor example) / byte-exact / 3 positive
  examples / anti-pattern reminder. 509 tests pass.

The size of the success-rate improvement awaits round 2 data from the real-device regression test after the push; the concrete numbers will then be added to the PR description.

Refs #235

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 potential issue.

View 3 additional findings in Devin Review.


/// attribution 同时见 NOTICE 文件 + README 中英致谢段 +
/// ACKNOWLEDGEMENTS.md + `apply_patch_v4a_reference.md` 文件头部
/// adapter note。
/// 上游若发版,**同步**更新 5 处 commit SHA:
🟡 Doc comment says "5 处" (5 locations) but lists 6 items for commit SHA sync

The doc comment on APPLY_PATCH_CHAT_PATH_SYSTEM_GUIDANCE at line 2299 says 上游若发版,**同步**更新 5 处 commit SHA: ("sync 5 commit SHA locations") but then lists 6 bullet points (lines 2300-2305). The ACKNOWLEDGEMENTS.md (line 145) correctly says 共 6 处 commit SHA. This count mismatch in maintenance instructions could cause a maintainer to stop after updating 5 locations, missing the 6th (NOTICE file), leaving it with a stale upstream commit SHA.

Suggested change
- /// 上游若发版,**同步**更新 5 处 commit SHA:
+ /// 上游若发版,**同步**更新 6 处 commit SHA:
@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 new potential issue.

View 5 additional findings in Devin Review.


Comment on lines +1054 to +1061
    return json!({
        "type": "custom_tool_call",
        "id": pending.fc_id,
        "call_id": pending.call_id,
        "name": pending.name,
        "input": input,
        "status": "completed",
    });

🔴 Interrupted apply_patch envelope status says "completed" while streaming event says "incomplete"

When a stream is interrupted (finish_reason is None and from_done is false), close_tool_call correctly emits response.output_item.done with status: "incomplete" at converter.rs:745. However, the subsequent emit_close call builds the response.completed envelope using tool_call_item_completed, which always returns "status": "completed" for apply_patch items (converter.rs:1060). This violates the code's own stated invariant at converter.rs:1042-1043: "envelope.output[] 终态必须和流式 response.output_item.done 的 item 一致".

The root cause is that PendingToolCall has no field to track whether close_tool_call was called with interrupted=true, and apply_patch_input is set (converter.rs:724-725) before the interrupted check on line 732. So tool_call_item_completed sees a valid apply_patch_input and returns "completed". A strict client that reads the envelope after a reconnection could see "completed" and attempt to execute a truncated partial patch — since apply_patch is destructive (file modifications), this is dangerous.

Prompt for agents
The tool_call_item_completed method always returns status=completed for apply_patch items, even when close_tool_call emitted status=incomplete for an interrupted stream. The fix requires:

1. Add a boolean field (e.g. `interrupted: bool`) to the PendingToolCall struct (converter.rs around line 58-84)
2. In close_tool_call, when the interrupted branch runs (converter.rs around line 732-762), set pending.interrupted = true alongside pending.closed = true
3. In tool_call_item_completed (converter.rs around line 1040-1062), check pending.interrupted (or equivalent) to decide whether to use "incomplete" or "completed" for the status field

This ensures the envelope output[].status matches the streaming response.output_item.done.item.status, honoring the invariant stated in the comment at line 1042-1043.
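
The three-step fix suggested above can be sketched in isolation. The struct below is a reduced, hypothetical version of `PendingToolCall` (only the fields needed here), and `envelope_status` stands in for the status decision inside `tool_call_item_completed`; neither is the actual converter code.

```rust
/// Reduced stand-in for PendingToolCall with the proposed `interrupted` flag.
#[derive(Debug, Default)]
pub struct PendingToolCall {
    pub apply_patch_input: Option<String>,
    pub closed: bool,
    pub interrupted: bool,
}

/// Records how the call was closed so later envelope building can see it.
pub fn close_tool_call(pending: &mut PendingToolCall, input: &str, interrupted: bool) {
    // The input is cached before the interrupted check, as in the review note.
    pending.apply_patch_input = Some(input.to_string());
    pending.closed = true;
    pending.interrupted = interrupted;
}

/// Status for the envelope `output[]` item; it must agree with the status
/// already sent in the streaming `response.output_item.done` event.
pub fn envelope_status(pending: &PendingToolCall) -> &'static str {
    if pending.interrupted {
        "incomplete"
    } else {
        "completed"
    }
}
```

Keeping the flag on the pending call means the envelope status is derived from the same fact as the streaming status, instead of from the presence of a cached input.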

Cmochance added a commit that referenced this pull request May 21, 2026
)

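Complete fix for the apply_patch tool's diff UI on chat-completions providers (DeepSeek/Kimi/Grok, etc.):

- Wire-layer custom_tool_call SSE bridging + multi-turn previous_response_id history replay
- V4A `@@` single-ended syntax fix (the round 4 real-device root cause: the original double-ended `@@ <ctx> @@` form made the applier treat the trailing `@@` as literal text and fail)
- Remove the misleading EMPTY LINE anchor recommendation
- State explicitly that the MINIMAL Update form (`-line` / `+line` directly, with no `@@`) is preferred, that Add File lines must carry a `+` prefix, and that the `-text` prefix takes no space
- The envelope `output[]` status for an interrupted apply_patch now matches the streaming `done` event (Devin pre-merge BUG fix)
- The test assertion matches the NEVER rule with exact uppercase (Devin pre-merge fix #2)
- Guidance is injected on the first turn only, to keep it from accumulating across turns and polluting the context (Devin pre-merge fix #3)

Real-device verification with Kimi across rounds 1-6 (70+ min of capture): Update File failure rate 100% → 0%, with zero self-correction in the model's reasoning.

Stacked verification of PR #238 / #239 is deferred to DeepSeek rounds 7-N.

Refs #235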
@Cmochance

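Closing. Looking back after real-device verification rounds 1-9 were completed, this PR's direction of "enhancing the prompt with a verbatim V4A walkthrough" actually turned out to be counterproductive:

The content of this PR has already been absorbed into PR #236 + #240 in a more precise, lower-noise form. Closing.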

@Cmochance Cmochance closed this May 21, 2026
@Cmochance Cmochance deleted the worktree-feat+apply-patch-chat-path-prompt-quality branch May 21, 2026 17:13