Retry structure step up to 3 times on failure #3

Open
ha1t wants to merge 1 commit into tomohiro-owada:main from ha1t:feat/structure-retry

@ha1t ha1t commented May 22, 2026

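Background

Step 1 asks the LLM to return the list of wiki pages in XML format. In production, one-off failures like the following sometimes brought down generation for the entire project.

| Failure pattern | Details |
|---|---|
| ① claudeCall itself returns an error | Claude CLI timeout, startup failure, network problems, etc. |
| ② A response is returned but parsePages yields 0 pages | The LLM did not return the expected XML format, returned an empty response, returned fenced output, etc. |

② in particular comes from variability in LLM output, so a re-run often succeeds. Even so, the response content was not logged anywhere, which made the cause hard to pin down.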

Changes

  • Wrap Step 1 in a for loop with maxRetries = 3
  • Case ①: print ⚠️ structure attempt N failed: ... to stderr and retry
  • Case ②: print ⚠️ structure attempt N: no pages found in response (X bytes) to stderr, write the full response to <wikiDir>/_debug_structure_attemptN.txt, then retry
  • From the second attempt on, change the progress display to 📋 structure... (retry N/3)
  • If all 3 attempts fail, return no pages found in structure after 3 attempts (the existing message is changed to include "after N attempts")
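
The control flow in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in main.go: `structureWithRetry` is a hypothetical name, the `call` and `parse` parameters stand in for claudeCall and parsePages with assumed signatures, the progress line is assumed to go to stderr, and the debug-file dump is left out to keep the loop readable.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

const maxRetries = 3

// structureWithRetry is a hypothetical stand-in for Step 1.
// call plays the role of claudeCall and parse the role of parsePages;
// both signatures are assumptions made for this sketch.
func structureWithRetry(call func() (string, error), parse func(string) []string) ([]string, error) {
	for attempt := 1; attempt <= maxRetries; attempt++ {
		if attempt > 1 {
			// Progress display changes from the second attempt on.
			fmt.Fprintf(os.Stderr, "📋 structure... (retry %d/%d)\n", attempt, maxRetries)
		}
		resp, err := call()
		if err != nil {
			// Case 1: the call itself failed.
			fmt.Fprintf(os.Stderr, "⚠️ structure attempt %d failed: %v\n", attempt, err)
			continue
		}
		if pages := parse(resp); len(pages) > 0 {
			return pages, nil
		}
		// Case 2: a response arrived but no pages were parsed from it.
		fmt.Fprintf(os.Stderr, "⚠️ structure attempt %d: no pages found in response (%d bytes)\n", attempt, len(resp))
	}
	return nil, fmt.Errorf("no pages found in structure after %d attempts", maxRetries)
}

func main() {
	calls := 0
	// Fake call: errors once, returns an empty response once, then succeeds.
	call := func() (string, error) {
		calls++
		switch calls {
		case 1:
			return "", errors.New("timeout")
		case 2:
			return "", nil
		}
		return "<page>Home</page>", nil
	}
	// Fake parser: any non-empty response yields one page.
	parse := func(s string) []string {
		if s == "" {
			return nil
		}
		return []string{"Home"}
	}
	pages, err := structureWithRetry(call, parse)
	fmt.Println(pages, err, calls)
}
```

Passing the call and the parser in as functions is only a convenience for the sketch: it lets both failure cases be reproduced without a real Claude CLI.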

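About the debug file

_debug_structure_attemptN.txt is written only when parsePages returns 0 pages. Nothing is written on success, so a file left in the wiki directory shows that the corresponding attempt failed. The file shows the form in which the LLM responded, so it can be used to improve the prompt or to adjust cleanXMLResponse / parsePages.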

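Verification

  • Normal case: succeeds on the first attempt → progress display and output are unchanged, and no debug file is created
  • Reproducing case ②: force a parse failure → confirmed that debug files are written as _debug_structure_attempt1.txt, _2.txt, _3.txt
  • 3 consecutive failures: confirmed that the run ends with no pages found in structure after 3 attempts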

The structure determination step can fail intermittently or return
an empty page list. Retry up to 3 times, dumping each failed
response to a debug file inside the wiki output dir for inspection.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@qodo-code-review

Review Summary by Qodo

Add retry logic to structure determination with debug output

✨ Enhancement

Walkthroughs

Description
• Implement retry mechanism for wiki structure determination step
  - Retries up to 3 times on claudeCall failures or empty page results
  - Displays retry progress indicator on subsequent attempts
• Add debug file output for failed parsing attempts
  - Writes raw LLM response to _debug_structure_attemptN.txt for inspection
  - Helps diagnose LLM output format issues and prompt improvements
• Update error message to indicate retry attempts exhausted
Diagram
flowchart LR
  A["Structure Step<br/>Attempt 1-3"] --> B{claudeCall<br/>Success?}
  B -->|No| C["Log Error<br/>Retry if &lt; 3"]
  B -->|Yes| D{Pages<br/>Found?}
  D -->|No| E["Write Debug File<br/>Retry if &lt; 3"]
  D -->|Yes| F["Return Pages"]
  C -->|Retry| A
  E -->|Retry| A
  C -->|Max Retries| G["Return Error"]
  E -->|Max Retries| H["Return Error<br/>after 3 attempts"]

File Changes

1. main.go ✨ Enhancement +30/-9

Add retry logic and debug output to structure step

• Wrapped structure determination step in a retry loop (max 3 attempts)
• Added conditional progress display showing retry count on subsequent attempts
• Implemented error handling to retry on claudeCall failures with stderr logging
• Added page parsing validation with debug file output on empty results
• Updated final error message to indicate number of failed attempts

main.go



qodo-code-review Bot commented May 22, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0)


Remediation recommended

1. Misleading final retry log 🐞 Bug ◔ Observability
Description
In generateWiki, the no-pages path always logs “retrying...” even on the final attempt, even though
the loop will exit and return an error immediately afterward. This produces contradictory operator
output during failures.
Code

main.go[R581-587]

Evidence
The log line includes “retrying...” unconditionally for the no-pages path, but after the 3rd attempt
the loop ends and the function returns an error, so no retry actually occurs.

main.go[555-588]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

### Issue description
The no-pages case logs `retrying...` even when `attempt == maxRetries`, but the code then exits the loop and returns `no pages found...`, which is misleading.

### Issue Context
This happens in Step 1’s retry loop when `parsePages()` returns 0 pages on the final attempt.

### Fix Focus Areas
- main.go[559-588]

### Suggested change
- Only print the `retrying...` message when `attempt < maxRetries`.
- For `attempt == maxRetries`, print a final/failure message (or skip the retry wording) before returning the error.
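
One way to apply the suggested change is to build the log line from the attempt number, so the retry wording only appears when another attempt will follow. The sketch below is an assumption-laden illustration: `noPagesMessage` is a hypothetical helper, and the ", retrying..." and ", giving up" suffixes are guesses at wording, since the exact text in main.go is not shown here.

```go
package main

import "fmt"

// noPagesMessage is a hypothetical helper that formats the stderr line for
// the no-pages case. The suffix wording is an assumption for illustration.
func noPagesMessage(attempt, maxRetries, size int) string {
	msg := fmt.Sprintf("⚠️ structure attempt %d: no pages found in response (%d bytes)", attempt, size)
	if attempt < maxRetries {
		// Another attempt follows, so the retry wording is accurate.
		return msg + ", retrying..."
	}
	// Final attempt: the loop exits after this, so do not promise a retry.
	return msg + ", giving up"
}

func main() {
	for attempt := 1; attempt <= 3; attempt++ {
		fmt.Println(noPagesMessage(attempt, 3, 128))
	}
}
```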


2. WriteFile error ignored 🐞 Bug ☼ Reliability
Description
The debug response dump ignores the error from os.WriteFile, so permission/disk failures will
silently prevent the debug file from being created. This can leave you with no artifact even though
the code path intended to dump one.
Code

main.go[R583-585]

Evidence
The code calls os.WriteFile directly and discards the returned error, so failures to write the debug
file will not be reported or handled.

main.go[581-585]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

### Issue description
`os.WriteFile(...)` is called without checking its returned error when dumping the failed structure response. If the write fails, the program silently continues, undermining the debug mechanism.

### Issue Context
This is the new Step 1 structure retry debug dump.

### Fix Focus Areas
- main.go[581-585]

### Suggested change
- Capture the error: `if werr := os.WriteFile(...); werr != nil { ... }`.
- Emit a stderr warning (and/or append to `_errors.log`) that includes the debug file path and error.
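
A minimal sketch of the error check described above, assuming the dump is factored into a helper. `dumpDebug` is a hypothetical name and the 0o644 file mode is an assumption; only `os.WriteFile`, `filepath.Join`, and the file name pattern come from the discussion.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// dumpDebug is a hypothetical helper that writes a failed structure response
// to <wikiDir>/_debug_structure_attemptN.txt. A write failure is reported on
// stderr with the path and the error, and is also returned to the caller.
func dumpDebug(wikiDir string, attempt int, content string) error {
	path := filepath.Join(wikiDir, fmt.Sprintf("_debug_structure_attempt%d.txt", attempt))
	if werr := os.WriteFile(path, []byte(content), 0o644); werr != nil {
		fmt.Fprintf(os.Stderr, "⚠️ could not write debug file %s: %v\n", path, werr)
		return werr
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "wiki")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.RemoveAll(dir)

	// Succeeds: the temporary directory exists and is writable.
	_ = dumpDebug(dir, 1, "raw response")
	// Fails: the parent directory does not exist, so a warning is printed.
	_ = dumpDebug(filepath.Join(dir, "missing"), 2, "raw response")
}
```

Whether the caller should treat the returned error as fatal is a design choice. Since the dump is only a diagnostic aid, logging it and continuing with the retry seems the less disruptive option.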


3. Debug dump not raw 🐞 Bug ⚙ Maintainability
Description
The dumped debug file contains structureContent after cleanXMLResponse has modified it (fence
stripping and truncation after </wiki_structure>). This can hide the original malformed LLM output
that caused parsePages to return 0, reducing the dump’s diagnostic value.
Code

main.go[R575-585]

Evidence
generateWiki calls cleanXMLResponse before parsing and before writing the debug dump, and
cleanXMLResponse explicitly strips markdown fences and truncates content at the last
</wiki_structure>, meaning the dumped file is not the original response bytes.

main.go[566-585]
main.go[401-417]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

### Issue description
The debug dump is written after `cleanXMLResponse(structureContent)`, but `cleanXMLResponse` mutates content (removes code fences and truncates after the closing tag). For debugging LLM output variability, the raw response is often the most useful.

### Issue Context
This affects `_debug_structure_attemptN.txt` created when `parsePages` returns 0.

### Fix Focus Areas
- main.go[566-585]
- main.go[401-417]

### Suggested change
- Preserve the raw response in a separate variable (e.g., `rawStructureContent := structureContent`) before cleaning.
- Write the debug file from the raw content (or write both raw and cleaned variants with different filenames).
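
The idea of keeping the raw response alongside the cleaned one can be illustrated as below. `cleanResponse` is a simplified, hypothetical stand-in for cleanXMLResponse, modeled only on the behavior described in this review (dropping fence lines and truncating after the last closing tag); the real function may differ. `prepare` is likewise a name invented for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// fence is built with strings.Repeat so the marker does not appear
// literally inside this example.
var fence = strings.Repeat("`", 3)

// cleanResponse is a simplified stand-in for cleanXMLResponse (assumption):
// it drops markdown fence lines and truncates after the last closing tag.
func cleanResponse(s string) string {
	var kept []string
	for _, line := range strings.Split(s, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), fence) {
			continue
		}
		kept = append(kept, line)
	}
	out := strings.Join(kept, "\n")
	const closing = "</wiki_structure>"
	if i := strings.LastIndex(out, closing); i >= 0 {
		out = out[:i+len(closing)]
	}
	return out
}

// prepare returns the untouched response for the debug dump and the cleaned
// copy for parsing, so cleaning never alters what gets dumped.
func prepare(response string) (raw, cleaned string) {
	raw = response
	cleaned = cleanResponse(response)
	return raw, cleaned
}

func main() {
	response := fence + "xml\n<wiki_structure></wiki_structure>\n" + fence + "\nHope this helps!"
	raw, cleaned := prepare(response)
	fmt.Printf("raw: %d bytes, cleaned: %d bytes\n", len(raw), len(cleaned))
}
```

With this split, the parser would receive `cleaned` while the debug file is written from `raw`, so the fences and trailing text that may have caused the failure remain visible.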
