optimizer: allow resume of short-completed runs by ZhengyaoJiang · Pull Request #143 · WecoAI/weco-cli

ZhengyaoJiang · 2026-04-26T19:23:50Z

Summary

Allow `weco resume` to recover runs that the backend marked `completed` short of their step budget. Previously these were stuck — the only escape was a fresh run.

Why

A transient `Failed to submit result` on the CLI side can race with a successful backend ack: the backend records the result and marks the run `completed` (its own step counter), but the CLI exits with `submit_failed` well short of `total_steps`. Subsequent `weco resume` returns `Run cannot be resumed (status: completed)` and the user has to start over.

Discovered during a multi-day fraud-detection IEEE-CIS rerun: ~5 of 18 cells "completed" at 60–130 steps out of 200 because of intermittent network blips.

Change

`resume_optimization()` now treats `status='completed' AND current_step < total_steps` ("short-completed") as resumable. The run is flipped back to `running` by `resume_optimization_run()` the same way it always was, and the queue loop picks up at the prepared `start_step`.

Test plan

Existing `error`/`terminated` resume paths still work (no regression).
On a run with `status='completed', current_step=60, steps=200`, `weco resume` prompts and continues at step 60.
On a run with `status='completed', current_step=200, steps=200` (truly done), resume is still rejected with the `completed/total_steps` message — no spurious re-runs.

🤖 Generated with Claude Code

…_steps A transient `Failed to submit result` on the CLI side can race with a successful backend ack: the backend records the result and marks the run "completed" (its own step counter), but the CLI exits with submit_failed well short of the configured step budget. Subsequent `weco resume` returns "Run cannot be resumed (status: completed)" and the only recovery is to start a new run from scratch — losing the step history. resume_optimization() now treats `status='completed' AND current_step < total_steps` ("short-completed") as resumable, alongside the existing "error"/"terminated" cases. The run status is bumped back to "running" by resume_optimization_run() the same way it always was, and the queue loop picks up at the prepared start_step. The branch where this matters in practice: a multi-day Weco run on fraud-detection IEEE-CIS, where the network blip during submit caused ~5 of 18 cells to "complete" at 60-130 steps out of 200. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 49bf182a0b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-26T19:27:20Z

    current_step = int(status.get("current_step", 0))
    steps_remaining = int(total_steps) - current_step


Guard step parsing before checking resumable status

This change moved the resumable-status gate below current_step/total_steps parsing, so weco resume now attempts int(...) conversion even for runs that should be rejected immediately. If the API returns current_step or optimizer.steps as null (which can happen on older/incomplete run metadata), int(None) raises and the command crashes instead of printing the non-resumable status message; previously completed runs never hit this path because they were rejected first.

Useful? React with 👍 / 👎.

aliroberts · 2026-05-06T15:13:07Z

Hi @ZhengyaoJiang, do you have an example run ID here?

chatgpt-codex-connector Bot reviewed Apr 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

optimizer: allow resume of short-completed runs#143

optimizer: allow resume of short-completed runs#143
ZhengyaoJiang wants to merge 1 commit intodevfrom
vk/resume-short-completed

ZhengyaoJiang commented Apr 26, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 26, 2026

Uh oh!

aliroberts commented May 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		current_step = int(status.get("current_step", 0))
		steps_remaining = int(total_steps) - current_step

Conversation

ZhengyaoJiang commented Apr 26, 2026

Summary

Why

Change

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

aliroberts commented May 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants