verify_only: limit dry run to numberio from state file #3

Open
hanee-kim wants to merge 1 commit into master from claude/fix-state-verification-dryrun-DkDws

@hanee-kim
Owner

When verify_state_load is enabled, do_dry_run() now uses the
loaded state file's numberio to control termination instead of
relying solely on size-based loop completion.

Two changes to the log_io_piece block:

  • Break out of the dry run loop when io_u->numberio reaches
    vstate->numberio (the write count recorded at state-save time).
  • Skip log_io_piece for writes that were still in-flight when the
    state was saved (verify_state_should_stop returns 1 for those),
    since those writes may not have completed and should not be
    queued for verification.
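
The two changes above can be sketched as a toy loop. This is only an illustration of the control flow described in the list, not the code in the patch: `toy_vstate`, `toy_should_stop()` and `toy_dry_run()` are hypothetical stand-ins, and the in-flight lookup by offset is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the loaded verify state; NOT fio's real structure. */
struct toy_vstate {
    uint64_t numberio;         /* write count recorded at state-save time */
    const uint64_t *inflight;  /* offsets of writes in flight at save time */
    size_t nr_inflight;
};

/* Hypothetical analogue of verify_state_should_stop():
 * returns 1 when the write at this offset was in flight at save time. */
static int toy_should_stop(const struct toy_vstate *vs, uint64_t offset)
{
    for (size_t i = 0; i < vs->nr_inflight; i++)
        if (vs->inflight[i] == offset)
            return 1;
    return 0;
}

/* Replay up to nr_planned sequential writes of size bs.
 * Stores the number of replayed writes in *replayed and returns how many
 * of them were logged for verification. */
static size_t toy_dry_run(const struct toy_vstate *vs, uint64_t nr_planned,
                          uint64_t bs, uint64_t *replayed)
{
    size_t logged = 0;
    uint64_t issued = 0;

    assert(bs > 0);
    while (issued < nr_planned) {
        /* Change 1: stop once the saved write count is reached. */
        if (issued >= vs->numberio)
            break;

        uint64_t offset = issued * bs;
        issued++;

        /* Change 2: do not log writes that were still in flight. */
        if (toy_should_stop(vs, offset))
            continue;
        logged++;
    }
    *replayed = issued;
    return logged;
}
```

With a saved count of 4, a block size of 4096 and one in-flight write at offset 8192, the loop replays four writes and logs three of them, however many writes the job size would otherwise allow.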

https://claude.ai/code/session_01MTN95VHyH9fazG5VJuZNwp

@hanee-kim force-pushed the claude/fix-state-verification-dryrun-DkDws branch 2 times, most recently from a08c5a2 to cd3377d, on April 28, 2026 07:17
When verify_state_load is active, do_dry_run() was running until
io_complete_bytes_exceeded() fired based on the configured job size.
do_verify() already limits itself to the state-file IO count, but the
preceding dry run would replay fake IOs for the full configured size.
If the write job was smaller than the verify job size, the dry run
would never finish.

Add an early exit when td->io_issues[DDIR_WRITE] reaches
vstate->numberio.  The check is placed before the io_issues increment
to avoid inflating the count by one.  Use clear_io_u() on that break
path instead of put_io_u(); the latter does not clear IO_U_F_FLIGHT,
which would cause a td_io_queue() assertion on the next reuse of the
io_u.  Also skip log_io_piece() for writes still in-flight at save
time, which verify_state_should_stop() identifies, as those may not
have reached stable storage.
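
A minimal sketch of the break path follows, assuming a single reusable unit and a flag that is set before the limit check. All `toy_*` names are hypothetical; `toy_put()` and `toy_clear()` only mimic the behaviour this message attributes to put_io_u() and clear_io_u(), and `toy_can_queue()` stands in for the td_io_queue() assertion.

```c
#include <assert.h>
#include <stdint.h>

enum { TOY_F_FLIGHT = 1u << 0 };  /* stands in for IO_U_F_FLIGHT */

struct toy_io_u {
    unsigned int flags;
};

struct toy_td {
    uint64_t write_issues;  /* stands in for td->io_issues[DDIR_WRITE] */
    struct toy_io_u io_u;   /* single reusable unit, for simplicity */
};

/* Mimics the described put_io_u(): returns the unit, leaves FLIGHT set. */
static void toy_put(struct toy_io_u *u)
{
    (void)u;
}

/* Mimics the described clear_io_u(): drops FLIGHT, then returns the unit. */
static void toy_clear(struct toy_io_u *u)
{
    u->flags &= ~TOY_F_FLIGHT;
    toy_put(u);
}

/* Stand-in for the queue-time check: a unit must not already be in flight. */
static int toy_can_queue(const struct toy_io_u *u)
{
    return (u->flags & TOY_F_FLIGHT) == 0;
}

static void toy_dry_run(struct toy_td *td, uint64_t limit, int use_clear)
{
    for (;;) {
        struct toy_io_u *u = &td->io_u;

        u->flags |= TOY_F_FLIGHT;  /* toy assumption: marked before the check */

        /* The limit is tested BEFORE the counter is incremented. */
        if (td->write_issues >= limit) {
            if (use_clear)
                toy_clear(u);
            else
                toy_put(u);
            break;
        }
        td->write_issues++;
        u->flags &= ~TOY_F_FLIGHT;  /* normal fake-completion path */
    }
}
```

In this model the counter ends exactly at the limit in both variants, but only the `toy_clear()` variant leaves the unit in a state that `toy_can_queue()` accepts on the next reuse.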

Without an explicit termination signal, keep_running() compares
io_bytes against the configured size and re-enters the outer loop
after do_verify() returns.  The second do_dry_run() opens the device
but logs nothing, leaving the file half-open; do_verify() then issues
a sync+invalidate and a stray read at offset 0x0 whose completion
never arrives, hanging fio.  Call fio_mark_td_terminate() after
do_verify() when verify_state_load is active so the existing terminate
check exits cleanly after one dry_run + verify cycle.
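
The re-entry problem can be illustrated with a heavily simplified model of the outer loop. The real keep_running() checks far more than this; `toy_job`, `toy_keep_running()` and `toy_run()` are hypothetical, and the assumption that only the first cycle accounts any bytes is made purely for the example. The `max_cycles` guard exists only so the unfixed variant terminates in the sketch.

```c
#include <assert.h>
#include <stdint.h>

struct toy_job {
    uint64_t size;          /* configured job size in bytes */
    uint64_t io_bytes;      /* bytes accounted so far */
    int terminate;          /* set by the toy analogue of fio_mark_td_terminate() */
    int verify_state_load;  /* option flag */
};

/* Simplified stand-in for keep_running(): terminate flag first, then size. */
static int toy_keep_running(const struct toy_job *j)
{
    if (j->terminate)
        return 0;
    return j->io_bytes < j->size;
}

/* Runs dry_run + verify cycles and returns how many were executed. */
static unsigned int toy_run(struct toy_job *j, uint64_t state_bytes,
                            int mark_terminate, unsigned int max_cycles)
{
    unsigned int cycles = 0;

    while (toy_keep_running(j) && cycles < max_cycles) {
        /* Only the first cycle has anything recorded in the state file. */
        if (cycles == 0)
            j->io_bytes += state_bytes;
        cycles++;

        /* The fix: signal termination after verify when loading state. */
        if (mark_terminate && j->verify_state_load)
            j->terminate = 1;
    }
    return cycles;
}
```

With a 1 MiB job and 256 KiB recorded in the state file, the variant that marks termination stops after one cycle, while the variant that relies on the size comparison keeps re-entering the loop until the guard stops it.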

https://claude.ai/code/session_01MTN95VHyH9fazG5VJuZNwp
@hanee-kim force-pushed the claude/fix-state-verification-dryrun-DkDws branch from cd3377d to 6da373c on April 28, 2026 07:41