Checkpoint sync state on all cancellation types #694
Conversation
…eeded

When a Temporal worker shuts down (SIGTERM during a rolling deploy), Sync() returned immediately without checkpointing in-memory sync progress. The sync loop only checkpointed on DeadlineExceeded (run duration timeout), silently losing up to 10s of progress on other cancellation causes such as worker shutdown.

Use a background context with a 30s timeout for the checkpoint, since the caller's context may already be cancelled, matching the pattern in c1file.Close().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
No actionable comments were generated in the recent review. 🎉

Walkthrough

The sync package's cancellation handling now ensures checkpoints persist before exiting when context cancellation occurs. A new test validates that syncs can resume from checkpoints after mid-operation cancellation.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
…eeded

When the sync context is cancelled for reasons other than DeadlineExceeded (e.g. worker shutdown via SIGTERM), the sync loop exited without checkpointing. This lost all progress made since the last periodic checkpoint, forcing a full re-sync of those pages on the next attempt.

Now checkpoint on all cancellation types, using a background context with a 30s timeout, since the caller's context is already cancelled. Also broaden the cleanup error handler in syncer.go to treat any context cancellation (not just DeadlineExceeded) as ErrSyncNotComplete.

This is a rewrite of #694 against the current codebase, which was refactored from a single sync loop into sequentialSync/parallelSync.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Superseded by #738, which rewrites this fix against current main (the sync loop was refactored into sequentialSync/parallelSync since this branch was created).
Pull request was closed
Summary
- Checkpoint sync state on all cancellation types, not just `DeadlineExceeded` (run duration timeout)
- Use `context.WithTimeout(context.Background(), 30s)` for the checkpoint since the caller's context may already be cancelled (same pattern as `c1file.Close()`)
- Without this, up to 10s of progress (`minCheckpointInterval`) is silently lost on every rolling deploy

Context
Combined with the c1-side `WorkerStopTimeout` fix (c1 PR #14204), this ensures the full graceful shutdown chain works:

- `Sync()` checkpoints state before returning (this PR)
- `syncer.Close()` → WAL checkpoint + local save
- `C1zmanager.SaveC1Z()` → uploads to S3 within the 25s grace period

Test plan
- `TestCancellationCheckpoints`: cancels context mid-sync, verifies the checkpoint persists, resumes successfully
- `pkg/sync/...` tests pass

🤖 Generated with Claude Code