Skip to content

Fix SaveC1Z data loss race: use atomic rename instead of truncating write#695

Closed
arreyder wants to merge 1 commit intomainfrom
crr/atomic-savec1z
Closed

Fix SaveC1Z data loss race: use atomic rename instead of truncating write#695
arreyder wants to merge 1 commit intomainfrom
crr/atomic-savec1z

Conversation

@arreyder
Copy link
Copy Markdown
Contributor

@arreyder arreyder commented Feb 21, 2026

Summary

  • localManager.SaveC1Z() used os.Create() which truncates the destination file immediately before writing data
  • If the process is killed mid-copy (e.g. rolling deploy, OOM kill), the file is left with a valid c1z header but a truncated zstd stream
  • This causes all subsequent sync attempts to fail until manual intervention
  • Replace with write-to-staging file → fsync → atomic rename, matching the pattern already used by the internal saveC1z() function in file.go

Test plan

  • go build ./pkg/dotc1z/manager/local/... passes
  • Staging file cleaned up on failure via deferred os.Remove
  • Companion c1-side PR adds pre-upload zstd validation as defense-in-depth: https://github.com/ductone/c1/pull/14209

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced data persistence reliability with improved write handling to reduce the risk of file corruption or truncation if the application unexpectedly terminates during save operations.

…rite

SaveC1Z previously used os.Create() which truncates the destination file
immediately before copying data. If the process is killed mid-copy, the
file is left with a valid c1z header but a truncated zstd stream, causing
all subsequent sync attempts to fail until manual intervention.

Replace with write-to-staging → fsync → atomic rename pattern (matching
the internal saveC1z function). The destination file is either fully
written or unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 21, 2026

Walkthrough

The SaveC1Z function is reworked to implement a staged write pattern, writing to a temporary file first, then synchronously flushing and atomically renaming it to the destination. This prevents data corruption if a crash occurs mid-write, while adding comprehensive error handling and cleanup logic.

Changes

Cohort / File(s) Summary
Staged Write Improvement
pkg/dotc1z/manager/local/local.go
Modified SaveC1Z to use temporary file staging with explicit sync and atomic rename. Adds validation of temporary and destination paths, error wrapping for each operation step (creation, copy, sync), and conditional cleanup logic to remove staging file on failure.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit hops through the file system's night,
Writing to safety with staged delight!
No truncation crashes, just atomic renames so sweet,
Temporary .tmp files make the data dance complete! ✨
hop hop – now your saves won't lose the fight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: fixing a data loss race condition by replacing truncating writes with atomic rename operations.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch crr/atomic-savec1z

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
pkg/dotc1z/manager/local/local.go (2)

186-193: Consider adding size verification after copy for consistency.

The copyFileToTmp function (lines 82-91) verifies that the copied size matches the source file size. For consistency and defense-in-depth, consider adding similar verification here after the copy and sync operations.

♻️ Optional: Add size verification
 	size, err := io.Copy(dstFile, tmpFile)
 	if err != nil {
 		return fmt.Errorf("failed to copy to staging file: %w", err)
 	}

 	if err := dstFile.Sync(); err != nil {
 		return fmt.Errorf("failed to sync staging file: %w", err)
 	}
+
+	// Verify size matches source
+	srcStat, err := tmpFile.Stat()
+	if err != nil {
+		return fmt.Errorf("failed to stat source temp file: %w", err)
+	}
+	if size != srcStat.Size() {
+		return fmt.Errorf("copy size mismatch: expected %d bytes but copied %d bytes", srcStat.Size(), size)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/dotc1z/manager/local/local.go` around lines 186 - 193, After copying to
the staging file with io.Copy(dstFile, tmpFile) in the same function, verify
that the returned size equals the expected source size (same check done in
copyFileToTmp); after the Sync() call compare size to the source file's size
variable (or stat the original) and return an error like "copied size mismatch"
if they differ; update the block around dstFile, tmpFile and io.Copy to perform
this size validation for consistency and defense-in-depth.

170-171: Consider using os.CreateTemp for the staging file to avoid collisions.

The staging path l.filePath + ".tmp" is predictable. If multiple processes or goroutines attempt to save to the same destination concurrently, they could overwrite each other's staging files. Using os.CreateTemp in the same directory would generate a unique name and avoid potential races.

That said, if concurrent saves to the same destination are not a supported use case, this is acceptable.

♻️ Optional: Use unique staging file
-	stagingPath := l.filePath + ".tmp"
-	dstFile, err := os.OpenFile(stagingPath, os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0644)
+	dstFile, err := os.CreateTemp(filepath.Dir(l.filePath), ".tmp-*")
 	if err != nil {
 		return fmt.Errorf("failed to create staging file: %w", err)
 	}
+	stagingPath := dstFile.Name()

Note: This would require adding "path/filepath" to imports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/dotc1z/manager/local/local.go` around lines 170 - 171, The staging file
name is predictable (stagingPath := l.filePath + ".tmp") which risks collisions;
change the logic in the save/write routine that uses stagingPath and dstFile
(referencing l.filePath and dstFile) to create a unique temp file in the same
directory via os.CreateTemp (importing path/filepath), write to and close that
temp file, set the desired permissions if needed, then atomically replace the
target with os.Rename to l.filePath; this ensures uniqueness and avoids
concurrent overwrite races.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/dotc1z/manager/local/local.go`:
- Around line 186-193: After copying to the staging file with io.Copy(dstFile,
tmpFile) in the same function, verify that the returned size equals the expected
source size (same check done in copyFileToTmp); after the Sync() call compare
size to the source file's size variable (or stat the original) and return an
error like "copied size mismatch" if they differ; update the block around
dstFile, tmpFile and io.Copy to perform this size validation for consistency and
defense-in-depth.
- Around line 170-171: The staging file name is predictable (stagingPath :=
l.filePath + ".tmp") which risks collisions; change the logic in the save/write
routine that uses stagingPath and dstFile (referencing l.filePath and dstFile)
to create a unique temp file in the same directory via os.CreateTemp (importing
path/filepath), write to and close that temp file, set the desired permissions
if needed, then atomically replace the target with os.Rename to l.filePath; this
ensures uniqueness and avoids concurrent overwrite races.

@arreyder arreyder enabled auto-merge (squash) February 21, 2026 19:20
@arreyder
Copy link
Copy Markdown
Contributor Author

Closing in favor of #606 which covers the same atomic-write fix for local.go plus also fixes file.go and adds tests.

@arreyder arreyder closed this Mar 16, 2026
auto-merge was automatically disabled March 16, 2026 15:32

Pull request was closed

@linear
Copy link
Copy Markdown

linear Bot commented Mar 16, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant