Skip to content

refactor(db): return sync results explicitly#1224

Merged
corylanou merged 2 commits intomainfrom
issue-1221-syncresult-phase1
Apr 15, 2026
Merged

refactor(db): return sync results explicitly#1224
corylanou merged 2 commits intomainfrom
issue-1221-syncresult-phase1

Conversation

@corylanou
Copy link
Copy Markdown
Collaborator

Summary

  • change verifyAndSync() to return a syncResult instead of relying on sync() to apply sync state as a side effect
  • apply sync results explicitly in Sync() and checkpoint()
  • add focused unit coverage for applySyncResult()

Why

This is the first increment from #1221, following Ben's comment to start with SyncResult and keep the refactor very small.

The existing proof-of-concept work in #1222 and branch issue-1221-proposal-incremental-extraction-of-sync-checkpoint-monitorin was useful to validate the overall direction, but it is too broad to merge safely or review comfortably in one pass. The plan here is to keep that branch as reference material only, and land the extraction as small, reviewable steps instead.

Scope

This PR intentionally stops at the SyncResult extraction. It does not include the wider checkpoint restructuring, sync-state split, restore extraction, or soak-test expansion from the proof-of-concept branch.

Testing

  • pre-commit run --all-files
  • go test ./...
  • go test -race ./...
  • go test . -run '^(TestApplySyncResult|TestDB_Sync_UpdatesMetrics|TestDB_Sync_InitErrorMetrics)$' -count=1
  • go test . -run '^(TestDB_CheckpointDoesNotTriggerSnapshot|TestDB_CheckpointPageGapWithConcurrentWrites|TestDB_Sync_CompactionValidAfterGrowthAndCheckpoint)$' -count=1
  • go test -count=1 -v -tags='integration,soak' -run '^(TestLTXBehavior|TestLTXBehavior_NoExcessiveSnapshots)$' -test.short -timeout=30m ./tests/integration

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 5, 2026

PR Build Metrics

⚠️ Attention needed — vulnerabilities found, Go toolchain outdated (1.25.8 → 1.25.9)

Check Status Summary
Binary size 35.96 MB (0.0 KB / 0.00%)
Dependencies No changes
Vulnerabilities ⚠️ Issues found — expand details below
Go toolchain ⚠️ 1.25.8 → 1.25.9 available
Module graph 1206 edges (0)

Binary Size

Size Change
Base (016c368) 35.96 MB
PR (e03ffac) 35.96 MB 0.0 KB (0.00%)

Dependency Changes

No dependency changes.

govulncheck Output

=== Symbol Results ===

Vulnerability #1: GO-2026-4947
    Unexpected work during chain building in crypto/x509
  More info: https://pkg.go.dev/vuln/GO-2026-4947
  Standard library
    Found in: crypto/x509@go1.25.8
    Fixed in: crypto/x509@go1.25.9
    Example traces found:
      #1: internal/resumable_reader.go:76:22: internal.ResumableReader.Read calls http.readWriteCloserBody.Read, which eventually calls x509.Certificate.Verify

Vulnerability #2: GO-2026-4946
    Inefficient policy validation in crypto/x509
  More info: https://pkg.go.dev/vuln/GO-2026-4946
  Standard library
    Found in: crypto/x509@go1.25.8
    Fixed in: crypto/x509@go1.25.9
    Example traces found:
      #1: internal/resumable_reader.go:76:22: internal.ResumableReader.Read calls http.readWriteCloserBody.Read, which eventually calls x509.Certificate.Verify

Vulnerability #3: GO-2026-4870
    Unauthenticated TLS 1.3 KeyUpdate record can cause persistent connection
    retention and DoS in crypto/tls
  More info: https://pkg.go.dev/vuln/GO-2026-4870
  Standard library
    Found in: crypto/tls@go1.25.8
    Fixed in: crypto/tls@go1.25.9
    Example traces found:
      #1: nats/replica_client.go:195:25: nats.ReplicaClient.connect calls nats.Connect, which eventually calls tls.Conn.Handshake
      #2: server.go:135:31: litestream.Start calls http.Server.Serve, which eventually calls tls.Conn.HandshakeContext
      #3: internal/resumable_reader.go:76:22: internal.ResumableReader.Read calls http.readWriteCloserBody.Read, which calls tls.Conn.Read
      #4: internal/hexdump.go:30:14: internal.Hexdump calls fmt.Fprintf, which calls tls.Conn.Write
      #5: heartbeat.go:55:30: litestream.HeartbeatClient.Ping calls http.Client.Do, which eventually calls tls.Dialer.DialContext

Vulnerability #4: GO-2026-4865
    JsBraceDepth Context Tracking Bugs (XSS) in html/template
  More info: https://pkg.go.dev/vuln/GO-2026-4865
  Standard library
    Found in: html/template@go1.25.8
    Fixed in: html/template@go1.25.9
    Example traces found:
      #1: nats/replica_client.go:460:32: nats.ReplicaClient.DeleteAll calls errors.joinError.Error, which calls template.Error.Error
      #2: server.go:135:31: litestream.Start calls http.Server.Serve, which eventually calls template.Template.Execute
      #3: server.go:135:31: litestream.Start calls http.Server.Serve, which eventually calls template.Template.ExecuteTemplate
      #4: replica.go:712:21: litestream.Replica.Restore calls http.http2requestBody.Close, which eventually calls template.Template.Funcs
      #5: replica.go:712:21: litestream.Replica.Restore calls http.http2requestBody.Close, which eventually calls template.Template.Parse
      #6: replica.go:1637:27: litestream.WriteTXIDFile calls fmt.Fprintln, which eventually calls template.context.String

Your code is affected by 4 vulnerabilities from the Go standard library.
This scan also found 2 vulnerabilities in packages you import and 1
vulnerability in modules you require, but your code doesn't appear to call these
vulnerabilities.
Use '-show verbose' for more details.

Build Info

Metric Value
Build time 40s
Go version go1.25.8
Commit e03ffac

History (2 previous)

Commit Updated Status Summary
98d83db 2026-04-15 17:51 UTC 35.96 MB (0.0 KB / 0.00%)
cde4a4e 2026-04-05 20:19 UTC 35.96 MB (0.0 KB / 0.00%)

🤖 Updated on each push.

@corylanou
Copy link
Copy Markdown
Collaborator Author

Follow-up after comparing this phase-1 branch against origin/main and trying to answer whether the SyncResult extraction made behavior worse.

What I added on this branch:

  • 7272b64 adds a deterministic seam test for the phase-1 refactor: TestVerifyAndSync_DelaysStateMutationUntilApply
  • That test verifies verifyAndSync() returns updated state without mutating the DB object until applySyncResult() is called
  • This is the part of the change I trust most, because it directly exercises the refactor seam instead of depending on soak timing

Commands run on this branch:

  • go test -count=1 ./... -run '^(TestApplySyncResult|TestVerifyAndSync_DelaysStateMutationUntilApply)$'
  • go test -count=1 -v -tags='integration,soak' -run '^TestComprehensiveSoak$' -test.short -timeout=1h ./tests/integration
  • SOAK_AUTO_PURGE=yes go test -count=1 -v -tags='integration,soak,docker' -run '^TestMinIOSoak$' -test.short -timeout=30m ./tests/integration
  • go test -count=1 -v -tags='integration,soak' -run '^(TestLTXBehavior|TestLTXBehavior_NoExcessiveSnapshots)$' -test.short -timeout=30m ./tests/integration

Focused comparison runs against origin/main:

  • TestLTXBehavior_NoExcessiveSnapshots rerun 3x on this branch: 0/3 logged nonsequential page numbers
  • TestLTXBehavior_NoExcessiveSnapshots rerun 3x on origin/main: 1/3 logged nonsequential page numbers
  • TestLTXBehavior/high-volume rerun 3x on this branch: 1/3 logged nonsequential page numbers
  • TestLTXBehavior/high-volume rerun 3x on origin/main: 0/3 logged nonsequential page numbers

Interpretation:

  • I do not have evidence that phase 1 clearly regresses behavior relative to main
  • I also do not think the system is fully clean: the intermittent nonsequential page numbers in snapshot transaction compaction error exists on both code lines
  • Because that error reproduces on main too, I did not turn it into a new hard failing soak assertion in this PR. That would fail baseline and would not tell us whether phase 1 itself is acceptable
  • Instead, I added the deterministic state-seam test for the actual refactor so we have coverage that verifyAndSync() no longer mutates DB state implicitly

My read at this point:

  • If the goal of this PR is only Ben's requested first increment, I think the code change is still reviewable on its own merits
  • If the intermittent compaction issue needs to block even small refactors, it should probably be addressed as a separate bug first or in parallel, because it is not unique to this branch

I wanted this written down here so the merge decision can be based on actual branch-vs-main comparison instead of a single flaky soak run.

@benbjohnson
Copy link
Copy Markdown
Owner

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@corylanou corylanou force-pushed the issue-1221-syncresult-phase1 branch from 7272b64 to 2c90fb0 Compare April 15, 2026 17:49
@github-actions github-actions Bot added metrics: vulns-found govulncheck found vulnerabilities metrics: go-update Go toolchain has a newer patch release labels Apr 15, 2026
@corylanou corylanou merged commit 6b8bb01 into main Apr 15, 2026
25 checks passed
@corylanou corylanou deleted the issue-1221-syncresult-phase1 branch April 15, 2026 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

metrics: go-update Go toolchain has a newer patch release metrics: vulns-found govulncheck found vulnerabilities ready for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants