fix(fs): defensive cancellation guards in FUSE synth path#40
Merged
Conversation
When statSynthFile sees an ErrNotExist from the underlying lookup, it caches a negative entry in statCache for the full 2s TTL. If the ErrNotExist actually came from a cancelled DB query (rather than a genuine "row does not exist"), this poisons the cache and blocks subsequent reads with false ENOENT. The three FUSE_INTERRUPT decoupling commits (4c7b4c1, b5cf146, df7616c) prevent kernel-side cancellation from reaching here on any FUSE entry path today. This commit is defense in depth: detect cancellation at the moment of the cache write and skip it, so the bug class is self- correcting if a future code path forgets to wrap (e.g. a new internal Operations caller, the NFS adapter's 30s timeout firing mid-query). Adds isCancellationError(ctx, fsErr) in errors.go. Inspects both fsErr.Cause (errors.Is against context.Canceled/DeadlineExceeded) AND ctx.Err() -- the latter catches the case where an intermediate layer (e.g. resolveSynthPath today) discarded the underlying cause before wrapping as ErrNotExist. Adds 7-sub-test table in errors_test.go covering nil/non-nil errors, direct and wrapped causes, both cancellation flavors, and the ctx-only fallback path.
Previously, resolveSynthPath swallowed any error from db.ResolvePath
and returned ("", false), making a real DB failure indistinguishable
from a genuine "path doesn't exist". Callers wrapped the false return
as ErrNotExist, dropping the underlying cause -- which meant context
cancellation, connection failures, and other transient errors
surfaced as ENOENT and (until the defensive setNegative guard in
1178aa4) poisoned the cache for the full statCacheTTL.
Changes resolveSynthPath's signature from (string, bool) to
(string, bool, error) and propagates the wrapped error. Each caller
now distinguishes:
* (id, true, nil) -- path resolved
* ("", false, nil) -- path genuinely doesn't exist (clean ENOENT)
* ("", false, err) -- DB call failed (returned as ErrIO with Cause)
Updated 8 production call sites (synth_ops.go x7, history.go x1) and
12 unit-test call sites. Fixed TestSynth_ResolveSynthRow_DBError --
it had been asserting the old swallowing behavior (ErrNotExist on
"connection refused"); now correctly asserts ErrIO with the original
error preserved in Cause.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Skip setNegative on transient cancellation (
1178aa4): newisCancellationError(ctx, fsErr)helper infs/errors.go; guards the onlystatCache.setNegativecall site insynth_ops.go. Backstop for theFUSE_INTERRUPT bug class even after the three prior decoupling commits (
4c7b4c1,b5cf146,df7616c).Propagate ResolvePath errors (
6cd9e0b):resolveSynthPathsignature changed from(string, bool)to(string, bool, error). DB errors now surface asErrIOwithCausepreserved instead of being swallowed intoErrNotExist. 7 callers insynth_ops.go, 1 inhistory.go, 12 test callers updated.Validated by 22,000 clean stress iterations across four configurations (mix of docker-FUSE and native NFS,
large+many files).