fix(integration): skip or vector(...) in fuzz query generator#7551
Open
sandy2008 wants to merge 1 commit into
Open
fix(integration): skip or vector(...) in fuzz query generator#7551sandy2008 wants to merge 1 commit into
or vector(...) in fuzz query generator#7551sandy2008 wants to merge 1 commit into
Conversation
TestVerticalShardingFuzz flakes when promqlsmith emits an expression containing `<lhs> or vector(<rhs>)` against block data with partial time coverage on the LHS: Cortex's vertically-sharded query path and the unsharded reference engine disagree on which timestamps trigger the `vector(...)` fallback, because each shard decides locally whether the LHS is empty at a given step while the unsharded path sees the union. Both answers are individually correct under their own model, but the final results diverge on a handful of points. Same analyser-side family as the known `absent` / `absent_over_time` / `scalar` cases (cortexproject#5203, cortexproject#5204, cortexproject#5205): the Thanos PromQL analyser does not classify `or vector(...)` as non-shardable. Full empirical census in cortexproject#7547. Add hasOrVectorFallback(parser.Expr), an AST walker that uses parser.Inspect to detect any *parser.BinaryExpr with Op == parser.LOR whose RHS is a *parser.Call to the `vector` function. Call it from isValidQuery alongside the existing `limitk` / `limit_ratio` / `--` filters so every fuzz caller (TestVerticalShardingFuzz plus the seven other fuzz tests that go through runQueryFuzzTestCases) skips the offending shape. An earlier draft used `strings.Contains(queryStr, "vector(")`. Round-2 review flagged that as too broad: a substring scan fires on bare `vector(0)`, `sum(vector(1))`, `vector(1) + on() vector(2)`, and even on incidental `vector(` text inside label literals — none of which diverge between the sharded and unsharded engines. The AST predicate matches only `<lhs> or vector(<rhs>)`, leaving the rest of the random surface intact. Add TestHasOrVectorFallback with six subtests pinning the predicate: `up` (no), `vector(1)` (no — bare call is fine), `up or up` (no — `or` without `vector`), `up or vector(1)` (yes), a deep-LHS positive (`(sum(rate(up[1m])) == bool 0) or vector(0)`), and a simplified form of the actual failing query from cortexproject#7547. All six pass locally. This is a test-side dodge, not a sharding-engine fix. The proper upstream change — teaching the Thanos analyser to recognise `or vector(...)` as non-shardable, mirroring the existing `absent` / `scalar` exclusions — is tracked separately. Trade-off: isValidQuery is global to all fuzz callers, so random fuzz coverage of `or vector(...)` is dropped everywhere, not just in TestVerticalShardingFuzz. Accepted for the reasons in cortexproject#7547: the empirical signal points at the sharded vs unsharded path as the only current divergence context, and the filter becomes moot the moment the upstream analyser is fixed. Fixes cortexproject#7547 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Sandy Chen <Yuxuan.Chen@morganstanley.com>
10 tasks
sandy2008
added a commit
to sandy2008/cortex-1
that referenced
this pull request
May 22, 2026
Failing TestVerticalShardingFuzz on the prior CI run was the same pre-existing flake tracked in cortexproject#7547 / fixed by cortexproject#7551 — not a regression from this PR's error-comparator change. Signed-off-by: Sandy Chen <Yuxuan.Chen@morganstanley.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this PR does
Stabilises the flaky
TestVerticalShardingFuzzintegration test by teaching thepromqlsmith-driven fuzz harness inintegration/query_fuzz_test.goto reject any randomly-generated expression whose AST contains a<lhs> or vector(<rhs>)sub-expression. A newhasOrVectorFallback(parser.Expr) boolpredicate walks the parsed expression withparser.Inspectand matches exactly that shape; it is called from the sharedisValidQuerygate. A small unit testTestHasOrVectorFallback(6 subtests, all pass locally) pins the predicate's behaviour. This is a test-side dodge: it removes a known false-positive shape from the random surface, it does not fix the underlying sharding semantic.Root cause
The full empirical census (occurrences, environments, exact failing query and log) is in issue #7547. In short: when the generator emits an expression like
against block data that has partial time coverage for the LHS series, Cortex's vertically-sharded query path and the unsharded reference engine disagree on which timestamps trigger the
vector(...)fallback. Each shard sees its own slice of series coverage and decides locally whether the LHS is empty at a given step; the unsharded path sees the union. The two engines then materialise the constant-vector fallback at different sample timestamps, so the final result diverges on a handful of points even though both answers are individually correct under their own model. This is the same class of analyser-side problem already known and tracked forabsent/absent_over_time/scalar(see #5203, #5204, #5205): the Thanos PromQL analyser does not classify these "fallback-to-constant-when-empty" shapes as non-shardable, so the planner shards them when it should not.The proper fix is upstream: the Thanos analyser needs to recognise
or vector(...)as non-shardable, in the same family as the existingabsent/scalarexclusions. That work is tracked separately and is out of scope for this PR.Fix
One new AST-walker helper, one unit test, and one call site in
integration/query_fuzz_test.go:hasOrVectorFallback(expr parser.Expr) bool— usesparser.Inspectto walk the expression tree and return true if any*parser.BinaryExprwithOp == parser.LORhas a*parser.Callon its RHS whoseFunc.Nameis"vector". Brace / paren depth and surrounding context don't matter — the AST gives an exact structural match, so it does not over-fire on incidental substrings (e.g. a label value of"vector("or a top-level barevector(0)).TestHasOrVectorFallback— six subtests pinning the predicate:up(no),vector(1)(no, bare call is fine),up or up(no,orwithoutvector),up or vector(1)(yes),(sum(rate(up[1m])) == bool 0) or vector(0)(yes, deep LHS), and a simplified form of the actual failing query from Flaky test: TestVerticalShardingFuzz — sharded vs unsharded "or vector(…)" semantic divergence #7547 ((sum without (job) (stddev_over_time(up[4m]) == -up)) or vector(2.0099)— yes). All 6 pass locally.isValidQuery, aif hasOrVectorFallback(generatedQuery) { return false }early-return is added next to the existing"limitk"/"limit_ratio"/"--"filters that run for every caller.isValidQueryis the shared gate used by all eight fuzz tests in this file (TestVerticalShardingFuzz,TestRW1vsRW2QueryFuzz,TestExpandedPostingsCacheFuzz,TestBackwardCompatibilityQueryFuzz,TestPrometheusCompatibilityQueryFuzz,TestNativeHistogramFuzz,TestDisableChunkTrimmingFuzz,TestProtobufCodecFuzz,TestExperimentalPromQLFuncsWithPrometheus), invoked viarunQueryFuzzTestCasesand the two direct call sites atintegration/query_fuzz_test.go:387and:554. The new filter therefore applies to every fuzz caller, not justTestVerticalShardingFuzz— see "Trade-off" below.Why an AST predicate, not a substring filter
An earlier draft of this PR used
strings.Contains(queryStr, "vector(")to flag the offending shape. Round-2 review flagged that as too broad and the implementation was switched to the AST walker. The reasoning, made explicit so it survives future edits:vector(— including the perfectly-shardable barevector(0)at the top level,sum(vector(1)),vector(1) + on() vector(2), etc. Only<lhs> or vector(<rhs>)actually diverges between the sharded and unsharded engines; the substring filter drops orders of magnitude more random surface than necessary and silently shrinks coverage of legitimatevector(...)shapes."vector("in{x="vector("}), whichpromqlsmithcan in principle produce. The AST predicate is immune by construction.parser.Inspect, oneBinaryExpr/LORcheck, oneCall/vectorcheck. TheTestHasOrVectorFallbacksubtests demonstrate it matches exactly the intended shape and nothing else. This is the same level of rigour the surrounding filters maintain.Net: the AST predicate drops only the genuinely-problematic shape and leaves bare
vector(...)calls in the random surface, while remaining as cheap to maintain as a substring scan.Why not other approaches
or vector(...)is classified as non-shardable. This is the right long-term fix and the issue notes it is tracked separately. It is a vendor-tree change that would have to be re-applied on every Thanos / Prometheus bump, and it is materially larger than a test-side filter. Doing it here would conflate "stop the flake" with "change sharding behaviour"; better to land them as separate, reviewable changes.TestVerticalShardingFuzzoutright. Removes the signal entirely. The test still has value for the rest of the fuzz surface (binary ops, aggregations, range vector functions, label matchers); only theor vector(...)shape is currently a known false positive.TestVerticalShardingFuzz). Duplicates plumbing for a shape that will become moot the moment the Thanos analyser is fixed; see "Trade-off" for why a global filter is accepted here.Trade-off
isValidQueryis global — the new filter applies to every fuzz test that calls it, not justTestVerticalShardingFuzz. The biggest collateral isTestRW1vsRW2QueryFuzz, which exercises two isolated Cortex instances against the same data and does not involve sharding; anor vector(...)expression there would in principle still be a valid divergence-detection target. The same is true to a lesser degree forTestBackwardCompatibilityQueryFuzz,TestPrometheusCompatibilityQueryFuzz, and the other six fuzz callers.This is accepted for the reasons spelled out in issue #7547: (a) the empirical evidence points at the sharded vs unsharded path as the only context where this shape currently diverges; (b) the next-most-likely caller,
TestRW1vsRW2QueryFuzz, runs two identically-configured Cortex instances and is unlikely to surface a new divergence on this shape without the upstream analyser already disagreeing; (c) random fuzz coverage ofor vector(...)is the only thing dropped — deterministic dedicated tests for the fallback behaviour can be added later if wanted; and (d) the alternative (a per-caller filter list) duplicates plumbing for a shape that will become moot the moment the Thanos analyser is fixed.Which issue(s) this PR fixes
Fixes #7547
Checklist
CHANGELOG.mdupdated — not applicable; test-only change with no user-facing behaviour change.TestHasOrVectorFallback(6 subtests) added next to the new predicate, covering the no-op shapes (up,vector(1),up or up), the basic positive (up or vector(1)), a deep-LHS positive, and a simplified form of the actual failing query from Flaky test: TestVerticalShardingFuzz — sharded vs unsharded "or vector(…)" semantic divergence #7547. All 6 pass locally.Test plan
Local (no Docker harness required):
gofmt -l integration/query_fuzz_test.go— cleangoimports -local github.com/cortexproject/cortex -l integration/query_fuzz_test.go— cleango vet -tags "netgo slicelabels integration integration_query_fuzz" ./integration/...— cleango build -tags "netgo slicelabels integration integration_query_fuzz" ./integration/...— cleango test -tags "netgo slicelabels integration integration_query_fuzz" -run "^TestHasOrVectorFallback$" ./integration/...— PASS (6/6 subtests)Validation that the flake is gone (requires Docker):
make ./cmd/cortex/.uptodatego test -v -tags=integration,requires_docker,integration_query_fuzz -timeout 2400s -count=10 ./integration/... -run "^TestVerticalShardingFuzz$"— expect 10/10 PASS. If new failures surface on other fuzz tests (TestRW1vsRW2QueryFuzzis the most likely) they should be triaged as real divergences, not regressions from this PR.