fix(integration): skip `or vector(...)` in fuzz query generator by sandy2008 · Pull Request #7551 · cortexproject/cortex

sandy2008 · 2026-05-22T10:07:45Z

What this PR does

Stabilises the flaky TestVerticalShardingFuzz integration test by teaching the promqlsmith-driven fuzz harness in integration/query_fuzz_test.go to reject any randomly-generated expression whose AST contains a <lhs> or vector(<rhs>) sub-expression. A new hasOrVectorFallback(parser.Expr) bool predicate walks the parsed expression with parser.Inspect and matches exactly that shape; it is called from the shared isValidQuery gate. A small unit test TestHasOrVectorFallback (6 subtests, all pass locally) pins the predicate's behaviour. This is a test-side dodge: it removes a known false-positive shape from the random surface, it does not fix the underlying sharding semantic.

Root cause

The full empirical census (occurrences, environments, exact failing query and log) is in issue #7547. In short: when the generator emits an expression like

({a} or vector(0)) <op> ({b} or vector(0))

against block data that has partial time coverage for the LHS series, Cortex's vertically-sharded query path and the unsharded reference engine disagree on which timestamps trigger the vector(...) fallback. Each shard sees its own slice of series coverage and decides locally whether the LHS is empty at a given step; the unsharded path sees the union. The two engines then materialise the constant-vector fallback at different sample timestamps, so the final result diverges on a handful of points even though both answers are individually correct under their own model. This is the same class of analyser-side problem already known and tracked for absent / absent_over_time / scalar (see #5203, #5204, #5205): the Thanos PromQL analyser does not classify these "fallback-to-constant-when-empty" shapes as non-shardable, so the planner shards them when it should not.

The proper fix is upstream: the Thanos analyser needs to recognise or vector(...) as non-shardable, in the same family as the existing absent / scalar exclusions. That work is tracked separately and is out of scope for this PR.

Fix

One new AST-walker helper, one unit test, and one call site in integration/query_fuzz_test.go:

hasOrVectorFallback(expr parser.Expr) bool — uses parser.Inspect to walk the expression tree and return true if any *parser.BinaryExpr with Op == parser.LOR has a *parser.Call on its RHS whose Func.Name is "vector". Brace / paren depth and surrounding context don't matter — the AST gives an exact structural match, so it does not over-fire on incidental substrings (e.g. a label value of "vector(" or a top-level bare vector(0)).
TestHasOrVectorFallback — six subtests pinning the predicate: up (no), vector(1) (no, bare call is fine), up or up (no, or without vector), up or vector(1) (yes), (sum(rate(up[1m])) == bool 0) or vector(0) (yes, deep LHS), and a simplified form of the actual failing query from Flaky test: TestVerticalShardingFuzz — sharded vs unsharded "or vector(…)" semantic divergence #7547 ((sum without (job) (stddev_over_time(up[4m]) == -up)) or vector(2.0099) — yes). All 6 pass locally.
Inside isValidQuery, a if hasOrVectorFallback(generatedQuery) { return false } early-return is added next to the existing "limitk" / "limit_ratio" / "--" filters that run for every caller.

isValidQuery is the shared gate used by all eight fuzz tests in this file (TestVerticalShardingFuzz, TestRW1vsRW2QueryFuzz, TestExpandedPostingsCacheFuzz, TestBackwardCompatibilityQueryFuzz, TestPrometheusCompatibilityQueryFuzz, TestNativeHistogramFuzz, TestDisableChunkTrimmingFuzz, TestProtobufCodecFuzz, TestExperimentalPromQLFuncsWithPrometheus), invoked via runQueryFuzzTestCases and the two direct call sites at integration/query_fuzz_test.go:387 and :554. The new filter therefore applies to every fuzz caller, not just TestVerticalShardingFuzz — see "Trade-off" below.

Why an AST predicate, not a substring filter

An earlier draft of this PR used strings.Contains(queryStr, "vector(") to flag the offending shape. Round-2 review flagged that as too broad and the implementation was switched to the AST walker. The reasoning, made explicit so it survives future edits:

A substring scan fires on every occurrence of vector( — including the perfectly-shardable bare vector(0) at the top level, sum(vector(1)), vector(1) + on() vector(2), etc. Only <lhs> or vector(<rhs>) actually diverges between the sharded and unsharded engines; the substring filter drops orders of magnitude more random surface than necessary and silently shrinks coverage of legitimate vector(...) shapes.
A substring scan also false-positives on incidental text in label literals (e.g. a label value of "vector(" in {x="vector("}), which promqlsmith can in principle produce. The AST predicate is immune by construction.
The AST walker is ~15 lines of straightforward code: one parser.Inspect, one BinaryExpr/LOR check, one Call/vector check. The TestHasOrVectorFallback subtests demonstrate it matches exactly the intended shape and nothing else. This is the same level of rigour the surrounding filters maintain.

Net: the AST predicate drops only the genuinely-problematic shape and leaves bare vector(...) calls in the random surface, while remaining as cheap to maintain as a substring scan.

Why not other approaches

Add a real fix in the Thanos PromQL analyser so or vector(...) is classified as non-shardable. This is the right long-term fix and the issue notes it is tracked separately. It is a vendor-tree change that would have to be re-applied on every Thanos / Prometheus bump, and it is materially larger than a test-side filter. Doing it here would conflate "stop the flake" with "change sharding behaviour"; better to land them as separate, reviewable changes.
Skip / disable TestVerticalShardingFuzz outright. Removes the signal entirely. The test still has value for the rest of the fuzz surface (binary ops, aggregations, range vector functions, label matchers); only the or vector(...) shape is currently a known false positive.
Loosen the result comparator to tolerate timestamp-shifted vector-fallback divergences. Either masks real bugs or has to encode the sharding semantic into the comparator, which is exactly the analyser-side complexity we want to keep out of the test harness.
Per-caller filter list (apply the predicate only inside TestVerticalShardingFuzz). Duplicates plumbing for a shape that will become moot the moment the Thanos analyser is fixed; see "Trade-off" for why a global filter is accepted here.

Trade-off

isValidQuery is global — the new filter applies to every fuzz test that calls it, not just TestVerticalShardingFuzz. The biggest collateral is TestRW1vsRW2QueryFuzz, which exercises two isolated Cortex instances against the same data and does not involve sharding; an or vector(...) expression there would in principle still be a valid divergence-detection target. The same is true to a lesser degree for TestBackwardCompatibilityQueryFuzz, TestPrometheusCompatibilityQueryFuzz, and the other six fuzz callers.

This is accepted for the reasons spelled out in issue #7547: (a) the empirical evidence points at the sharded vs unsharded path as the only context where this shape currently diverges; (b) the next-most-likely caller, TestRW1vsRW2QueryFuzz, runs two identically-configured Cortex instances and is unlikely to surface a new divergence on this shape without the upstream analyser already disagreeing; (c) random fuzz coverage of or vector(...) is the only thing dropped — deterministic dedicated tests for the fallback behaviour can be added later if wanted; and (d) the alternative (a per-caller filter list) duplicates plumbing for a shape that will become moot the moment the Thanos analyser is fixed.

Which issue(s) this PR fixes

Fixes #7547

Checklist

CHANGELOG.md updated — not applicable; test-only change with no user-facing behaviour change.
Documentation updated — not applicable; no flags or config changed.
Tests: TestHasOrVectorFallback (6 subtests) added next to the new predicate, covering the no-op shapes (up, vector(1), up or up), the basic positive (up or vector(1)), a deep-LHS positive, and a simplified form of the actual failing query from Flaky test: TestVerticalShardingFuzz — sharded vs unsharded "or vector(…)" semantic divergence #7547. All 6 pass locally.

Test plan

Local (no Docker harness required):

gofmt -l integration/query_fuzz_test.go — clean
goimports -local github.com/cortexproject/cortex -l integration/query_fuzz_test.go — clean
go vet -tags "netgo slicelabels integration integration_query_fuzz" ./integration/... — clean
go build -tags "netgo slicelabels integration integration_query_fuzz" ./integration/... — clean
go test -tags "netgo slicelabels integration integration_query_fuzz" -run "^TestHasOrVectorFallback$" ./integration/... — PASS (6/6 subtests)

Validation that the flake is gone (requires Docker):

make ./cmd/cortex/.uptodate
go test -v -tags=integration,requires_docker,integration_query_fuzz -timeout 2400s -count=10 ./integration/... -run "^TestVerticalShardingFuzz$" — expect 10/10 PASS. If new failures surface on other fuzz tests (TestRW1vsRW2QueryFuzz is the most likely) they should be triaged as real divergences, not regressions from this PR.

TestVerticalShardingFuzz flakes when promqlsmith emits an expression containing `<lhs> or vector(<rhs>)` against block data with partial time coverage on the LHS: Cortex's vertically-sharded query path and the unsharded reference engine disagree on which timestamps trigger the `vector(...)` fallback, because each shard decides locally whether the LHS is empty at a given step while the unsharded path sees the union. Both answers are individually correct under their own model, but the final results diverge on a handful of points. Same analyser-side family as the known `absent` / `absent_over_time` / `scalar` cases (cortexproject#5203, cortexproject#5204, cortexproject#5205): the Thanos PromQL analyser does not classify `or vector(...)` as non-shardable. Full empirical census in cortexproject#7547. Add hasOrVectorFallback(parser.Expr), an AST walker that uses parser.Inspect to detect any *parser.BinaryExpr with Op == parser.LOR whose RHS is a *parser.Call to the `vector` function. Call it from isValidQuery alongside the existing `limitk` / `limit_ratio` / `--` filters so every fuzz caller (TestVerticalShardingFuzz plus the seven other fuzz tests that go through runQueryFuzzTestCases) skips the offending shape. An earlier draft used `strings.Contains(queryStr, "vector(")`. Round-2 review flagged that as too broad: a substring scan fires on bare `vector(0)`, `sum(vector(1))`, `vector(1) + on() vector(2)`, and even on incidental `vector(` text inside label literals — none of which diverge between the sharded and unsharded engines. The AST predicate matches only `<lhs> or vector(<rhs>)`, leaving the rest of the random surface intact. Add TestHasOrVectorFallback with six subtests pinning the predicate: `up` (no), `vector(1)` (no — bare call is fine), `up or up` (no — `or` without `vector`), `up or vector(1)` (yes), a deep-LHS positive (`(sum(rate(up[1m])) == bool 0) or vector(0)`), and a simplified form of the actual failing query from cortexproject#7547. All six pass locally. This is a test-side dodge, not a sharding-engine fix. The proper upstream change — teaching the Thanos analyser to recognise `or vector(...)` as non-shardable, mirroring the existing `absent` / `scalar` exclusions — is tracked separately. Trade-off: isValidQuery is global to all fuzz callers, so random fuzz coverage of `or vector(...)` is dropped everywhere, not just in TestVerticalShardingFuzz. Accepted for the reasons in cortexproject#7547: the empirical signal points at the sharded vs unsharded path as the only current divergence context, and the filter becomes moot the moment the upstream analyser is fixed. Fixes cortexproject#7547 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Sandy Chen <Yuxuan.Chen@morganstanley.com>

Failing TestVerticalShardingFuzz on the prior CI run was the same pre-existing flake tracked in cortexproject#7547 / fixed by cortexproject#7551 — not a regression from this PR's error-comparator change. Signed-off-by: Sandy Chen <Yuxuan.Chen@morganstanley.com>

pull-request-size Bot added the size/M label May 22, 2026

dosubot Bot added the type/flaky-test label May 22, 2026

sandy2008 mentioned this pull request May 22, 2026

fix(integration): compare query fuzz errors by class, not exact string #7550

Open

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(integration): skip `or vector(...)` in fuzz query generator#7551

fix(integration): skip `or vector(...)` in fuzz query generator#7551
sandy2008 wants to merge 1 commit into
cortexproject:masterfrom
sandy2008:fix-vertical-sharding-fuzz-or-vector

sandy2008 commented May 22, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sandy2008 commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does

Root cause

Fix

Why an AST predicate, not a substring filter

Why not other approaches

Trade-off

Which issue(s) this PR fixes

Checklist

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

sandy2008 commented May 22, 2026 •

edited

Loading