Skip to content

[BUGFIX] Ingester: fix inflight query counter leak on resource rejection#7539

Open
sandy2008 wants to merge 2 commits into
cortexproject:masterfrom
sandy2008:codex/fix-ingester-inflight-query-leak
Open

[BUGFIX] Ingester: fix inflight query counter leak on resource rejection#7539
sandy2008 wants to merge 2 commits into
cortexproject:masterfrom
sandy2008:codex/fix-ingester-inflight-query-leak

Conversation

@sandy2008
Copy link
Copy Markdown
Contributor

@sandy2008 sandy2008 commented May 22, 2026

What this PR does

Ingester.trackInflightQueryRequest incremented and tracked inflightQueryRequests before calling resourceBasedLimiter.AcceptNewRequest(). If the resource limiter rejected the query, the helper returned limiter.ErrResourceLimitReached without returning a cleanup callback or decrementing the counter.

Symptom: each resource-rejected query permanently consumed one ingester inflight-query slot. Once enough rejections accumulated to reach MaxInflightQueryRequests, later queries could be rejected with errTooManyInflightQueryRequests even after resource pressure dropped. The counter only recovered when the ingester restarted.

Fix: run the resource-based admission check before incrementing and tracking the inflight query counter. Resource-rejected queries are not admitted as inflight work, so they no longer consume max-inflight capacity or inflate the max inflight metric. The regression test now enables MaxInflightQueryRequests, verifies the counter remains zero after resource rejection, and verifies a later query succeeds once the resource limiter is disabled.

Which issue(s) this PR fixes

Fixes #7538

Checklist

  • CHANGELOG.md updated — [BUGFIX] Ingester: entry under master / unreleased.
  • Documentation updated — not applicable, no flags or config changed (make doc not required).
  • Tests added — Test_Ingester_Query_ResourceThresholdBreached now covers the leaked inflight-query counter path.

Test plan

  • git diff --check — clean
  • go test -timeout 2400s -tags "netgo slicelabels" ./pkg/ingester -run '^Test_Ingester_Query_ResourceThresholdBreached$' -count=1 — PASS
  • go test -timeout 2400s -tags "netgo slicelabels" ./pkg/ingester — PASS
  • go test -timeout 2400s -tags "netgo slicelabels" ./pkg/compactor -run '^TestPartitionCompactor_DeleteLocalSyncFiles$' -count=1 — PASS locally after unrelated CI flake in test-no-race (amd64)

@sandy2008 sandy2008 changed the title [codex] Fix ingester inflight query counter leak [BUGFIX] Fix ingester inflight query counter leak May 22, 2026
Signed-off-by: Sandy Chen <Yuxuan.Chen@morganstanley.com>
@sandy2008 sandy2008 force-pushed the codex/fix-ingester-inflight-query-leak branch from 4cb0968 to 92f7156 Compare May 22, 2026 04:52
@sandy2008 sandy2008 changed the title [BUGFIX] Fix ingester inflight query counter leak [BUGFIX] Ingester: fix inflight query counter leak on resource rejection May 22, 2026
@sandy2008 sandy2008 marked this pull request as ready for review May 22, 2026 04:55
Comment thread pkg/ingester/ingester_test.go Outdated
- Add clarifying comment before the post-rejection QueryStream block,
  per inline review nit.
- Replace `i.resourceBasedLimiter = nil` with mutating the underlying
  mock monitor's heap utilization, so the same live limiter admits the
  second query. The regression test now exercises the limiter machinery
  on the retry instead of bypassing it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Sandy Chen <Yuxuan.Chen@morganstanley.com>
@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/ingester lgtm This PR has been approved by a maintainer size/S type/bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Ingester: trackInflightQueryRequest leaks inflight counter on resource-based rejection — eventually rejects all queries until restart

2 participants