Fix N+1 in GraphStoreMetrics caller lookups (CA-171) #44

Merged: jstuart0 merged 3 commits into main from fix/ca-171-graphmetrics-n-plus-one on May 7, 2026

@jstuart0 (Collaborator) commented May 7, 2026

Summary

Living Wiki page generation hung indefinitely on any non-trivial repo because GraphStoreMetrics.repoReferenceCount and packageReferenceCount issued one SurrealDB websocket round-trip per caller via GetSymbol(callerID) inside a nested loop, i.e. O(symbols × callers) sequential RPCs. The fix replaces the inner per-caller GetSymbol calls with one GetSymbolsByIDs batch.

Discovered via a pprof goroutine dump during CA-169 deploy validation:

  • Two goroutines select-blocked in gorillaws.Connection.Call
  • The errgroup dispatch goroutine chan-send-blocked on the orchestrator semaphore (clamped to 1 by the upstream-LLM capacity provider; that part is working as designed)
  • The last 3 LW jobs were all reaped as failed

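The first commit left the outer GetCallers-per-symbol N+1 in place (the original plan was a new GetCallersByIDs store method, tracked separately). A second pprof dump after deploy showed the same stall, so the final commit also removes the outer loop by rewriting all four metric functions on top of the existing GetCallEdges(repoID) query.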

Closes CA-171.

Test plan

  • go test ./internal/livingwiki/orchestrator/ -count=1 green
  • go build ./... clean
  • Manual: Living Wiki generation runs pages 1-N to completion (vs. the prior indefinite stall)

Also in this PR (separate commit)

chore(api): expose net/http/pprof behind SOURCEBRIDGE_PPROF_ENABLED. This is the dev-only goroutine dump endpoint that produced the diagnosis; it is off by default and opt-in via the env var.

🤖 Generated with Claude Code

jstuart0 and others added 3 commits May 6, 2026 21:40
repoReferenceCount and packageReferenceCount issued one SurrealDB websocket
round-trip per caller via GetSymbol(callerID) inside a nested loop. On any
non-trivial repo this stalled Living Wiki page generation indefinitely —
the page goroutine ground through the per-call websocket queue while the
errgroup semaphore (clamped to MaxConcurrency=1 by the upstream LLM
capacity provider) blocked every subsequent page.

Replaces the inner GetSymbol-per-callerID with a single GetSymbolsByIDs
batch fetch (helper already exists in internal/db/store.go:1875).
The outer GetCallers-per-symbol N+1 remains and is tracked separately;
that fix needs a new GetCallersByIDs store method.

Caught via pprof goroutine dump (gorillaws.Connection.Call [select])
during CA-169 deploy validation.

Refs CA-171.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds dev-only Go pprof endpoints under /debug/pprof/* gated by an env
var (default false) so a goroutine dump can be captured against a hung
job without rebuilding. Mounted before the rate limiter so a dump is
not throttled. Compose forwards the env var so SOURCEBRIDGE_PPROF_ENABLED=true
on the host enables it for local stacks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…lEdges (CA-171)

Initial fix only batched the inner GetSymbol-per-callerID lookup. The outer
GetCallers-per-symbol N+1 still stalled the same way (confirmed by a second
pprof goroutine dump after the first deploy).

GraphStore already exposes GetCallEdges(repoID) which returns every
caller→callee edge for the repo in a single query. Rewrites all four
metric functions to use it:

- packageReferenceCount: filter edges by callee membership in pkg, then
  one GetSymbolsByIDs batch for callers.
- packageRelationCount:  count edges whose callee is in pkg.
- repoReferenceCount:    one GetCallEdges + one GetSymbolsByIDs batch.
- repoRelationCount:     len(GetCallEdges(repoID)).

For a repo with N symbols and K average callers per symbol, this collapses
O(N*K + N) sequential SurrealDB round-trips (one GetCallers per symbol plus
one GetSymbol per caller) into 2-3 total.

Refs CA-171.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jstuart0 merged commit 56246ea into main on May 7, 2026
13 checks passed
@jstuart0 deleted the fix/ca-171-graphmetrics-n-plus-one branch on May 7, 2026 02:24