fix: resolve embedding and clustering bugs for large repos#6

Merged
thuongh2 merged 2 commits into main from
fix/embedding-and-clustering-bugs
Apr 2, 2026

@thuongh2 thuongh2 commented Apr 2, 2026

Summary

This PR fixes three critical bugs that affect large-scale repositories (100k+ symbols):

1. Embedding Worker Graceful Shutdown (#3)

Problem: The embedding worker ran in a goroutine, but its queue channel was never closed and analyze never waited for it to finish. The process exited before the worker could process even a single batch, leaving embeddings at 0%.

Fix:

  • Added Close() method to Worker that closes the queue channel and waits via sync.WaitGroup
  • analyze.go now calls w.Close() after enqueueing all jobs, blocking until embeddings complete
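
The shutdown pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in internal/embedder/worker.go: the Job struct, NewWorker, Enqueue, Processed, and the sync.Once guard are assumptions made for the example, and the "embedding" step is replaced by a counter.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a hypothetical unit of embedding work; only the UID field is
// mentioned in the PR text.
type Job struct {
	UID string
}

// Worker drains a buffered queue in a background goroutine.
type Worker struct {
	queue     chan Job
	wg        sync.WaitGroup
	closeOnce sync.Once // assumption: makes Close safe to call twice

	mu        sync.Mutex
	processed int
}

// NewWorker starts the consumer goroutine and registers it with the WaitGroup.
func NewWorker(buffer int) *Worker {
	w := &Worker{queue: make(chan Job, buffer)}
	w.wg.Add(1)
	go w.run()
	return w
}

// run consumes jobs until the queue is closed and fully drained.
func (w *Worker) run() {
	defer w.wg.Done()
	for range w.queue {
		// A real worker would batch jobs and call the embedding backend here.
		w.mu.Lock()
		w.processed++
		w.mu.Unlock()
	}
}

// Enqueue blocks until there is room in the queue (simplified for this sketch).
func (w *Worker) Enqueue(j Job) { w.queue <- j }

// Close signals that no more jobs will arrive, then blocks until the
// consumer goroutine has processed everything already queued.
func (w *Worker) Close() {
	w.closeOnce.Do(func() { close(w.queue) })
	w.wg.Wait()
}

// Processed reports how many jobs the worker has handled so far.
func (w *Worker) Processed() int {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.processed
}

func main() {
	w := NewWorker(8)
	for i := 0; i < 100; i++ {
		w.Enqueue(Job{UID: fmt.Sprintf("sym-%d", i)})
	}
	w.Close() // without this, main could return before any job is processed
	fmt.Println("processed:", w.Processed())
}
```

Closing the channel is what ends the range loop in run, and the WaitGroup is what lets the caller block until that loop has returned, so both pieces are needed for the fix to hold.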

2. Embedding Queue Capacity (#4)

Problem: The queue buffer held only 1,000 jobs, and jobs were dropped silently once it was full. Repos with more than 1,000 symbols lost embeddings without any warning (e.g., 133k symbols → only 1,000 embedded).

Fix:

  • Increased buffer from 1,000 to 100,000 jobs
  • Added warning log when jobs are dropped: log.Printf("embedder: job dropped for %s (queue full)", job.UID)
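
A drop-with-warning enqueue could look something like the sketch below. It assumes the worker uses a non-blocking send (select with a default case), which the PR text implies but does not show; tryEnqueue and the Job struct are hypothetical names for illustration, and only the log format string is taken from the PR.

```go
package main

import (
	"fmt"
	"log"
)

// Job is a hypothetical job type; the PR only shows that it has a UID.
type Job struct {
	UID string
}

// tryEnqueue attempts a non-blocking send. When the buffer is full it logs
// a warning and reports false instead of dropping the job silently.
func tryEnqueue(queue chan<- Job, job Job) bool {
	select {
	case queue <- job:
		return true
	default:
		log.Printf("embedder: job dropped for %s (queue full)", job.UID)
		return false
	}
}

func main() {
	// A tiny buffer with no consumer, to force drops in this demonstration.
	queue := make(chan Job, 2)
	dropped := 0
	for i := 0; i < 5; i++ {
		if !tryEnqueue(queue, Job{UID: fmt.Sprintf("sym-%d", i)}) {
			dropped++
		}
	}
	fmt.Printf("queued=%d dropped=%d\n", len(queue), dropped)
}
```

With a send like this, a larger buffer reduces how often drops happen, while the log line makes any remaining drops visible.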

3. SetClusterForNodes SQLite Variable Limit (#5)

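Problem: SetClusterForNodes built a single WHERE uid IN (?, ?, ...) query with one placeholder per UID. SQLite caps the number of bound variables per statement (SQLITE_MAX_VARIABLE_NUMBER, 999 by default in versions before 3.32.0), so clusters with more than 998 members failed with a "too many SQL variables" error.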

Fix:

  • Batch UIDs into chunks of 500
  • Execute multiple UPDATE queries within the same transaction
  • Follows the same pattern as UpsertClusterMembers
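
The batching step can be illustrated with the sketch below. It only builds the statements and arguments, leaving out the database/sql transaction; chunkUIDs, buildUpdate, and the nodes table and cluster_id column are assumptions for the example, not names confirmed by the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// batchSize matches the chunk size named in the PR.
const batchSize = 500

// chunkUIDs splits uids into consecutive slices of at most size elements.
func chunkUIDs(uids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var chunks [][]string
	for start := 0; start < len(uids); start += size {
		end := start + size
		if end > len(uids) {
			end = len(uids)
		}
		chunks = append(chunks, uids[start:end])
	}
	return chunks
}

// buildUpdate returns the UPDATE statement and bound arguments for one chunk.
// The table and column names are hypothetical.
func buildUpdate(clusterID string, chunk []string) (string, []any) {
	placeholders := strings.TrimSuffix(strings.Repeat("?,", len(chunk)), ",")
	query := "UPDATE nodes SET cluster_id = ? WHERE uid IN (" + placeholders + ")"
	args := make([]any, 0, len(chunk)+1)
	args = append(args, clusterID)
	for _, uid := range chunk {
		args = append(args, uid)
	}
	return query, args
}

func main() {
	uids := make([]string, 1200)
	for i := range uids {
		uids[i] = fmt.Sprintf("sym-%d", i)
	}
	// In real code each statement would run via tx.Exec inside one transaction.
	for i, chunk := range chunkUIDs(uids, batchSize) {
		_, args := buildUpdate("cluster-1", chunk)
		fmt.Printf("batch %d: %d bound variables\n", i, len(args))
	}
}
```

Each statement binds at most 501 variables (500 UIDs plus the cluster ID in this sketch), which stays well under a 999-variable limit.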

Testing

All existing tests pass:

$ go test ./...
ok    github.com/thuongh2/git-mimir/internal/cluster
ok    github.com/thuongh2/git-mimir/internal/daemon
ok    github.com/thuongh2/git-mimir/internal/incremental
ok    github.com/thuongh2/git-mimir/internal/parser
ok    github.com/thuongh2/git-mimir/internal/process
ok    github.com/thuongh2/git-mimir/internal/resolver
ok    github.com/thuongh2/git-mimir/internal/setup
ok    github.com/thuongh2/git-mimir/internal/store

Impact

  • Repos with 100k+ symbols can now complete mimir analyze without errors
  • All embeddings are generated before analyze exits
  • No more silent job drops or SQLite variable limit crashes

Three critical fixes for large-scale repositories:

1. Embedding worker graceful shutdown (internal/embedder/worker.go, cmd/mimir/analyze.go)
   - Added Close() method with WaitGroup to block until embeddings complete
   - Fixes issue where analyze exits before worker processes any batches
   - All embeddings now complete before analyze command exits

2. Embedding queue capacity increased (internal/embedder/worker.go)
   - Increased buffer from 1,000 to 100,000 jobs
   - Added warning log when jobs are dropped (queue full)
   - Handles repos with 100k+ symbols without silent drops

3. SetClusterForNodes SQLite variable limit fix (internal/store/clusters.go)
   - Batch UIDs into chunks of 500 to avoid SQLite ~999 variable limit
   - Executes multiple UPDATE queries within same transaction
   - Prevents "too many SQL variables" error on large clusters

All existing tests pass.

Fixes: #3, #4, #5