⚡ Bolt: Fast search tracking reset using O(K) clearing#415
⚡ Bolt: Fast search tracking reset using O(K) clearing#415AhmmedSamier wants to merge 1 commit intomasterfrom
Conversation
Co-authored-by: AhmmedSamier <17784876+AhmmedSamier@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
📝 WalkthroughWalkthroughThis PR optimizes the search engine's deduplication logic by replacing Set-based tracking with a reusable Uint8Array-backed visited-indices tracker. It introduces a two-phase search strategy prioritizing previously matched indices before scanning remaining items, adding getRecentItems() API, and includes benchmark results and implementation documentation. ChangesFast Search Tracking Reset Optimization
Sequence DiagramsequenceDiagram
participant Client
participant Engine as SearchEngine
participant PrefIdx as Preferred<br/>Indices
participant Tracker as Visited<br/>Tracker
participant Search as Search<br/>Pass
Client->>Engine: performUnifiedSearch(query)
activate Engine
Engine->>PrefIdx: getPreferredIndicesForQuery(query)
activate PrefIdx
PrefIdx->>PrefIdx: memoized top results lookup
PrefIdx-->>Engine: preferred indices list
deactivate PrefIdx
alt Preferred indices available
Engine->>Search: searchWithIndices(preferred, useVisited=true)
activate Search
Search->>Tracker: populateCandidateIndices(preferred)
Tracker->>Tracker: mark indices visited
Search->>Search: search and collect matches
Search-->>Engine: matches from preferred set
deactivate Search
end
Engine->>Search: searchRemainingItems()
activate Search
Search->>Tracker: populate remaining candidates
Tracker->>Tracker: mark remaining visited
Search->>Search: search with visited dedup
Search-->>Engine: remaining matches
deactivate Search
Engine->>Tracker: reset via finally block (O(K))
Tracker->>Tracker: zero only modified indices
Engine-->>Client: combined results
deactivate Engine
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~65 minutes Possibly Related PRs
Suggested Labels
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
language-server/src/core/search-engine.ts (1)
2472-2478:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd documentation to clarify that
getRecentItems()andrecordActivity()are intentional stubs for interface compliance.SearchEngine implements ISearchProvider and provides these methods as empty stubs because the actual activity tracking is handled by ActivityTracker and delegated at the server level. Without clarification, callers reading these methods cannot distinguish between "feature disabled" and "intentional no-op on aggregator engine". Add a comment explaining that these are required by the interface contract but actual activity tracking occurs externally.
📝 Suggested clarification
+ /** + * Required by ISearchProvider but activity tracking is handled by + * ActivityTracker and delegated at the server level. Aggregator engine + * does not track activity itself. + */ getRecentItems(): SearchResult[] { return []; } recordActivity(): void { - // No-op + // No-op: activity tracking is external; see ActivityTracker }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@language-server/src/core/search-engine.ts` around lines 2472 - 2478, The methods getRecentItems() and recordActivity() on SearchEngine are intentional no-op implementations to satisfy the ISearchProvider interface while activity tracking is handled by ActivityTracker at the server level; update the SearchEngine class to add concise doc comments above getRecentItems() and recordActivity() stating they are deliberate stubs for interface compliance and that activity tracking is delegated externally (mention ActivityTracker and server-level delegation), so readers understand these are not unimplemented features but by-design no-ops.
🧹 Nitpick comments (1)
language-server/src/core/search-engine.ts (1)
289-296: 💤 Low valueBuffer-content copy on resize is unnecessary work.
Between searches the visited tracking is fully reset (the
try…finallyinperformUnifiedSearch/getPreferredIndicesForQuery/searchRemainingItemsalways lands atvisitedTrackerLength = 0with all marked entries zeroed). Since aUint8Array/Uint32Arrayis already zero-initialized, the.set(this.visitedIndicesBuffer)/.set(this.visitedIndicesTracker)calls just memcpy stale zeros into a fresh buffer. Skipping them avoids an O(N) copy on every grow, which is the exact cost the rest of this PR is trying to eliminate.♻️ Drop the redundant copies
- const newVisitedBuffer = new Uint8Array(newCapacity); - newVisitedBuffer.set(this.visitedIndicesBuffer); - this.visitedIndicesBuffer = newVisitedBuffer; - - const newVisitedTracker = new Uint32Array(newCapacity); - newVisitedTracker.set(this.visitedIndicesTracker); - this.visitedIndicesTracker = newVisitedTracker; + // Visited tracking is transient; resets to all-zero between searches. + // No need to preserve contents on grow. + this.visitedIndicesBuffer = new Uint8Array(newCapacity); + this.visitedIndicesTracker = new Uint32Array(newCapacity); + this.visitedTrackerLength = 0;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@language-server/src/core/search-engine.ts` around lines 289 - 296, The resize logic copies the old visited buffers into newly allocated zeroed arrays (newVisitedBuffer.set(this.visitedIndicesBuffer) and newVisitedTracker.set(this.visitedIndicesTracker>) even though visitedTrackerLength is always reset to 0 between searches; remove those .set(...) calls and simply replace visitedIndicesBuffer and visitedIndicesTracker with new zero-initialized Uint8Array/Uint32Array instances (created as new Uint8Array(newCapacity) and new Uint32Array(newCapacity)) in the resize branch inside the method that grows the tracker (the code handling newVisitedBuffer/newVisitedTracker), relying on the guaranteed zero-init behavior used by performUnifiedSearch/getPreferredIndicesForQuery/searchRemainingItems so you avoid the O(N) memcpy on every grow.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@language-server/src/core/search-engine.ts`:
- Around line 1649-1669: performUnifiedSearch uses shared instance fields
visitedIndicesBuffer, visitedIndicesTracker and visitedTrackerLength and can be
corrupted by concurrent callers from async callers like performSymbolSearch and
performFileSearch; add a runWithVisitedTracking(fn) helper (or a simple
reentrancy guard) that asserts visitedTrackerLength === 0 at entry, sets up/uses
the buffers for the duration of fn, and always resets visitedTrackerLength to 0
and clears visitedIndicesBuffer entries in a finally block to prevent cross-call
contamination; update performUnifiedSearch to call runWithVisitedTracking and/or
add the guard check/throw at its start so concurrent reentry is caught during
dev-time instead of silently corrupting searches.
---
Outside diff comments:
In `@language-server/src/core/search-engine.ts`:
- Around line 2472-2478: The methods getRecentItems() and recordActivity() on
SearchEngine are intentional no-op implementations to satisfy the
ISearchProvider interface while activity tracking is handled by ActivityTracker
at the server level; update the SearchEngine class to add concise doc comments
above getRecentItems() and recordActivity() stating they are deliberate stubs
for interface compliance and that activity tracking is delegated externally
(mention ActivityTracker and server-level delegation), so readers understand
these are not unimplemented features but by-design no-ops.
---
Nitpick comments:
In `@language-server/src/core/search-engine.ts`:
- Around line 289-296: The resize logic copies the old visited buffers into
newly allocated zeroed arrays (newVisitedBuffer.set(this.visitedIndicesBuffer)
and newVisitedTracker.set(this.visitedIndicesTracker>) even though
visitedTrackerLength is always reset to 0 between searches; remove those
.set(...) calls and simply replace visitedIndicesBuffer and
visitedIndicesTracker with new zero-initialized Uint8Array/Uint32Array instances
(created as new Uint8Array(newCapacity) and new Uint32Array(newCapacity)) in the
resize branch inside the method that grows the tracker (the code handling
newVisitedBuffer/newVisitedTracker), relying on the guaranteed zero-init
behavior used by
performUnifiedSearch/getPreferredIndicesForQuery/searchRemainingItems so you
avoid the O(N) memcpy on every grow.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0f16fbe8-418d-4338-ba8a-8845ab86dc92
📒 Files selected for processing (3)
.jules/bolt.mdlanguage-server/benchmarks/results_burst.jsonlanguage-server/src/core/search-engine.ts
| try { | ||
| // ⚡ Bolt: Fast integer ID tracking using Uint8Array instead of Set | ||
| const preferredIndices = this.getPreferredIndicesForQuery(scope, query, indices); | ||
| const useVisited = preferredIndices.length > 0; | ||
|
|
||
| if (indices) { | ||
| this.searchWithIndices(indices, searchContext, heap, token, visited); | ||
| } else { | ||
| this.searchAllItems(searchContext, heap, token, visited); | ||
| if (preferredIndices.length > 0) { | ||
| this.searchWithIndices(preferredIndices, searchContext, heap, token, useVisited); | ||
| } | ||
|
|
||
| if (indices) { | ||
| this.searchWithIndices(indices, searchContext, heap, token, useVisited); | ||
| } else { | ||
| this.searchAllItems(searchContext, heap, token, useVisited); | ||
| } | ||
| } finally { | ||
| // O(k) cleanup of tracking arrays | ||
| for (let i = 0; i < this.visitedTrackerLength; i++) { | ||
| this.visitedIndicesBuffer[this.visitedIndicesTracker[i]] = 0; | ||
| } | ||
| this.visitedTrackerLength = 0; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm performUnifiedSearch / getPreferredIndicesForQuery / searchRemainingItems /
# searchWithIndices / searchAllItems contain no awaits today, and check whether any
# external caller invokes them across concurrent code paths.
rg -nP --type=ts -C2 '\bawait\b' -g 'language-server/src/core/search-engine.ts' \
| rg -n 'performUnifiedSearch|getPreferredIndicesForQuery|searchRemainingItems|searchWithIndices|searchAllItems|populateCandidateIndices' || \
echo "OK: no awaits inside the visited-buffer regions"
echo '---'
rg -nP --type=ts -C2 '\b(performUnifiedSearch|performSymbolSearch|performFileSearch)\s*\('Repository: AhmmedSamier/DeepLens
Length of output: 2838
🏁 Script executed:
# Check full implementation of performSymbolSearch and performFileSearch
sed -n '1595,1622p' language-server/src/core/search-engine.ts | cat -n
echo "---"
sed -n '1624,1637p' language-server/src/core/search-engine.ts | cat -nRepository: AhmmedSamier/DeepLens
Length of output: 2395
🏁 Script executed:
# Check if there are awaits between method start and performUnifiedSearch call in the search methods
sed -n '1595,1640p' language-server/src/core/search-engine.ts | cat -nRepository: AhmmedSamier/DeepLens
Length of output: 2494
🏁 Script executed:
# Check the actual context of how these methods are called - look at symbol-provider and file-provider
cat -n language-server/src/core/providers/symbol-provider.ts
echo "---"
cat -n language-server/src/core/providers/file-provider.tsRepository: AhmmedSamier/DeepLens
Length of output: 2195
🏁 Script executed:
# Check if performUnifiedSearch could actually be reentered by examining the broader call context
# Look for any concurrent pattern in LSP handlers
rg -n 'performSymbolSearch|performFileSearch' language-server/src --type=ts -B3 -A3 | head -60Repository: AhmmedSamier/DeepLens
Length of output: 2098
Fix shared-instance visited tracking to prevent concurrent search corruption.
visitedIndicesBuffer / visitedIndicesTracker / visitedTrackerLength are instance fields used by performUnifiedSearch. Although performUnifiedSearch itself is synchronous, it is called from performSymbolSearch and performFileSearch, both of which are async and contain await statements (at lines 9 and 7 respectively) before the performUnifiedSearch call. This means the methods can yield control to the event loop, allowing concurrent callers to reenter and corrupt the shared visited state.
Example hazard: Thread A calls performSymbolSearch, awaits at line 9, and yields. Thread B calls performSymbolSearch, awaits at line 7, and yields. Thread A resumes and enters performUnifiedSearch with visitedTrackerLength potentially non-zero. Thread B resumes and overwrites the visited tracking. Search results silently corrupt.
Recommend adding a guard like the one below to catch this at dev-time, or wrap the buffer-using region in a runWithVisitedTracking(fn) helper to centralize the contract:
Guard suggestion
): SearchResult[] {
const heap = new MinHeap<SearchResult>(maxResults, (a, b) => a.score - b.score);
const searchContext = this.prepareSearchContext(query, scope);
+ // Guard: visited tracking is instance state and cannot be reentered.
+ if (this.visitedTrackerLength !== 0) {
+ this.logger?.error('SearchEngine: visited tracker entered with non-zero length; resetting');
+ for (let i = 0; i < this.visitedTrackerLength; i++) {
+ this.visitedIndicesBuffer[this.visitedIndicesTracker[i]] = 0;
+ }
+ this.visitedTrackerLength = 0;
+ }
+
try {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@language-server/src/core/search-engine.ts` around lines 1649 - 1669,
performUnifiedSearch uses shared instance fields visitedIndicesBuffer,
visitedIndicesTracker and visitedTrackerLength and can be corrupted by
concurrent callers from async callers like performSymbolSearch and
performFileSearch; add a runWithVisitedTracking(fn) helper (or a simple
reentrancy guard) that asserts visitedTrackerLength === 0 at entry, sets up/uses
the buffers for the duration of fn, and always resets visitedTrackerLength to 0
and clears visitedIndicesBuffer entries in a finally block to prevent cross-call
contamination; update performUnifiedSearch to call runWithVisitedTracking and/or
add the guard check/throw at its start so concurrent reentry is caught during
dev-time instead of silently corrupting searches.
💡 What
Implemented instance-level reusable tracking buffers (
visitedIndicesBufferandvisitedIndicesTracker) for search methods inSearchEngine, coupled with an O(K) reset in atry...finallyblock.🎯 Why
During "burst" searches, instantiating
new Uint8Array(this.items.length)created significant object allocation and GC pressure, particularly when there are 100k+ items. Using a pre-allocated array avoids instantiation, and using a tracking array to explicitly reset only the dirtied indices prevents O(N) operations like.fill(0).📊 Impact
This reduces the latency of high-frequency searches. In our benchmarks, a zero-match burst search over 50,000 items dropped from ~36ms to ~17ms.
🔬 Measurement
Run
cd language-server && bun run benchmarks/search_burst.bench.ts.PR created automatically by Jules for task 17160936138219548008 started by @AhmmedSamier
Summary by CodeRabbit
Release Notes
New Features
getRecentItems()method to retrieve recent search results from the search engine.Documentation
Refactor