Skip to content

⚡ Bolt: Fast search tracking reset using O(K) clearing#415

Open
AhmmedSamier wants to merge 1 commit intomasterfrom
bolt-fast-search-tracking-17160936138219548008
Open

⚡ Bolt: Fast search tracking reset using O(K) clearing#415
AhmmedSamier wants to merge 1 commit intomasterfrom
bolt-fast-search-tracking-17160936138219548008

Conversation

@AhmmedSamier
Copy link
Copy Markdown
Owner

@AhmmedSamier AhmmedSamier commented May 5, 2026

💡 What
Implemented instance-level reusable tracking buffers (visitedIndicesBuffer and visitedIndicesTracker) for search methods in SearchEngine, coupled with an O(K) reset in a try...finally block.

🎯 Why
During "burst" searches, instantiating new Uint8Array(this.items.length) created significant object allocation and GC pressure, particularly when there are 100k+ items. Using a pre-allocated array avoids instantiation, and using a tracking array to explicitly reset only the dirtied indices prevents O(N) operations like .fill(0).

📊 Impact
This reduces the latency of high-frequency searches. In our benchmarks, a zero-match burst search over 50,000 items dropped from ~36ms to ~17ms.

🔬 Measurement
Run cd language-server && bun run benchmarks/search_burst.bench.ts.


PR created automatically by Jules for task 17160936138219548008 started by @AhmmedSamier

Summary by CodeRabbit

Release Notes

  • New Features

    • Added getRecentItems() method to retrieve recent search results from the search engine.
  • Documentation

    • Added search optimization documentation detailing latency improvements for Fast Search Tracking Reset.
    • Added benchmark results tracking performance metrics for burst search scenarios.
  • Refactor

    • Optimized search deduplication and memory allocation mechanisms to improve search latency and reduce heap churn.

Co-authored-by: AhmmedSamier <17784876+AhmmedSamier@users.noreply.github.com>
@google-labs-jules
Copy link
Copy Markdown
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 5, 2026

📝 Walkthrough

Walkthrough

This PR optimizes the search engine's deduplication logic by replacing Set-based tracking with a reusable Uint8Array-backed visited-indices tracker. It introduces a two-phase search strategy prioritizing previously matched indices before scanning remaining items, adding getRecentItems() API, and includes benchmark results and implementation documentation.

Changes

Fast Search Tracking Reset Optimization

Layer / File(s) Summary
Data Shape
language-server/src/core/search-engine.ts
Added instance-level visited-tracking fields: visitedIndicesBuffer (Uint8Array), visitedIndicesTracker (array of indices), and visitedTrackerLength to support reusable per-search deduplication without allocation churn.
Core Implementation
language-server/src/core/search-engine.ts
Rewrote performUnifiedSearch into a two-phase approach: compute preferred indices via getPreferredIndicesForQuery, then search those indices with dedup before scanning remaining items. Introduced populateCandidateIndices helper and refactored searchRemainingItems to use visited-tracker with try...finally cleanup for O(K) resets.
Integration & Wiring
language-server/src/core/search-engine.ts
Updated setItems, addItems, and clear methods to initialize and resize visited-tracking buffers alongside existing per-item arrays. Extended searchWithIndices signature with useVisited flag to control dedup-aware searching.
Public API Surface
language-server/src/core/search-engine.ts
Added getRecentItems(): SearchResult[] public method (stubbed placeholder) to expose new API surface without altering current behavior.
Documentation & Benchmarks
.jules/bolt.md, language-server/benchmarks/results_burst.json
Added implementation notes documenting the O(K) reset optimization and observed latency improvements; included three new burst-search benchmark result objects ("App", "Zzz", "S") with detailed metrics.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Engine as SearchEngine
    participant PrefIdx as Preferred<br/>Indices
    participant Tracker as Visited<br/>Tracker
    participant Search as Search<br/>Pass
    
    Client->>Engine: performUnifiedSearch(query)
    activate Engine
    
    Engine->>PrefIdx: getPreferredIndicesForQuery(query)
    activate PrefIdx
    PrefIdx->>PrefIdx: memoized top results lookup
    PrefIdx-->>Engine: preferred indices list
    deactivate PrefIdx
    
    alt Preferred indices available
        Engine->>Search: searchWithIndices(preferred, useVisited=true)
        activate Search
        Search->>Tracker: populateCandidateIndices(preferred)
        Tracker->>Tracker: mark indices visited
        Search->>Search: search and collect matches
        Search-->>Engine: matches from preferred set
        deactivate Search
    end
    
    Engine->>Search: searchRemainingItems()
    activate Search
    Search->>Tracker: populate remaining candidates
    Tracker->>Tracker: mark remaining visited
    Search->>Search: search with visited dedup
    Search-->>Engine: remaining matches
    deactivate Search
    
    Engine->>Tracker: reset via finally block (O(K))
    Tracker->>Tracker: zero only modified indices
    Engine-->>Client: combined results
    deactivate Engine
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly Related PRs

  • AhmmedSamier/DeepLens#349: Replaces Set/Array-based dedup in search-engine.ts hot paths with dense Uint8Array visited-tracker for de-dup and filtering.
  • AhmmedSamier/DeepLens#368: Modifies SearchEngine hot-paths by replacing Set-based visited/index tracking with Uint8Array-based presence buffers and preallocated trackers.
  • AhmmedSamier/DeepLens#371: Changes hot-path dedup logic in search-engine.ts from Set-based tracking to Uint8Array marker arrays in performUnifiedSearch and getPreferredIndicesForQuery.

Suggested Labels

codex

Poem

🐰 A visited tracker springs to life,
No more Sets to cause allocation strife,
Preferred indices lead the way,
Through Uint8Array's O(K) reset play,
Fast search bursts now chase the prize!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: implementing a fast search tracking reset using O(K) clearing. It directly relates to the core optimization of reusing buffers and resetting only touched indices, which is the primary focus of all file changes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bolt-fast-search-tracking-17160936138219548008

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
language-server/src/core/search-engine.ts (1)

2472-2478: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add documentation to clarify that getRecentItems() and recordActivity() are intentional stubs for interface compliance.

SearchEngine implements ISearchProvider and provides these methods as empty stubs because the actual activity tracking is handled by ActivityTracker and delegated at the server level. Without clarification, callers reading these methods cannot distinguish between "feature disabled" and "intentional no-op on aggregator engine". Add a comment explaining that these are required by the interface contract but actual activity tracking occurs externally.

📝 Suggested clarification
+    /**
+     * Required by ISearchProvider but activity tracking is handled by
+     * ActivityTracker and delegated at the server level. Aggregator engine
+     * does not track activity itself.
+     */
     getRecentItems(): SearchResult[] {
         return [];
     }
 
     recordActivity(): void {
-        // No-op
+        // No-op: activity tracking is external; see ActivityTracker
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@language-server/src/core/search-engine.ts` around lines 2472 - 2478, The
methods getRecentItems() and recordActivity() on SearchEngine are intentional
no-op implementations to satisfy the ISearchProvider interface while activity
tracking is handled by ActivityTracker at the server level; update the
SearchEngine class to add concise doc comments above getRecentItems() and
recordActivity() stating they are deliberate stubs for interface compliance and
that activity tracking is delegated externally (mention ActivityTracker and
server-level delegation), so readers understand these are not unimplemented
features but by-design no-ops.
🧹 Nitpick comments (1)
language-server/src/core/search-engine.ts (1)

289-296: 💤 Low value

Buffer-content copy on resize is unnecessary work.

Between searches the visited tracking is fully reset (the try…finally in performUnifiedSearch/getPreferredIndicesForQuery/searchRemainingItems always lands at visitedTrackerLength = 0 with all marked entries zeroed). Since a Uint8Array/Uint32Array is already zero-initialized, the .set(this.visitedIndicesBuffer) / .set(this.visitedIndicesTracker) calls just memcpy stale zeros into a fresh buffer. Skipping them avoids an O(N) copy on every grow, which is the exact cost the rest of this PR is trying to eliminate.

♻️ Drop the redundant copies
-            const newVisitedBuffer = new Uint8Array(newCapacity);
-            newVisitedBuffer.set(this.visitedIndicesBuffer);
-            this.visitedIndicesBuffer = newVisitedBuffer;
-
-            const newVisitedTracker = new Uint32Array(newCapacity);
-            newVisitedTracker.set(this.visitedIndicesTracker);
-            this.visitedIndicesTracker = newVisitedTracker;
+            // Visited tracking is transient; resets to all-zero between searches.
+            // No need to preserve contents on grow.
+            this.visitedIndicesBuffer = new Uint8Array(newCapacity);
+            this.visitedIndicesTracker = new Uint32Array(newCapacity);
+            this.visitedTrackerLength = 0;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@language-server/src/core/search-engine.ts` around lines 289 - 296, The resize
logic copies the old visited buffers into newly allocated zeroed arrays
(newVisitedBuffer.set(this.visitedIndicesBuffer) and
newVisitedTracker.set(this.visitedIndicesTracker>) even though
visitedTrackerLength is always reset to 0 between searches; remove those
.set(...) calls and simply replace visitedIndicesBuffer and
visitedIndicesTracker with new zero-initialized Uint8Array/Uint32Array instances
(created as new Uint8Array(newCapacity) and new Uint32Array(newCapacity)) in the
resize branch inside the method that grows the tracker (the code handling
newVisitedBuffer/newVisitedTracker), relying on the guaranteed zero-init
behavior used by
performUnifiedSearch/getPreferredIndicesForQuery/searchRemainingItems so you
avoid the O(N) memcpy on every grow.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@language-server/src/core/search-engine.ts`:
- Around line 1649-1669: performUnifiedSearch uses shared instance fields
visitedIndicesBuffer, visitedIndicesTracker and visitedTrackerLength and can be
corrupted by concurrent callers from async callers like performSymbolSearch and
performFileSearch; add a runWithVisitedTracking(fn) helper (or a simple
reentrancy guard) that asserts visitedTrackerLength === 0 at entry, sets up/uses
the buffers for the duration of fn, and always resets visitedTrackerLength to 0
and clears visitedIndicesBuffer entries in a finally block to prevent cross-call
contamination; update performUnifiedSearch to call runWithVisitedTracking and/or
add the guard check/throw at its start so concurrent reentry is caught during
dev-time instead of silently corrupting searches.

---

Outside diff comments:
In `@language-server/src/core/search-engine.ts`:
- Around line 2472-2478: The methods getRecentItems() and recordActivity() on
SearchEngine are intentional no-op implementations to satisfy the
ISearchProvider interface while activity tracking is handled by ActivityTracker
at the server level; update the SearchEngine class to add concise doc comments
above getRecentItems() and recordActivity() stating they are deliberate stubs
for interface compliance and that activity tracking is delegated externally
(mention ActivityTracker and server-level delegation), so readers understand
these are not unimplemented features but by-design no-ops.

---

Nitpick comments:
In `@language-server/src/core/search-engine.ts`:
- Around line 289-296: The resize logic copies the old visited buffers into
newly allocated zeroed arrays (newVisitedBuffer.set(this.visitedIndicesBuffer)
and newVisitedTracker.set(this.visitedIndicesTracker>) even though
visitedTrackerLength is always reset to 0 between searches; remove those
.set(...) calls and simply replace visitedIndicesBuffer and
visitedIndicesTracker with new zero-initialized Uint8Array/Uint32Array instances
(created as new Uint8Array(newCapacity) and new Uint32Array(newCapacity)) in the
resize branch inside the method that grows the tracker (the code handling
newVisitedBuffer/newVisitedTracker), relying on the guaranteed zero-init
behavior used by
performUnifiedSearch/getPreferredIndicesForQuery/searchRemainingItems so you
avoid the O(N) memcpy on every grow.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0f16fbe8-418d-4338-ba8a-8845ab86dc92

📥 Commits

Reviewing files that changed from the base of the PR and between 0b051a9 and 596e18a.

📒 Files selected for processing (3)
  • .jules/bolt.md
  • language-server/benchmarks/results_burst.json
  • language-server/src/core/search-engine.ts

Comment on lines +1649 to 1669
try {
// ⚡ Bolt: Fast integer ID tracking using Uint8Array instead of Set
const preferredIndices = this.getPreferredIndicesForQuery(scope, query, indices);
const useVisited = preferredIndices.length > 0;

if (indices) {
this.searchWithIndices(indices, searchContext, heap, token, visited);
} else {
this.searchAllItems(searchContext, heap, token, visited);
if (preferredIndices.length > 0) {
this.searchWithIndices(preferredIndices, searchContext, heap, token, useVisited);
}

if (indices) {
this.searchWithIndices(indices, searchContext, heap, token, useVisited);
} else {
this.searchAllItems(searchContext, heap, token, useVisited);
}
} finally {
// O(k) cleanup of tracking arrays
for (let i = 0; i < this.visitedTrackerLength; i++) {
this.visitedIndicesBuffer[this.visitedIndicesTracker[i]] = 0;
}
this.visitedTrackerLength = 0;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm performUnifiedSearch / getPreferredIndicesForQuery / searchRemainingItems / 
# searchWithIndices / searchAllItems contain no awaits today, and check whether any
# external caller invokes them across concurrent code paths.

rg -nP --type=ts -C2 '\bawait\b' -g 'language-server/src/core/search-engine.ts' \
  | rg -n 'performUnifiedSearch|getPreferredIndicesForQuery|searchRemainingItems|searchWithIndices|searchAllItems|populateCandidateIndices' || \
  echo "OK: no awaits inside the visited-buffer regions"

echo '---'
rg -nP --type=ts -C2 '\b(performUnifiedSearch|performSymbolSearch|performFileSearch)\s*\('

Repository: AhmmedSamier/DeepLens

Length of output: 2838


🏁 Script executed:

# Check full implementation of performSymbolSearch and performFileSearch
sed -n '1595,1622p' language-server/src/core/search-engine.ts | cat -n

echo "---"

sed -n '1624,1637p' language-server/src/core/search-engine.ts | cat -n

Repository: AhmmedSamier/DeepLens

Length of output: 2395


🏁 Script executed:

# Check if there are awaits between method start and performUnifiedSearch call in the search methods
sed -n '1595,1640p' language-server/src/core/search-engine.ts | cat -n

Repository: AhmmedSamier/DeepLens

Length of output: 2494


🏁 Script executed:

# Check the actual context of how these methods are called - look at symbol-provider and file-provider
cat -n language-server/src/core/providers/symbol-provider.ts

echo "---"

cat -n language-server/src/core/providers/file-provider.ts

Repository: AhmmedSamier/DeepLens

Length of output: 2195


🏁 Script executed:

# Check if performUnifiedSearch could actually be reentered by examining the broader call context
# Look for any concurrent pattern in LSP handlers
rg -n 'performSymbolSearch|performFileSearch' language-server/src --type=ts -B3 -A3 | head -60

Repository: AhmmedSamier/DeepLens

Length of output: 2098


Fix shared-instance visited tracking to prevent concurrent search corruption.

visitedIndicesBuffer / visitedIndicesTracker / visitedTrackerLength are instance fields used by performUnifiedSearch. Although performUnifiedSearch itself is synchronous, it is called from performSymbolSearch and performFileSearch, both of which are async and contain await statements (at lines 9 and 7 respectively) before the performUnifiedSearch call. This means the methods can yield control to the event loop, allowing concurrent callers to reenter and corrupt the shared visited state.

Example hazard: Thread A calls performSymbolSearch, awaits at line 9, and yields. Thread B calls performSymbolSearch, awaits at line 7, and yields. Thread A resumes and enters performUnifiedSearch with visitedTrackerLength potentially non-zero. Thread B resumes and overwrites the visited tracking. Search results silently corrupt.

Recommend adding a guard like the one below to catch this at dev-time, or wrap the buffer-using region in a runWithVisitedTracking(fn) helper to centralize the contract:

Guard suggestion
     ): SearchResult[] {
         const heap = new MinHeap<SearchResult>(maxResults, (a, b) => a.score - b.score);
         const searchContext = this.prepareSearchContext(query, scope);
 
+        // Guard: visited tracking is instance state and cannot be reentered.
+        if (this.visitedTrackerLength !== 0) {
+            this.logger?.error('SearchEngine: visited tracker entered with non-zero length; resetting');
+            for (let i = 0; i < this.visitedTrackerLength; i++) {
+                this.visitedIndicesBuffer[this.visitedIndicesTracker[i]] = 0;
+            }
+            this.visitedTrackerLength = 0;
+        }
+
         try {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@language-server/src/core/search-engine.ts` around lines 1649 - 1669,
performUnifiedSearch uses shared instance fields visitedIndicesBuffer,
visitedIndicesTracker and visitedTrackerLength and can be corrupted by
concurrent callers from async callers like performSymbolSearch and
performFileSearch; add a runWithVisitedTracking(fn) helper (or a simple
reentrancy guard) that asserts visitedTrackerLength === 0 at entry, sets up/uses
the buffers for the duration of fn, and always resets visitedTrackerLength to 0
and clears visitedIndicesBuffer entries in a finally block to prevent cross-call
contamination; update performUnifiedSearch to call runWithVisitedTracking and/or
add the guard check/throw at its start so concurrent reentry is caught during
dev-time instead of silently corrupting searches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant