perf: single-pass tree-sitter analysis for SCIP-sourced files #309

@jafreck

Summary

Follow up on the indexer performance work by removing duplicate tree-sitter extraction for SCIP-sourced files and pushing the remaining source-index work through the parallel worker path.

Problem

On the lore-self cached benchmark repo, the current full index spends most of its time in:

  • scip-indexer
  • source-index

The main avoidable cost is that SCIP-covered files are extracted a second time with tree-sitter in SourceIndexStage, solely to patch end_line and compute symbol_metrics. That pass runs serially on the main thread.

There was also a worker-boundary hazard when trying to parallelize that path: extraction results included live astNode handles, which are not safe to transfer from worker threads.
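
A minimal sketch of the kind of payload conversion that would avoid this hazard, assuming extraction results are arrays of symbol records that carry an optional astNode field. All type and function names here (ExtractedSymbol, WorkerSymbolPayload, toWorkerPayload) are hypothetical and not taken from the codebase, and the line-count metric stands in for whatever symbol_metrics actually contains.

```typescript
// Hypothetical shapes: none of these names come from the real codebase.
interface ExtractedSymbol {
  name: string;
  startLine: number;
  endLine: number;
  astNode?: unknown; // live parser handle; must not cross the worker boundary
}

interface SymbolMetrics {
  lineCount: number;
}

interface WorkerSymbolPayload {
  name: string;
  startLine: number;
  endLine: number;
  metrics: SymbolMetrics;
}

// Build a plain-data record per symbol and leave the astNode handle behind.
// A real pass would derive its metrics from the AST node at this point;
// this sketch only uses the line span so that it stays self-contained.
function toWorkerPayload(symbols: ExtractedSymbol[]): WorkerSymbolPayload[] {
  return symbols.map(({ name, startLine, endLine }) => ({
    name,
    startLine,
    endLine,
    metrics: { lineCount: endLine - startLine + 1 },
  }));
}

const sample: ExtractedSymbol[] = [
  { name: "parseFile", startLine: 10, endLine: 24, astNode: { walk: () => undefined } },
];
const payload = toWorkerPayload(sample);

// The payload is plain JSON, so a serialization round trip preserves it.
const roundTripped = JSON.parse(JSON.stringify(payload)) as WorkerSymbolPayload[];
```

Computing the metrics inside the worker means the main thread never needs the AST handle, which is what makes dropping it safe.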

Required changes

  • Run the SCIP metrics/end-line pass through the existing parse-worker path when the compiled worker script is available
  • Keep tree-sitter extraction for SCIP-sourced files to a single extraction pass per file during build
  • Reuse the derived call/type data in ScipRefStage where possible so correctness is preserved
  • Strip worker-unsafe AST handles from parse-worker payloads and ship plain JSON plus precomputed symbol metrics instead
  • Add regression coverage for the SCIP-sourced build path
  • Verify with the cached lore-self benchmark repo
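
The single-extraction requirement above could be met in several ways. One possible sketch is a per-build memoizing wrapper, so that the source-index pass and ScipRefStage share one result per file. The Extraction shape and the memoizeExtraction helper are assumptions made for illustration, not the project's actual design.

```typescript
// Hypothetical sketch: names and shapes are illustrative only.
interface Extraction {
  endLines: Record<string, number>;
  callNames: string[];
}

type Extractor = (filePath: string) => Extraction;

// Wrap an extractor so each file path is extracted at most once per build.
function memoizeExtraction(extract: Extractor): Extractor {
  const cache = new Map<string, Extraction>();
  return (filePath) => {
    let result = cache.get(filePath);
    if (result === undefined) {
      result = extract(filePath);
      cache.set(filePath, result);
    }
    return result;
  };
}

// Stand-in extractor that counts how often it actually runs.
let extractionCount = 0;
const extractOnce = memoizeExtraction((filePath) => {
  extractionCount += 1;
  return { endLines: { [`${filePath}#main`]: 42 }, callNames: ["helper"] };
});

// Two stages asking for the same file share a single extraction.
const forSourceIndex = extractOnce("src/a.ts");
const forScipRefs = extractOnce("src/a.ts");
```

The cache is scoped to the wrapper, so creating one wrapper per build keeps results from leaking between builds.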

Benchmark target

Benchmark command:

node dist/cli.js index --root .benchmark/lore-self --db .benchmark/lore-self/.lore.db --history-depth 100

Current measured baseline before this follow-up:

  • fresh current build: about 6.45s wall clock
  • source-index: about 2534ms
  • ScipRefStage: about 478ms

Target outcome:

  • materially reduce source-index time on SCIP-heavy repos
  • do not regress totalEdges, symbol_refs, or type_refs
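
The no-regression target could be checked mechanically. The sketch below assumes the three counters can be read as numbers after each run and treats any drop below the baseline as a regression; an unexpected increase may also deserve a look but is not flagged here. The IndexCounts shape and findRegressions helper are hypothetical, and the numbers are placeholders, not measured results.

```typescript
// Hypothetical shape; the field names mirror the counters listed above.
interface IndexCounts {
  totalEdges: number;
  symbol_refs: number;
  type_refs: number;
}

// Return the names of counters that dropped relative to the baseline.
function findRegressions(baseline: IndexCounts, candidate: IndexCounts): string[] {
  const keys: (keyof IndexCounts)[] = ["totalEdges", "symbol_refs", "type_refs"];
  return keys.filter((key) => candidate[key] < baseline[key]);
}

// Placeholder numbers for illustration only.
const baselineCounts: IndexCounts = { totalEdges: 1000, symbol_refs: 400, type_refs: 120 };
const candidateCounts: IndexCounts = { totalEdges: 1000, symbol_refs: 398, type_refs: 120 };

// Only symbol_refs dropped in this made-up example.
const regressions = findRegressions(baselineCounts, candidateCounts);
```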

Metadata

Labels: enhancement