Add TopN plan node for O(limit) ORDER BY + LIMIT (Async Version) by platypii · Pull Request #26 · hyparam/squirreling

platypii · 2026-04-14T00:49:11Z

This is the non-eager, async version of #24.

I'm not willing to give up late materialization, which might mean this doesn't give us much of a perf benefit. Unclear, but worth testing. @philcunliffe

Add multi-level caching and reduce per-row overhead:
- parseSql: LRU cache (64 entries) avoids re-tokenizing/re-parsing the same SQL strings
- planSql: WeakMap cache on parsed ASTs avoids re-planning identical queries
- asyncRow: attach a `_data` field for zero-copy collection
- collect: sync fast path skips Promise.all when all rows have pre-materialized `_data`
- executeProject: pre-compute static column names; fast path for simple identifier projections with direct cell passthrough and `_data` propagation
- executeSql: skip table normalization when no array tables are present
- compareForTerm: use a module-level Set instead of a per-call array allocation
- memorySource: hoist column computation out of the scan loop; use a Set for validation
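The `collect` fast path in the list above can be sketched as follows. This is a minimal illustration, not squirreling's code: the row shape (`columns`, per-column `cells` thunks, an optional `resolved` record) is an assumption, and the field is named `resolved` to match the later rename of `_data`.

```javascript
/**
 * Hypothetical row shape for this sketch (not squirreling's actual type):
 * `cells` maps a column name to a thunk that produces the value, and the
 * optional `resolved` record (named `_data` in the first version of this
 * change) holds values that are already materialized.
 * @typedef {{
 *   columns: string[],
 *   cells: Record<string, () => Promise<unknown>>,
 *   resolved?: Record<string, unknown>
 * }} AsyncRow
 */

/**
 * Collect an async stream of rows into plain objects.
 * @param {AsyncIterable<AsyncRow>} rows
 * @returns {Promise<Record<string, unknown>[]>}
 */
async function collect(rows) {
  /** @type {AsyncRow[]} */
  const buffered = []
  for await (const row of rows) buffered.push(row)

  // Fast path: every row is pre-materialized, so no per-cell Promises
  // and no Promise.all are needed; reuse the resolved objects as-is
  if (buffered.every(row => row.resolved !== undefined)) {
    return buffered.map(row => row.resolved)
  }

  // Slow path: await each row's cells (already-resolved rows skip this)
  return Promise.all(buffered.map(async row => {
    if (row.resolved !== undefined) return row.resolved
    const values = await Promise.all(row.columns.map(col => row.cells[col]()))
    return Object.fromEntries(row.columns.map((col, i) => [col, values[i]]))
  }))
}
```

The fast path still has to drain the iterator; what it avoids is creating a Promise for every cell.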

- Add `_data` to AsyncRow type definition
- Cast to DerivedColumn/IdentifierNode where type narrowing is needed
- Type `_data` as `Record<string, SqlPrimitive>`
- Fix JSDoc placement for compareForTerm

Adapt optimizations to the new QueryResults return type:
- executeSql: keep the table normalization skip; use the new inline plan+execute
- executeProject: move pre-computation outside rows(); keep the identifier fast path and static column names inside the rows() generator
- Add `_data` to AsyncRow type definition
- Fix JSDoc placement and type casts for tsc
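A sketch of the `executeProject` changes, assuming hypothetical AST and row shapes (a `DerivedColumn` with `expr`/`alias`, an `evaluate` callback for non-identifier expressions, and default names like `col0`); none of these are squirreling's real types. Names are computed once per query outside the `rows()` generator. Inside it, identifier columns reuse the source cell function directly, and `resolved` values are propagated only when every projected column is a plain identifier.

```javascript
/**
 * Hypothetical shapes for this sketch (assumptions, not squirreling's types).
 * @typedef {{ type: string, name?: string }} ExprNode
 * @typedef {{ expr: ExprNode, alias?: string }} DerivedColumn
 * @typedef {{
 *   columns: string[],
 *   cells: Record<string, () => Promise<unknown>>,
 *   resolved?: Record<string, unknown>
 * }} AsyncRow
 */

/**
 * Projection with per-query pre-computation and an identifier fast path.
 * @param {DerivedColumn[]} select
 * @param {AsyncIterable<AsyncRow>} source
 * @param {(expr: ExprNode, row: AsyncRow) => Promise<unknown>} evaluate  assumed generic evaluator
 */
function executeProject(select, source, evaluate) {
  // Computed once per query, outside rows(), instead of once per row
  const sourceNames = select.map(col =>
    col.expr.type === 'identifier' ? col.expr.name : undefined)
  const names = select.map((col, i) => col.alias ?? sourceNames[i] ?? `col${i}`)
  const allIdentifiers = sourceNames.every(name => name !== undefined)

  return {
    columns: names,
    async *rows() {
      for await (const row of source) {
        /** @type {Record<string, () => Promise<unknown>>} */
        const cells = {}
        select.forEach((col, i) => {
          const src = sourceNames[i]
          // Identifier: pass the source cell function through untouched
          // Anything else: evaluate the expression lazily against the row
          cells[names[i]] = src !== undefined ? row.cells[src] : () => evaluate(col.expr, row)
        })
        // Propagate pre-materialized values when every column is a plain identifier
        const resolved = allIdentifiers && row.resolved
          ? Object.fromEntries(names.map((name, i) => [name, row.resolved[sourceNames[i]]]))
          : undefined
        yield { columns: names, cells, resolved }
      }
    },
  }
}
```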

Drop the parseSql/planSql memoization caches added in 881a031. Also rename the pre-materialized row payload from `_data` to `resolved` for clarity, and delete stale scratch files (query-parquet.mjs, repro-525.mjs).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resolves all cell values when rows are buffered for ORDER BY, replacing AsyncRow closures (which capture decompressed parquet row group data) with plain value-returning functions. The original closures become GC-eligible immediately. For tables with large text columns (~10KB/row), this reduces per-row buffer cost from ~10KB (closure over parquet data) to ~100B (plain value).
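A minimal sketch of resolving a row at the moment it is buffered, using an assumed row shape (`columns`, `cells`, `resolved`) and a hypothetical `resolveRow` helper. Each new cell function closes over a single value, so once the caller drops the original row nothing keeps its closures (or the row-group data they captured) alive. The cost is that every cell is materialized up front, i.e. this gives up late materialization.

```javascript
/**
 * Hypothetical row shape for this sketch (not squirreling's actual type).
 * @typedef {{
 *   columns: string[],
 *   cells: Record<string, () => Promise<unknown>>,
 *   resolved?: Record<string, unknown>
 * }} AsyncRow
 */

/**
 * Await every cell once and return a row whose cell functions just return
 * the stored values. The returned row keeps no reference to the original
 * cell closures.
 * @param {AsyncRow} row
 * @returns {Promise<AsyncRow>}
 */
async function resolveRow(row) {
  const values = await Promise.all(row.columns.map(col => row.cells[col]()))
  /** @type {Record<string, unknown>} */
  const resolved = {}
  /** @type {Record<string, () => Promise<unknown>>} */
  const cells = {}
  row.columns.forEach((col, i) => {
    const value = values[i]
    resolved[col] = value
    // Closes over one plain value, not over the source row's data
    cells[col] = () => Promise.resolve(value)
  })
  return { columns: row.columns, cells, resolved }
}
```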

Fuses Sort + Limit into a TopN node that uses a bounded binary max-heap, so ORDER BY x LIMIT N now buffers only N rows instead of the entire dataset. The planner detects two patterns:
- Limit(Sort(child)) → TopN(child)
- Limit(Project(Sort(child))) → Project(TopN(child))
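The TopN idea can be sketched in plain JavaScript: keep a max-heap of at most N entries whose root is the worst row kept so far, and replace the root whenever a new row sorts before it. `topN`, `siftUp`, `siftDown`, and `fuseTopN` here are illustrative; the plan node shapes, the arrival-order tie-break, and leaving LIMIT ... OFFSET untouched are assumptions of this sketch, not details taken from the PR.

```javascript
/**
 * Keep the `limit` smallest items under `compare` using a bounded binary
 * max-heap, so memory is O(limit) instead of O(rows). Ties are broken by
 * arrival order, so the output equals a stable sort followed by slice(0, limit).
 * @template T
 * @param {Iterable<T>} items
 * @param {number} limit
 * @param {(a: T, b: T) => number} compare  negative if a sorts before b
 * @returns {T[]}
 */
function topN(items, limit, compare) {
  if (limit <= 0) return []
  /** @type {{ item: T, seq: number }[]} */
  const heap = []
  // Total order: comparator first, then arrival order
  const cmp = (a, b) => compare(a.item, b.item) || a.seq - b.seq
  let seq = 0
  for (const item of items) {
    const entry = { item, seq: seq++ }
    if (heap.length < limit) {
      heap.push(entry)
      siftUp(heap, heap.length - 1, cmp)
    } else if (cmp(entry, heap[0]) < 0) {
      // Sorts before the worst row kept so far (the root): replace it
      heap[0] = entry
      siftDown(heap, 0, cmp)
    }
  }
  // Heap order is not sorted order; sort the at most `limit` survivors
  return heap.sort(cmp).map(entry => entry.item)
}

function swap(heap, i, j) {
  const tmp = heap[i]
  heap[i] = heap[j]
  heap[j] = tmp
}

// Move heap[i] up until its parent is not smaller (max-heap under cmp)
function siftUp(heap, i, cmp) {
  while (i > 0) {
    const parent = (i - 1) >> 1
    if (cmp(heap[i], heap[parent]) <= 0) break
    swap(heap, i, parent)
    i = parent
  }
}

// Move heap[i] down until neither child is larger (max-heap under cmp)
function siftDown(heap, i, cmp) {
  for (;;) {
    const left = 2 * i + 1
    const right = left + 1
    let largest = i
    if (left < heap.length && cmp(heap[left], heap[largest]) > 0) largest = left
    if (right < heap.length && cmp(heap[right], heap[largest]) > 0) largest = right
    if (largest === i) return
    swap(heap, i, largest)
    i = largest
  }
}

/**
 * Hypothetical plan rewrite for the two patterns above. Assumed node shapes:
 * { type: 'Limit', limit, offset?, child }, { type: 'Sort', orderBy, child },
 * { type: 'Project', columns, child }. A LIMIT with OFFSET is left alone here.
 */
function fuseTopN(plan) {
  if (plan.type !== 'Limit' || plan.offset) return plan
  const { child, limit } = plan
  if (child.type === 'Sort') {
    return { type: 'TopN', orderBy: child.orderBy, limit, child: child.child }
  }
  if (child.type === 'Project' && child.child.type === 'Sort') {
    const sort = child.child
    return { ...child, child: { type: 'TopN', orderBy: sort.orderBy, limit, child: sort.child } }
  }
  return plan
}
```

Each incoming row costs at most O(log N) heap work, so a full pass is O(rows · log N) time while holding only N rows.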

# Conflicts:
# src/execute/execute.js
# src/execute/sort.js

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

philcunliffe and others added 12 commits April 9, 2026 16:45

Fix typecheck errors

ac13746

- Add `_data` to AsyncRow type definition
- Cast to DerivedColumn/IdentifierNode where type narrowing is needed
- Type `_data` as `Record<string, SqlPrimitive>`
- Fix JSDoc placement for compareForTerm

Merge remote-tracking branch 'origin/master' into perf/topn-heap

cc954d0

# Conflicts:
# src/execute/execute.js
# src/execute/sort.js

Add missing JSDoc @param types for siftDown/siftUp

de91774

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Revert unnecessary table normalization optimization

2660f38

Fix incorrect numRows on LIMIT when source numRows is unknown

d9712e8
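The commit title doesn't say exactly what changed. One plausible rule, sketched with a hypothetical `limitNumRows` helper: if the child's row count is unknown, the LIMIT node's count is unknown too, because reporting `limit` would overstate it whenever the source has fewer rows (and `Math.min(undefined, limit)` evaluates to `NaN`).

```javascript
/**
 * Row count reported by a LIMIT/OFFSET node, given its child's count.
 * Hypothetical helper for illustration; not squirreling's actual code.
 * @param {number | undefined} sourceRows  child's numRows, or undefined if unknown
 * @param {number | undefined} limit       undefined means no LIMIT
 * @param {number} [offset]
 * @returns {number | undefined}
 */
function limitNumRows(sourceRows, limit, offset = 0) {
  // Unknown input size: any number we report could be wrong
  if (sourceRows === undefined) return undefined
  const available = Math.max(0, sourceRows - offset)
  return limit === undefined ? available : Math.min(limit, available)
}
```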

Restore late materialization on sorting and topN

8fcefd6
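Late materialization in a sort can be sketched like this, with a hypothetical `orderByLimitLate`/`compareKeys` pair, an assumed row shape, and ascending, non-null keys only. Only the ORDER BY cells are awaited while ranking; other cells remain the original lazy functions and run only if a consumer asks for them. The trade-off versus eager resolution is that each retained row keeps its original closures alive, which a TopN node bounds to `limit` rows; the full sort here is only for brevity.

```javascript
/**
 * Hypothetical row shape (an assumption for this sketch).
 * @typedef {{ columns: string[], cells: Record<string, () => Promise<unknown>> }} AsyncRow
 */

/**
 * Compare two sort-key tuples element by element (ascending, non-null keys assumed).
 * @param {unknown[]} a
 * @param {unknown[]} b
 */
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1
    if (a[i] > b[i]) return 1
  }
  return 0
}

/**
 * ORDER BY ... LIMIT with late materialization: only the key cells are
 * awaited while ranking; non-key cells are never called here.
 * @param {AsyncIterable<AsyncRow>} source
 * @param {string[]} keyColumns  columns referenced by ORDER BY
 * @param {number} limit
 * @returns {Promise<AsyncRow[]>}
 */
async function orderByLimitLate(source, keyColumns, limit) {
  /** @type {{ keys: unknown[], row: AsyncRow }[]} */
  const keyed = []
  for await (const row of source) {
    const keys = await Promise.all(keyColumns.map(col => row.cells[col]()))
    keyed.push({ keys, row })
  }
  // Array.prototype.sort is stable, so equal keys keep arrival order
  keyed.sort((x, y) => compareKeys(x.keys, y.keys))
  return keyed.slice(0, limit).map(entry => entry.row)
}
```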

Merge branch 'master' into perf/topn-async

1742704

platypii requested a review from philcunliffe April 14, 2026 00:49

platypii wants to merge 12 commits into master from perf/topn-async
