Enhance search performance by muendlein · Pull Request #64 · mourner/flatbush

muendlein · 2025-05-10T20:11:04Z

Fixes issue: #60

before

1000 searches 100%: 34.013s
1000 searches 75%: 25.787s
1000 searches 50%: 16.775s
1000 searches 25%: 9.530s
1000 searches 10%: 3.225s
1000 searches 1%: 280.145ms
1000 searches 0.01%: 11.675ms

after

1000 searches 100%: 18.888s
1000 searches 75%: 13.866s
1000 searches 50%: 9.127s
1000 searches 25%: 5.771s
1000 searches 10%: 1.963s
1000 searches 1%: 208.407ms
1000 searches 0.01%: 15.793ms

mourner

Overall this is a really awesome improvement! Thanks for a great contribution. 👍

mourner · 2025-05-10T21:35:51Z

Also wondering: is this increase for smaller queries a fluke due or is there an overhead?

before: 1000 searches 0.01%: 11.675ms
after: 1000 searches 0.01%: 15.793ms

muendlein · 2025-05-11T20:22:41Z

Also wondering: is this increase for smaller queries a fluke due or is there an overhead?

before: 1000 searches 0.01%: 11.675ms
after: 1000 searches 0.01%: 15.793ms

I would say both is possible here. The current benchmark is not sophisticated enough to track this down. Overall the time for small queries seems to scatters signicantly with every run.

muendlein · 2025-05-13T21:40:22Z

Also wondering: is this increase for smaller queries a fluke due or is there an overhead?

before: 1000 searches 0.01%: 11.675ms
after: 1000 searches 0.01%: 15.793ms

I would say both is possible here. The current benchmark is not sophisticated enough to track this down. Overall the time for small queries seems to scatters signicantly with every run.

After some more benchmarking there seems to be an apparent overhead if the number of rectangles in the search area are low.

Edit: After some more debugging this seems to be also present if the relevant code is unreachable which indicates that it is related to the javascript engine or javascript timers not being precise in the first place.

after						before
                                                
1000000 rectangles                              1000000 rectangles
Indexing: 174.82 +- 10.33 ms                    Indexing: 178.94 +- 6.13 ms
1000 searches 100%: 16915.16 +- 27.19 ms        1000 searches 100%: 29411.44 +- 114.
1000 searches 75%: 12091.25 +- 34.09 ms         1000 searches 75%: 20886.99 +- 28.87
1000 searches 50%: 8091.64 +- 16.49 ms          1000 searches 50%: 13845.36 +- 22.50
1000 searches 25%: 5114.00 +- 36.44 ms          1000 searches 25%: 8035.35 +- 30.98 
1000 searches 10%: 1700.65 +- 31.92 ms          1000 searches 10%: 2681.03 +- 30.18 
1000 searches 1%: 185.35 +- 3.87 ms             1000 searches 1%: 213.03 +- 7.76 ms
1000 searches 0.1%: 43.53 +- 2.34 ms            1000 searches 0.1%: 33.70 +- 0.48 ms
1000 searches 0.01%: 14.62 +- 1.44 ms           1000 searches 0.01%: 10.37 +- 0.18 m
                                                
500000 rectangles                               500000 rectangles
Indexing: 86.71 +- 3.26 ms                      Indexing: 83.50 +- 3.08 ms
1000 searches 100%: 7966.13 +- 285.32 ms        1000 searches 100%: 13432.76 +- 37.1
1000 searches 75%: 8280.05 +- 137.75 ms         1000 searches 75%: 12058.49 +- 60.04
1000 searches 50%: 5490.95 +- 115.95 ms         1000 searches 50%: 7963.56 +- 31.44 
1000 searches 25%: 2517.51 +- 85.76 ms          1000 searches 25%: 3677.59 +- 28.52 
1000 searches 10%: 1105.34 +- 16.96 ms          1000 searches 10%: 1454.78 +- 8.84 m
1000 searches 1%: 97.87 +- 4.05 ms              1000 searches 1%: 99.88 +- 2.78 ms
1000 searches 0.1%: 25.27 +- 1.06 ms            1000 searches 0.1%: 18.65 +- 0.73 ms
1000 searches 0.01%: 7.61 +- 0.35 ms            1000 searches 0.01%: 5.55 +- 0.21 ms
                                                
250000 rectangles                               250000 rectangles
Indexing: 38.60 +- 0.35 ms                      Indexing: 40.61 +- 1.07 ms
1000 searches 100%: 3429.81 +- 29.19 ms         1000 searches 100%: 6239.33 +- 13.67
1000 searches 75%: 3609.75 +- 23.50 ms          1000 searches 75%: 5551.44 +- 15.86 
1000 searches 50%: 2388.36 +- 27.00 ms          1000 searches 50%: 3659.61 +- 20.25 
1000 searches 25%: 1084.09 +- 6.67 ms           1000 searches 25%: 1435.32 +- 24.14 
1000 searches 10%: 454.77 +- 11.62 ms           1000 searches 10%: 569.01 +- 22.03 m
1000 searches 1%: 53.35 +- 3.08 ms              1000 searches 1%: 48.03 +- 3.20 ms
1000 searches 0.1%: 10.47 +- 0.23 ms            1000 searches 0.1%: 8.16 +- 0.62 ms
1000 searches 0.01%: 4.00 +- 0.21 ms            1000 searches 0.01%: 3.30 +- 0.75 ms
                                                
100000 rectangles                               100000 rectangles
Indexing: 14.93 +- 0.88 ms                      Indexing: 15.11 +- 1.01 ms
1000 searches 100%: 1415.36 +- 16.48 ms         1000 searches 100%: 2148.95 +- 16.61
1000 searches 75%: 1499.75 +- 20.38 ms          1000 searches 75%: 1881.47 +- 14.69 
1000 searches 50%: 1000.98 +- 13.85 ms          1000 searches 50%: 1141.08 +- 15.79 
1000 searches 25%: 423.34 +- 5.90 ms            1000 searches 25%: 458.26 +- 10.56 m
1000 searches 10%: 118.69 +- 1.89 ms            1000 searches 10%: 113.83 +- 1.30 ms
1000 searches 1%: 19.49 +- 0.55 ms              1000 searches 1%: 14.33 +- 0.14 ms
1000 searches 0.1%: 5.33 +- 0.21 ms             1000 searches 0.1%: 3.34 +- 0.05 ms
1000 searches 0.01%: 2.13 +- 0.03 ms            1000 searches 0.01%: 1.47 +- 0.02 ms
                                                
50000 rectangles                                50000 rectangles
Indexing: 6.94 +- 0.20 ms                       Indexing: 7.13 +- 0.13 ms
1000 searches 100%: 584.78 +- 10.95 ms          1000 searches 100%: 729.12 +- 12.03 
1000 searches 75%: 635.39 +- 15.14 ms           1000 searches 75%: 696.66 +- 7.82 ms
1000 searches 50%: 397.94 +- 16.80 ms           1000 searches 50%: 422.41 +- 8.55 ms
1000 searches 25%: 123.37 +- 2.38 ms            1000 searches 25%: 118.10 +- 1.51 ms
1000 searches 10%: 59.40 +- 1.41 ms             1000 searches 10%: 55.20 +- 2.80 ms
1000 searches 1%: 12.01 +- 0.54 ms              1000 searches 1%: 9.09 +- 0.65 ms
1000 searches 0.1%: 3.19 +- 0.03 ms             1000 searches 0.1%: 2.23 +- 0.10 ms
1000 searches 0.01%: 1.59 +- 0.11 ms            1000 searches 0.01%: 1.16 +- 0.12 ms
                                                
25000 rectangles                                25000 rectangles
Indexing: 3.82 +- 0.46 ms                       Indexing: 3.52 +- 0.08 ms
1000 searches 100%: 328.56 +- 11.57 ms          1000 searches 100%: 407.00 +- 9.08 m
1000 searches 75%: 225.00 +- 3.29 ms            1000 searches 75%: 238.60 +- 6.12 ms
1000 searches 50%: 114.57 +- 4.05 ms            1000 searches 50%: 117.15 +- 4.78 ms
1000 searches 25%: 64.82 +- 3.11 ms             1000 searches 25%: 62.83 +- 1.03 ms
1000 searches 10%: 31.45 +- 1.54 ms             1000 searches 10%: 27.40 +- 1.79 ms
1000 searches 1%: 6.86 +- 0.13 ms               1000 searches 1%: 4.83 +- 0.32 ms
1000 searches 0.1%: 2.13 +- 0.02 ms             1000 searches 0.1%: 1.40 +- 0.02 ms
1000 searches 0.01%: 1.17 +- 0.01 ms            1000 searches 0.01%: 0.82 +- 0.01 ms
                                                
10000 rectangles                                10000 rectangles
Indexing: 3.82 +- 3.65 ms                       Indexing: 3.98 +- 4.70 ms
1000 searches 100%: 62.94 +- 2.13 ms            1000 searches 100%: 98.41 +- 10.85 m
1000 searches 75%: 78.66 +- 3.50 ms             1000 searches 75%: 87.03 +- 9.78 ms
1000 searches 50%: 54.85 +- 0.89 ms             1000 searches 50%: 60.02 +- 6.40 ms
1000 searches 25%: 30.96 +- 0.52 ms             1000 searches 25%: 30.87 +- 2.78 ms
1000 searches 10%: 16.13 +- 0.88 ms             1000 searches 10%: 14.25 +- 1.59 ms
1000 searches 1%: 3.89 +- 0.45 ms               1000 searches 1%: 2.45 +- 0.20 ms
1000 searches 0.1%: 1.48 +- 0.15 ms             1000 searches 0.1%: 0.92 +- 0.01 ms
1000 searches 0.01%: 0.93 +- 0.10 ms            1000 searches 0.01%: 0.68 +- 0.11 ms

mourner · 2025-05-18T13:05:12Z

@muendlein might be worth increasing the number of searches between timings for a possibly more reliable measurement. Does the last commit help? I'll likely land this anyway eventually since it's an obvious net positive, just wanted to explore whether we could minimize the hit on smaller queries.

muendlein · 2025-05-18T20:06:37Z

@muendlein might be worth increasing the number of searches between timings for a possibly more reliable measurement. Does the last commit help? I'll likely land this anyway eventually since it's an obvious net positive, just wanted to explore whether we could minimize the hit on smaller queries.

@mourner The last commit has only been a minor improvement which doesn't close the gap.
After some more playing around, I don't think that increasing the number of searches will have any impact as the repeatability is too consistent. After observing that the gap still exists even if the code is never reached, I'm unfortunately out of ideas.

mourner · 2025-05-18T20:16:25Z

After observing that the gap still exists even if the code is never reached, I'm unfortunately out of ideas.

One idea is to set an empyrical threshold of the query area compared to data bounds, over which we'll apply the "all in query bounds" logic, and under which we'll leave the existing logic. The overhead for calculating that area for a single query should be small.

muendlein · 2025-05-18T20:44:38Z

After observing that the gap still exists even if the code is never reached, I'm unfortunately out of ideas.

One idea is to set an empyrical threshold of the query area compared to data bounds, over which we'll apply the "all in query bounds" logic, and under which we'll leave the existing logic. The overhead for calculating that area for a single query should be small.

Edit: At least the topic of unreachable functions can be solved with a separate function.

@mourner Already tried something similar which does not have an impact. All my testing points towards JIT compiler issues.

To make it more clear here are two unreachable examples.

Example 1:

This is exactly the old logic apart from the additional if (10 < 5). Unfortunately this brings no improvement compared to the new logic of this PR.

Before:

100000 rectangles
Indexing: 14.65 +- 0.72 ms
1000 searches 10%: 125.01 +- 8.72 ms
1000 searches 1%: 15.09 +- 1.02 ms
1000 searches 0.1%: 3.71 +- 0.37 ms
1000 searches 0.01%: 1.51 +- 0.07 ms

This PR:

100000 rectangles
Indexing: 14.15 +- 0.13 ms
1000 searches 10%: 111.84 +- 1.59 ms
1000 searches 1%: 16.88 +- 0.28 ms
1000 searches 0.1%: 4.90 +- 0.10 ms
1000 searches 0.01%: 2.00 +- 0.03 ms

Unreachable example 1:

100000 rectangles
Indexing: 14.21 +- 0.21 ms
1000 searches 10%: 157.39 +- 5.32 ms
1000 searches 1%: 19.68 +- 0.53 ms
1000 searches 0.1%: 4.44 +- 0.08 ms
1000 searches 0.01%: 2.02 +- 0.14 ms

        if (this._pos !== this._boxes.length) {
            throw new Error('Data not yet indexed - call index.finish().');
        }

        /** @type number | undefined */
        let nodeIndex = this._boxes.length - 4;
        const queue = [];
        const results = [];

        while (nodeIndex !== undefined) {
            // find the end index of the node
            const end = Math.min(nodeIndex + this.nodeSize * 4, upperBound(nodeIndex, this._levelBounds));

            // search through child nodes
            for (let /** @type number */ pos = nodeIndex; pos < end; pos += 4) {
                const nodeMinX = this._boxes[pos + 0];
                const nodeMinY = this._boxes[pos + 1];
                const nodeMaxX = this._boxes[pos + 2];
                const nodeMaxY = this._boxes[pos + 3];

                // check if node bbox intersects with query bbox
                if (maxX < nodeMinX || maxY < nodeMinY || minX > nodeMaxX || minY > nodeMaxY) {
                    continue;
                }

                const index = this._indices[pos >> 2] | 0;

                if (nodeIndex >= this.numItems * 4) {
                    if (10 < 5) {
                        // check if node bbox is completely inside query bbox
                        if (minX <= nodeMinX && minY <= nodeMinY && maxX >= nodeMaxX && maxY >= nodeMaxY) {
                            let posStart = pos;
                            let posEnd = pos;

                            // depth search while not leaf
                            while (posStart >= this.numItems * 4) {
                                posStart = this._indices[posStart >> 2] | 0;
                                const posEndStart = this._indices[posEnd >> 2] | 0;
                                posEnd = Math.min(posEndStart + this.nodeSize * 4, upperBound(posEndStart, this._levelBounds)) - 4;
                            }

                            for (let /** @type number */ leafPos = posStart; leafPos <= posEnd; leafPos += 4) {
                                const leafIndex = this._indices[leafPos >> 2];
                                if (filterFn === undefined || filterFn(leafIndex)) {
                                    results.push(leafIndex); // leaf item
                                }
                            }
                        }
                    } else {
                        queue.push(index); // node; add it to the search queue
                    }
                } else if (filterFn === undefined || filterFn(index)) {
                    results.push(index); // leaf item
                }
            }

            nodeIndex = queue.pop();
        }

        return results;

Example 2:

Same as example 1 but with the majority of the logic removed inside the unreachable if statement. Now you can see that the performance of this example is the same as before. Right now I don't have any good explanation for this except for certain compiler optimizations.

Before:

100000 rectangles
Indexing: 14.35 +- 0.36 ms
1000 searches 10%: 115.16 +- 2.56 ms
1000 searches 1%: 14.23 +- 0.13 ms
1000 searches 0.1%: 3.41 +- 0.10 ms
1000 searches 0.01%: 1.44 +- 0.01 ms

This PR:

100000 rectangles
Indexing: 14.55 +- 0.16 ms
1000 searches 10%: 112.36 +- 2.86 ms
1000 searches 1%: 18.17 +- 2.15 ms
1000 searches 0.1%: 4.64 +- 0.13 ms
1000 searches 0.01%: 2.06 +- 0.08 ms

Unreachable example 2:

100000 rectangles
Indexing: 14.23 +- 0.15 ms
1000 searches 10%: 114.44 +- 1.03 ms
1000 searches 1%: 14.30 +- 0.18 ms
1000 searches 0.1%: 3.35 +- 0.05 ms
1000 searches 0.01%: 1.48 +- 0.03 ms

        if (this._pos !== this._boxes.length) {
            throw new Error('Data not yet indexed - call index.finish().');
        }

        /** @type number | undefined */
        let nodeIndex = this._boxes.length - 4;
        const queue = [];
        const results = [];

        while (nodeIndex !== undefined) {
            // find the end index of the node
            const end = Math.min(nodeIndex + this.nodeSize * 4, upperBound(nodeIndex, this._levelBounds));

            // search through child nodes
            for (let /** @type number */ pos = nodeIndex; pos < end; pos += 4) {
                const nodeMinX = this._boxes[pos + 0];
                const nodeMinY = this._boxes[pos + 1];
                const nodeMaxX = this._boxes[pos + 2];
                const nodeMaxY = this._boxes[pos + 3];

                // check if node bbox intersects with query bbox
                if (maxX < nodeMinX || maxY < nodeMinY || minX > nodeMaxX || minY > nodeMaxY) {
                    continue;
                }

                const index = this._indices[pos >> 2] | 0;

                if (nodeIndex >= this.numItems * 4) {
                    if (10 < 5) {
                        // check if node bbox is completely inside query bbox
                        if (minX <= nodeMinX && minY <= nodeMinY && maxX >= nodeMaxX && maxY >= nodeMaxY) {
                            let posStart = pos;
                            let posEnd = pos;
                        }
                    } else {
                        queue.push(index); // node; add it to the search queue
                    }
                } else if (filterFn === undefined || filterFn(index)) {
                    results.push(index); // leaf item
                }
            }

            nodeIndex = queue.pop();
        }

        return results;

mourner · 2025-05-18T21:03:21Z

@muendlein I've seen similar behavior before, and my guess is that it's because of v8 inlining. Over a certain threshold of complexity or size, v8 stops inlining the function, which makes it slower for small payloads. Might be worth experimenting with cutting out some of the logic into a separate function so that most of the hot path code in search remains inlineable.

muendlein · 2025-05-18T21:11:44Z

@muendlein I've seen similar behavior before, and my guess is that it's because of v8 inlining. Over a certain threshold of complexity or size, v8 stops inlining the function, which makes it slower for small payloads. Might be worth experimenting with cutting out some of the logic into a separate function so that most of the hot path code in search remains inlineable.

@mourner Just tested this (see my edit above), at least it will fix the topic of unreachable code paths. But for now I only see small improvements in the "real" scenario.
But at least I have a good starting point for more experiments.

muendlein · 2025-05-19T21:49:24Z

With the latest commit the gap has been reduced but is still there.

after last commit              			 before this PR

1000000 rectangles				 1000000 rectangles
Indexing: 182.94 +- 11.81 ms                     Indexing: 178.94 +- 6.13 ms
1000 searches 100%: 17005.09 +- 410.28 ms        1000 searches 100%: 29411.44 +- 114.
1000 searches 75%: 12021.44 +- 68.88 ms          1000 searches 75%: 20886.99 +- 28.87
1000 searches 50%: 8026.82 +- 22.21 ms           1000 searches 50%: 13845.36 +- 22.50
1000 searches 25%: 5073.42 +- 44.18 ms           1000 searches 25%: 8035.35 +- 30.98 
1000 searches 10%: 1687.99 +- 14.58 ms           1000 searches 10%: 2681.03 +- 30.18 
1000 searches 1%: 155.26 +- 1.00 ms              1000 searches 1%: 213.03 +- 7.76 ms
1000 searches 0.1%: 35.50 +- 0.16 ms             1000 searches 0.1%: 33.70 +- 0.48 ms
1000 searches 0.01%: 12.07 +- 0.15 ms            1000 searches 0.01%: 10.37 +- 0.18 m
                                                 
500000 rectangles                                500000 rectangles
Indexing: 79.79 +- 0.90 ms                       Indexing: 83.50 +- 3.08 ms
1000 searches 100%: 7501.64 +- 382.36 ms         1000 searches 100%: 13432.76 +- 37.1
1000 searches 75%: 7513.90 +- 37.22 ms           1000 searches 75%: 12058.49 +- 60.04
1000 searches 50%: 5002.77 +- 24.10 ms           1000 searches 50%: 7963.56 +- 31.44 
1000 searches 25%: 2303.79 +- 10.96 ms           1000 searches 25%: 3677.59 +- 28.52 
1000 searches 10%: 1013.43 +- 4.51 ms            1000 searches 10%: 1454.78 +- 8.84 m
1000 searches 1%: 82.99 +- 5.88 ms               1000 searches 1%: 99.88 +- 2.78 ms
1000 searches 0.1%: 20.10 +- 0.60 ms             1000 searches 0.1%: 18.65 +- 0.73 ms
1000 searches 0.01%: 6.54 +- 0.18 ms             1000 searches 0.01%: 5.55 +- 0.21 ms
                                                 
250000 rectangles                                250000 rectangles
Indexing: 38.87 +- 0.43 ms                       Indexing: 40.61 +- 1.07 ms
1000 searches 100%: 3236.10 +- 14.80 ms          1000 searches 100%: 6239.33 +- 13.67
1000 searches 75%: 3418.34 +- 39.25 ms           1000 searches 75%: 5551.44 +- 15.86 
1000 searches 50%: 2260.82 +- 18.72 ms           1000 searches 50%: 3659.61 +- 20.25 
1000 searches 25%: 1037.22 +- 11.76 ms           1000 searches 25%: 1435.32 +- 24.14 
1000 searches 10%: 416.20 +- 9.52 ms             1000 searches 10%: 569.01 +- 22.03 m
1000 searches 1%: 48.00 +- 2.87 ms               1000 searches 1%: 48.03 +- 3.20 ms
1000 searches 0.1%: 9.34 +- 0.53 ms              1000 searches 0.1%: 8.16 +- 0.62 ms
1000 searches 0.01%: 3.71 +- 0.21 ms             1000 searches 0.01%: 3.30 +- 0.75 ms
                                                 
100000 rectangles                                100000 rectangles
Indexing: 14.42 +- 0.52 ms                       Indexing: 15.11 +- 1.01 ms
1000 searches 100%: 1343.67 +- 19.48 ms          1000 searches 100%: 2148.95 +- 16.61
1000 searches 75%: 1414.79 +- 16.05 ms           1000 searches 75%: 1881.47 +- 14.69 
1000 searches 50%: 934.25 +- 5.47 ms             1000 searches 50%: 1141.08 +- 15.79 
1000 searches 25%: 396.69 +- 12.29 ms            1000 searches 25%: 458.26 +- 10.56 m
1000 searches 10%: 108.51 +- 3.56 ms             1000 searches 10%: 113.83 +- 1.30 ms
1000 searches 1%: 15.83 +- 0.08 ms               1000 searches 1%: 14.33 +- 0.14 ms
1000 searches 0.1%: 4.31 +- 0.14 ms              1000 searches 0.1%: 3.34 +- 0.05 ms
1000 searches 0.01%: 1.88 +- 0.02 ms             1000 searches 0.01%: 1.47 +- 0.02 ms
                                                 
50000 rectangles                                 50000 rectangles
Indexing: 7.12 +- 0.41 ms                        Indexing: 7.13 +- 0.13 ms
1000 searches 100%: 560.33 +- 14.39 ms           1000 searches 100%: 729.12 +- 12.03 
1000 searches 75%: 590.22 +- 12.87 ms            1000 searches 75%: 696.66 +- 7.82 ms
1000 searches 50%: 378.47 +- 20.60 ms            1000 searches 50%: 422.41 +- 8.55 ms
1000 searches 25%: 108.08 +- 1.83 ms             1000 searches 25%: 118.10 +- 1.51 ms
1000 searches 10%: 51.08 +- 2.27 ms              1000 searches 10%: 55.20 +- 2.80 ms
1000 searches 1%: 9.81 +- 0.47 ms                1000 searches 1%: 9.09 +- 0.65 ms
1000 searches 0.1%: 2.72 +- 0.14 ms              1000 searches 0.1%: 2.23 +- 0.10 ms
1000 searches 0.01%: 1.38 +- 0.03 ms             1000 searches 0.01%: 1.16 +- 0.12 ms
                                                 
25000 rectangles                                 25000 rectangles
Indexing: 3.95 +- 1.02 ms                        Indexing: 3.52 +- 0.08 ms
1000 searches 100%: 322.03 +- 8.93 ms            1000 searches 100%: 407.00 +- 9.08 m
1000 searches 75%: 208.44 +- 5.53 ms             1000 searches 75%: 238.60 +- 6.12 ms
1000 searches 50%: 101.36 +- 1.83 ms             1000 searches 50%: 117.15 +- 4.78 ms
1000 searches 25%: 54.99 +- 0.97 ms              1000 searches 25%: 62.83 +- 1.03 ms
1000 searches 10%: 25.82 +- 0.21 ms              1000 searches 10%: 27.40 +- 1.79 ms
1000 searches 1%: 5.50 +- 0.16 ms                1000 searches 1%: 4.83 +- 0.32 ms
1000 searches 0.1%: 1.80 +- 0.02 ms              1000 searches 0.1%: 1.40 +- 0.02 ms
1000 searches 0.01%: 1.06 +- 0.01 ms             1000 searches 0.01%: 0.82 +- 0.01 ms
                                                 
10000 rectangles                                 10000 rectangles
Indexing: 4.95 +- 5.92 ms                        Indexing: 3.98 +- 4.70 ms
1000 searches 100%: 59.80 +- 1.96 ms             1000 searches 100%: 98.41 +- 10.85 m
1000 searches 75%: 67.49 +- 1.39 ms              1000 searches 75%: 87.03 +- 9.78 ms
1000 searches 50%: 48.07 +- 0.82 ms              1000 searches 50%: 60.02 +- 6.40 ms
1000 searches 25%: 26.20 +- 0.77 ms              1000 searches 25%: 30.87 +- 2.78 ms
1000 searches 10%: 13.59 +- 0.34 ms              1000 searches 10%: 14.25 +- 1.59 ms
1000 searches 1%: 3.06 +- 0.34 ms                1000 searches 1%: 2.45 +- 0.20 ms
1000 searches 0.1%: 1.17 +- 0.02 ms              1000 searches 0.1%: 0.92 +- 0.01 ms
1000 searches 0.01%: 0.80 +- 0.01 ms             1000 searches 0.01%: 0.68 +- 0.11 ms

mourner · 2025-05-20T10:25:20Z

@muendlein this one looks much better!

I guess you can inline addLeafSegment for simplicity since it shouldn't affect inlining of the main search function, right?
Now that inlining happens again, perhaps let's try the heuristic again? (Don't do minX <= nodeMinX && minY <= nodeMinY && maxX >= nodeMaxX && maxY >= nodeMaxY check if the query area is small enough)

muendlein · 2025-05-20T19:23:02Z

@mourner Inlining addLeafSegment into addAllLeavesOfNode does not have a performance impact (I just pushed the commit).

What exactly is your idea about the heuristic approach to check the relative query size? Especially for unbalanced distributions I'm not sure if this is even feasible.

mourner · 2025-05-20T21:47:35Z

@muendlein so, if I do fb78a2e and then put if (false && minX <= boxes[pos] ... to make that branch unreachable, small queries run as fast as before. So maybe we could do if (isSmallQuery && minX <= boxes[pos] ..., picking isSmallQuery based on the query bbox compared to data bbox somewhat empyrically — the worst that can happen if we pick badly is the query will be slightly slower, but most of the time it should be as fast as before for small queries and much faster for larger queries (when the optimization kicks in).

muendlein · 2025-05-21T17:10:16Z

@muendlein so, if I do fb78a2e and then put if (false && minX <= boxes[pos] ... to make that branch unreachable, small queries run as fast as before. So maybe we could do if (isSmallQuery && minX <= boxes[pos] ..., picking isSmallQuery based on the query bbox compared to data bbox somewhat empyrically — the worst that can happen if we pick badly is the query will be slightly slower, but most of the time it should be as fast as before for small queries and much faster for larger queries (when the optimization kicks in).

I assume you mean !isSmallQuery in the if statement?
In the worst case it can be significantly slower not just slightly. This happens when the search area is large but we guess it is a small query. In this case we don't run the optimized path but rather have the performance before this PR. As the greatest speedup is exactly with such large search areas, the penalty for a wrong guess is pretty high.

This brings me to the next point, namely how create an empirical yet reliable yet fast isSmallQuery function. Personally I think that this is not feasible after playing a bit around with possible ideas. But I'm open for any suggestion.

mourner · 2025-05-21T17:49:48Z

the penalty for a wrong guess is pretty high

It's high compared to the case when we guess right; however it's very small compared to not landing this PR. So we need to decide what's better: accept a guaranteed notable performance drop for small queries (arguably the more prevalent case in real world apps than big queries), or accept that we'll sometimes guess wrong and the performance will be as before.

I'd probably try the simplest metric possible, e.g. const isSmallQuery = queryArea / this._dataBBoxArea < 0.01 (or some other threshold we pick experimentally).

muendlein · 2025-05-21T20:23:56Z

@mourner I think you are asking the right question about what is the better choice here. Given the varity of datasets & query usecases, I think 1 solution fits all won't exist. For example your proposed simple metric can work great for well distributed datasets but may yield undesired performance in case of strongly unbalanced datasets.
Personally I think the best option is therefore to allow some user control from the outside in combination with your proposal.
The simplest way to achieve this, is by make the threshold an input parameter with an empirical derived default.

Threshold -> 0: always run the optimized path (best performance for large queries)
Threshold -> 0 to 1: best but non deterministic performance
Threshold -> >=1: always run the standard path (best performance for small queries)

This ensures that users themselves can optimize search performance for their respective dataset.
What do you think?

leeoniya · 2025-05-22T14:03:28Z

👍 to exposing an option if the regression is unavoidable. "mouse hover single data point in a scatterplot of 300k points" is something i would prefer not to regress.

muendlein · 2025-05-22T19:48:21Z

👍 to exposing an option if the regression is unavoidable. "mouse hover single data point in a scatterplot of 300k points" is something i would prefer not to regress.

@leeoniya Even with the regression, I'm pretty sure you won't observe any difference as it relative not absolute. When talking about hovering there are much slower processes involved.
For context at 300k points you are looking at <50 microseconds for 10 searches.

mourner · 2025-05-22T20:08:57Z

I'm hesitant about introducing such an option, because I'd like to keep the library simple, minimal and working perfectly out of the box, and this parameter is pretty confusing and difficult to explain. I think it's fine if there are some weird edge cases with heavily imbalanced datasets where the optimization doesn't kick in, as long as the library as a whole performs great most of the time. So I'd still try to explore the heuristical approach, if there are no other ideas on how to address the small query regression.

muendlein · 2025-05-22T21:07:14Z

@mourner I just tested the simplest heuristics approach and it seems like we are back fighting the compiler.
Approach (note: dataArea is already calculated during finish): const isLargeQuery = ((maxX - minX) * (maxY - minY)) > (0.5 * this._dataArea);

Basically as soon as if (false && minX <= boxes[pos] ... is an expression instead of a constant, we are getting a performance penalty that puts it on par with the current status of this PR.
Additionally, some minor performance can be gained in case the function never sees a larger query.
Now I'm basically a bit out of ideas.

before this PR

500000 rectangles
Indexing: 85.42 +- 2.83 ms
1000 searches 1%: 99.98 +- 6.24 ms
1000 searches 0.1%: 18.11 +- 0.43 ms
1000 searches 0.01%: 5.54 +- 0.40 ms
1000 searches 0.001%: 3.27 +- 0.24 ms
1000 searches 0.00009999999999999999%: 2.52 +- 0.11 ms
1000 searches 0.000009999999999999999%: 2.43 +- 0.11 ms

PR without any heuristics

500000 rectangles
Indexing: 85.83 +- 3.29 ms
1000 searches 1%: 86.02 +- 6.64 ms
1000 searches 0.1%: 20.96 +- 0.28 ms
1000 searches 0.01%: 6.95 +- 0.11 ms
1000 searches 0.001%: 4.02 +- 0.11 ms
1000 searches 0.00009999999999999999%: 3.29 +- 0.22 ms
1000 searches 0.000009999999999999999%: 3.23 +- 0.43 ms

PR with simple heuristics

500000 rectangles
Indexing: 86.27 +- 3.79 ms
1000 searches 1%: 123.81 +- 2.08 ms
1000 searches 0.1%: 22.27 +- 0.61 ms
1000 searches 0.01%: 6.97 +- 0.07 ms
1000 searches 0.001%: 4.13 +- 0.19 ms
1000 searches 0.00009999999999999999%: 3.33 +- 0.14 ms
1000 searches 0.000009999999999999999%: 3.04 +- 0.05 ms

PR with simple heuristics and warm up without large query during warmup.

500000 rectangles
Indexing: 89.04 +- 6.29 ms
1000 searches 1%: 116.74 +- 5.53 ms
1000 searches 0.1%: 21.70 +- 1.23 ms
1000 searches 0.01%: 6.70 +- 0.63 ms
1000 searches 0.001%: 3.72 +- 0.05 ms
1000 searches 0.00009999999999999999%: 3.18 +- 0.03 ms
1000 searches 0.000009999999999999999%: 2.83 +- 0.05 ms

mourner · 2025-05-22T21:29:55Z

@muendlein all right, let this sit for a few days more, I'll try to play with it a bit... As a last resort, we could just add a duplicate method, e.g. searchLarge that has the optimization, and give the user a binary choice. But ideally we'd find a way to consolidate somehow...

muendlein · 2025-08-10T11:26:06Z

@mourner As some time has passed, I'm wondering if you already had time to play around?

mourner · 2025-12-08T19:46:37Z

@muendlein sorry, just got around to looking again. Fiddled a bit — seems like performance is fine after reusing the bbox values in the check (see the merge commit), but let's measure again on your bigger benchmarks. Also, this PR needs to be updated to accommodate the change in #68 (passing leaf bbox values to filterFn).

muendlein · 2025-12-13T22:58:07Z

@mourner I have now updated the PR include the filterFn logic in the leaf function.
As for the benchmarks, I can still observe delta.

Before (main):

1000000 rectangles
1000 searches 0.00009999999999999999%: 0.94 +- 0.03 ms
500000 rectangles
1000 searches 0.00009999999999999999%: 0.80 +- 0.02 ms

After (this PR):

1000000 rectangles
1000 searches 0.00009999999999999999%: 1.06 +- 0.04 ms
500000 rectangles
1000 searches 0.00009999999999999999%: 0.89 +- 0.01 ms

mourner · 2025-12-13T23:03:58Z

@muendlein yeah, but it seems like it's a much smaller overhead than before, right? I'd love to see some more detailed benchmarks like the ones you did above.

muendlein · 2025-12-13T23:52:05Z

@mourner I'm not sure if the overhead changed much. Here is the complete benchmark:

last commit (this PR)											before this PR (main)
1000000 rectangles												1000000 rectangles
1000 searches 100%: 11324.92 +- 178.53 ms                       1000 searches 100%: 24885.40 +- 232.37 ms
1000 searches 75%: 8075.41 +- 29.56 ms                          1000 searches 75%: 15216.80 +- 61.79 ms
1000 searches 50%: 6380.42 +- 697.84 ms                         1000 searches 50%: 14353.59 +- 736.76 ms
1000 searches 25%: 3488.42 +- 8.69 ms                           1000 searches 25%: 5781.31 +- 146.28 ms
1000 searches 10%: 1313.54 +- 5.14 ms                           1000 searches 10%: 2095.42 +- 3.60 ms
1000 searches 1%: 164.30 +- 0.95 ms                             1000 searches 1%: 236.51 +- 46.24 ms
1000 searches 0.1%: 38.38 +- 0.39 ms                            1000 searches 0.1%: 49.93 +- 6.29 ms
1000 searches 0.01%: 12.77 +- 0.15 ms                           1000 searches 0.01%: 16.65 +- 0.84 ms
1000 searches 0.001%: 6.83 +- 0.12 ms                           1000 searches 0.001%: 7.04 +- 0.55 ms
1000 searches 0.00009999999999999999%: 5.25 +- 0.04 ms          1000 searches 0.00009999999999999999%: 5.76 +- 1.05 MS
                                                                
500000 rectangles                                               500000 rectangles
1000 searches 100%: 4769.16 +- 8.40 ms                          1000 searches 100%: 9832.51 +- 149.38 ms
1000 searches 75%: 4946.73 +- 21.86 ms                          1000 searches 75%: 8494.89 +- 75.32 ms
1000 searches 50%: 3336.55 +- 5.75 ms                           1000 searches 50%: 5616.14 +- 11.99 ms
1000 searches 25%: 1663.96 +- 4.10 ms                           1000 searches 25%: 2724.27 +- 31.55 ms
1000 searches 10%: 799.17 +- 4.28 ms                            1000 searches 10%: 1132.46 +- 5.08 ms
1000 searches 1%: 82.13 +- 0.14 ms                              1000 searches 1%: 93.57 +- 0.28 ms
1000 searches 0.1%: 21.04 +- 0.16 ms                            1000 searches 0.1%: 17.58 +- 0.18 ms
1000 searches 0.01%: 7.09 +- 0.24 ms                            1000 searches 0.01%: 5.30 +- 0.11 ms
1000 searches 0.001%: 4.09 +- 0.14 ms                           1000 searches 0.001%: 3.14 +- 0.13 ms
1000 searches 0.00009999999999999999%: 3.24 +- 0.06 ms          1000 searches 0.00009999999999999999%: 2.58 +- 0.10 ms
                                                                
250000 rectangles                                               250000 rectangles
1000 searches 100%: 2228.57 +- 4.88 ms                          1000 searches 100%: 4691.57 +- 12.56 ms
1000 searches 75%: 2361.86 +- 14.43 ms                          1000 searches 75%: 4056.02 +- 9.81 ms
1000 searches 50%: 1615.99 +- 10.03 ms                          1000 searches 50%: 2685.84 +- 5.27 ms
1000 searches 25%: 827.36 +- 15.38 ms                           1000 searches 25%: 1156.71 +- 9.19 ms
1000 searches 10%: 372.08 +- 3.20 ms                            1000 searches 10%: 473.61 +- 2.87 ms
1000 searches 1%: 45.24 +- 0.14 ms                              1000 searches 1%: 44.03 +- 0.47 ms
1000 searches 0.1%: 9.97 +- 0.18 ms                             1000 searches 0.1%: 7.76 +- 0.19 ms
1000 searches 0.01%: 4.29 +- 0.13 ms                            1000 searches 0.01%: 3.19 +- 0.29 ms
1000 searches 0.001%: 2.47 +- 0.12 ms                           1000 searches 0.001%: 2.00 +- 0.26 ms
1000 searches 0.00009999999999999999%: 2.15 +- 0.09 ms          1000 searches 0.00009999999999999999%: 1.65 +- 0.11 MS
                                                                
100000 rectangles                                               100000 rectangles
1000 searches 100%: 970.27 +- 8.02 ms                           1000 searches 100%: 1620.02 +- 6.11 ms
1000 searches 75%: 1027.92 +- 5.92 ms                           1000 searches 75%: 1384.78 +- 14.32 ms
1000 searches 50%: 727.50 +- 3.89 ms                            1000 searches 50%: 855.47 +- 7.68 ms
1000 searches 25%: 339.16 +- 1.56 ms                            1000 searches 25%: 383.51 +- 1.37 ms
1000 searches 10%: 119.96 +- 9.78 ms                            1000 searches 10%: 112.73 +- 0.79 ms
1000 searches 1%: 27.87 +- 0.41 ms                              1000 searches 1%: 14.05 +- 0.13 ms
1000 searches 0.1%: 7.38 +- 0.62 ms                             1000 searches 0.1%: 3.71 +- 0.46 ms
1000 searches 0.01%: 2.17 +- 0.07 ms                            1000 searches 0.01%: 1.46 +- 0.02 ms
1000 searches 0.001%: 1.70 +- 0.09 ms                           1000 searches 0.001%: 1.11 +- 0.04 ms
1000 searches 0.00009999999999999999%: 1.41 +- 0.03 ms          1000 searches 0.00009999999999999999%: 0.97 +- 0.02 MS
                                                                
50000 rectangles                                                50000 rectangles
1000 searches 100%: 529.18 +- 44.44 ms                          1000 searches 100%: 564.04 +- 1.55 ms
1000 searches 75%: 603.69 +- 61.97 ms                           1000 searches 75%: 530.91 +- 1.03 ms
1000 searches 50%: 404.61 +- 48.37 ms                           1000 searches 50%: 417.14 +- 63.84 ms
1000 searches 25%: 113.42 +- 4.48 ms                            1000 searches 25%: 113.89 +- 1.12 ms
1000 searches 10%: 61.11 +- 4.47 ms                             1000 searches 10%: 50.63 +- 0.18 ms
1000 searches 1%: 13.00 +- 0.22 ms                              1000 searches 1%: 8.07 +- 0.15 ms
1000 searches 0.1%: 2.68 +- 0.07 ms                             1000 searches 0.1%: 2.13 +- 0.13 ms
1000 searches 0.01%: 1.37 +- 0.01 ms                            1000 searches 0.01%: 1.10 +- 0.04 ms
1000 searches 0.001%: 1.40 +- 0.11 ms                           1000 searches 0.001%: 0.85 +- 0.05 ms
1000 searches 0.00009999999999999999%: 1.09 +- 0.13 MS          1000 searches 0.00009999999999999999%: 0.75 +- 0.01 ms

muendlein · 2025-12-21T11:47:29Z

@mourner I have now switched to a recursive implementation which seems to close the gap significantly. For larger datasets it is actually consistently ahead. I guess this should be good enough, what do you think?

last commit																before (main)

1000000 rectangles														1000000 rectangles	
1000 searches 100%: 12652.25 +- 223.57 ms                               1000 searches 100%: 24885.40 +- 232.37 ms
1000 searches 75%: 8589.92 +- 105.95 ms                                 1000 searches 75%: 15216.80 +- 61.79 ms
1000 searches 50%: 5873.25 +- 10.20 ms                                  1000 searches 50%: 14353.59 +- 736.76 ms
1000 searches 25%: 3896.11 +- 451.26 ms                                 1000 searches 25%: 5781.31 +- 146.28 ms
1000 searches 10%: 1287.75 +- 11.16 ms                                  1000 searches 10%: 2095.42 +- 3.60 ms
1000 searches 1%: 126.53 +- 0.73 ms                                     1000 searches 1%: 236.51 +- 46.24 ms
1000 searches 0.1%: 27.02 +- 0.17 ms                                    1000 searches 0.1%: 49.93 +- 6.29 ms
1000 searches 0.01%: 9.51 +- 0.68 ms                                    1000 searches 0.01%: 16.65 +- 0.84 ms
1000 searches 0.001%: 5.18 +- 0.09 ms                                   1000 searches 0.001%: 7.04 +- 0.55 ms
1000 searches 0.00009999999999999999%: 4.43 +- 0.44 ms                  1000 searches 0.00009999999999999999%: 5.76 +- 1.05 ms
                                                                        
500000 rectangles                                                       500000 rectangles
1000 searches 100%: 5589.39 +- 104.73 ms                                1000 searches 100%: 9832.51 +- 149.38 ms
1000 searches 75%: 5442.32 +- 56.25 ms                                  1000 searches 75%: 8494.89 +- 75.32 ms
1000 searches 50%: 3640.05 +- 21.26 ms                                  1000 searches 50%: 5616.14 +- 11.99 ms
1000 searches 25%: 1732.75 +- 11.41 ms                                  1000 searches 25%: 2724.27 +- 31.55 ms
1000 searches 10%: 794.24 +- 3.78 ms                                    1000 searches 10%: 1132.46 +- 5.08 ms
1000 searches 1%: 65.03 +- 0.19 ms                                      1000 searches 1%: 93.57 +- 0.28 ms
1000 searches 0.1%: 16.09 +- 0.24 ms                                    1000 searches 0.1%: 17.58 +- 0.18 ms
1000 searches 0.01%: 5.42 +- 0.14 ms                                    1000 searches 0.01%: 5.30 +- 0.11 ms
1000 searches 0.001%: 3.31 +- 0.09 ms                                   1000 searches 0.001%: 3.14 +- 0.13 ms
1000 searches 0.00009999999999999999%: 2.74 +- 0.07 ms                  1000 searches 0.00009999999999999999%: 2.58 +- 0.10 ms
                                                                        
250000 rectangles                                                       250000 rectangles
1000 searches 100%: 2508.79 +- 9.06 ms                                  1000 searches 100%: 4691.57 +- 12.56 ms
1000 searches 75%: 2522.42 +- 6.15 ms                                   1000 searches 75%: 4056.02 +- 9.81 ms
1000 searches 50%: 1719.10 +- 8.71 ms                                   1000 searches 50%: 2685.84 +- 5.27 ms
1000 searches 25%: 833.99 +- 10.90 ms                                   1000 searches 25%: 1156.71 +- 9.19 ms
1000 searches 10%: 351.27 +- 1.40 ms                                    1000 searches 10%: 473.61 +- 2.87 ms
1000 searches 1%: 38.45 +- 0.17 ms                                      1000 searches 1%: 44.03 +- 0.47 ms
1000 searches 0.1%: 8.85 +- 0.12 ms                                     1000 searches 0.1%: 7.76 +- 0.19 ms
1000 searches 0.01%: 3.34 +- 0.10 ms                                    1000 searches 0.01%: 3.19 +- 0.29 ms
1000 searches 0.001%: 2.13 +- 0.02 ms                                   1000 searches 0.001%: 2.00 +- 0.26 ms
1000 searches 0.00009999999999999999%: 2.06 +- 0.07 ms                  1000 searches 0.00009999999999999999%: 1.65 +- 0.11 ms
                                                                        
100000 rectangles                                                       100000 rectangles
1000 searches 100%: 1076.51 +- 4.82 ms                                  1000 searches 100%: 1620.02 +- 6.11 ms
1000 searches 75%: 1113.58 +- 4.89 ms                                   1000 searches 75%: 1384.78 +- 14.32 ms
1000 searches 50%: 760.27 +- 11.03 ms                                   1000 searches 50%: 855.47 +- 7.68 ms
1000 searches 25%: 341.19 +- 1.99 ms                                    1000 searches 25%: 383.51 +- 1.37 ms
1000 searches 10%: 101.56 +- 1.52 ms                                    1000 searches 10%: 112.73 +- 0.79 ms
1000 searches 1%: 16.05 +- 0.11 ms                                      1000 searches 1%: 14.05 +- 0.13 ms
1000 searches 0.1%: 5.21 +- 0.34 ms                                     1000 searches 0.1%: 3.71 +- 0.46 ms
1000 searches 0.01%: 1.91 +- 0.02 ms                                    1000 searches 0.01%: 1.46 +- 0.02 ms
1000 searches 0.001%: 1.38 +- 0.01 ms                                   1000 searches 0.001%: 1.11 +- 0.04 ms
1000 searches 0.00009999999999999999%: 1.27 +- 0.01 ms                  1000 searches 0.00009999999999999999%: 0.97 +- 0.02 ms
                                                                        
50000 rectangles                                                        50000 rectangles
1000 searches 100%: 495.78 +- 2.67 ms                                   1000 searches 100%: 564.04 +- 1.55 ms
1000 searches 75%: 503.01 +- 52.58 ms                                   1000 searches 75%: 530.91 +- 1.03 ms
1000 searches 50%: 321.75 +- 2.92 ms                                    1000 searches 50%: 417.14 +- 63.84 ms
1000 searches 25%: 106.41 +- 1.63 ms                                    1000 searches 25%: 113.89 +- 1.12 ms
1000 searches 10%: 50.81 +- 0.38 ms                                     1000 searches 10%: 50.63 +- 0.18 ms
1000 searches 1%: 9.96 +- 0.17 ms                                       1000 searches 1%: 8.07 +- 0.15 ms
1000 searches 0.1%: 2.89 +- 0.48 ms                                     1000 searches 0.1%: 2.13 +- 0.13 ms
1000 searches 0.01%: 1.46 +- 0.03 ms                                    1000 searches 0.01%: 1.10 +- 0.04 ms
1000 searches 0.001%: 1.08 +- 0.02 ms                                   1000 searches 0.001%: 0.85 +- 0.05 ms
1000 searches 0.00009999999999999999%: 0.98 +- 0.01 ms                  1000 searches 0.00009999999999999999%: 0.75 +- 0.01 ms

mourner · 2025-12-21T19:26:51Z

Excellent! A few more nits:

Let's move addAllLeavesOfNode to a method, similar to _searchRecursive just for code clarity / simplicity (if it doesn't degrade performance).
Can you share the more extensive benchmark you used to generate the results above?

muendlein · 2025-12-21T21:46:05Z

@mourner Moved the function now (did not observe performance degradation).
Here is the code for the extensive benchmarking:

import Flatbush from './index.js';


const nodeSize = 16;
const boxSpace = 100;
const dataBoxSize = 1;

function addRandomBox(arr, boxSize) {
    const x = Math.random() * (boxSpace - boxSize);
    const y = Math.random() * (boxSpace - boxSize);
    const x2 = x + boxSize;
    const y2 = y + boxSize;
    arr.push(x, y, x2, y2);
}

function calcTimeStats(array) {
    const n = array.length;
    const mean = array.reduce((a, b) => a + b) / n;
    const std = Math.sqrt(array.map(x => Math.pow(x - mean, 2)).reduce((a, b) => a + b) / n);
    return `${mean.toFixed(2)} +- ${std.toFixed(2)} ms`;
}

function createBush(_N) {
    const coords = [];
    for (let i = 0; i < _N; i++) addRandomBox(coords, dataBoxSize);
    const index = new Flatbush(_N, nodeSize);
    for (let i = 0; i < coords.length; i += 4) {
        index.add(
            coords[i],
            coords[i + 1],
            coords[i + 2],
            coords[i + 3]);
    }
    index.finish();
    return index;
}

function createSearchBoxes(area, Ksearch) {
    const boxes = [];
    for (let i = 0; i < Ksearch; i++) {
        addRandomBox(boxes, boxSpace * Math.sqrt(area));
    }
    return boxes;
}

const box_1_000_000 = createBush(1000000);
const box_500_000 = createBush(500000);
const box_250_000 = createBush(250000);
const box_100_000 = createBush(100000);
const box_50_000 = createBush(50000);

function benchBoxSearchBush(_index, boxes) {
    const start = performance.now();
    for (let i = 0; i < boxes.length; i += 4) {
        _index.search(boxes[i], boxes[i + 1], boxes[i + 2], boxes[i + 3]);
    }
    return performance.now() - start;
}

function benchSearchfBush(_index, area, Ksearch) {
    const searchBoxes = createSearchBoxes(area, Ksearch);
    return benchBoxSearchBush(_index, searchBoxes);
}

function benchBoxBush(box, repetitions, Ksearch, area, warmup) {
    const timesSearch = [];
    for (let i = 0; i < repetitions + 1; i++) {
        const tSearch = benchSearchfBush(box, area, Ksearch);
        timesSearch.push(tSearch);
    }

    if (warmup) {
        return;
    }

    console.log(`${Ksearch} searches ${area * 100}%: ${calcTimeStats(timesSearch)}`);
}

console.log(`node size: ${nodeSize}`);

const repetitions = 5;
const KSearch = 1000;

console.log('\n1000000 rectangles');
benchBoxBush(box_1_000_000, 1, KSearch, 1.0, true);
benchBoxBush(box_1_000_000,  repetitions, KSearch, 1.0, false);
benchBoxBush(box_1_000_000,  1, KSearch, 0.75, true);
benchBoxBush(box_1_000_000,  repetitions, KSearch, 0.75, false);
benchBoxBush(box_1_000_000,  1, KSearch, 0.5, true);
benchBoxBush(box_1_000_000,  repetitions, KSearch, 0.5, false);
benchBoxBush(box_1_000_000,  1, KSearch, 0.25, true);
benchBoxBush(box_1_000_000,  repetitions, KSearch, 0.25, false);
benchBoxBush(box_1_000_000,  1, KSearch, 0.1, true);
benchBoxBush(box_1_000_000,  repetitions, KSearch, 0.1, false);
benchBoxBush(box_1_000_000,  1, KSearch, 0.01, true);
benchBoxBush(box_1_000_000,  repetitions, KSearch, 0.01, false);
benchBoxBush(box_1_000_000,  1, KSearch, 0.001, true);
benchBoxBush(box_1_000_000,  repetitions, KSearch, 0.001, false);
benchBoxBush(box_1_000_000,  1, KSearch, 0.0001, true);
benchBoxBush(box_1_000_000,  repetitions, KSearch, 0.0001, false);
benchBoxBush(box_1_000_000,  1, KSearch, 0.00001, true);
benchBoxBush(box_1_000_000,  repetitions, KSearch, 0.00001, false);
benchBoxBush(box_1_000_000,  1, KSearch, 0.000001, true);
benchBoxBush(box_1_000_000,  repetitions, KSearch, 0.000001, false);

console.log('\n500000 rectangles');
benchBoxBush(box_500_000,  1, KSearch, 1.0, true);
benchBoxBush(box_500_000,  repetitions, KSearch, 1.0, false);
benchBoxBush(box_500_000,  1, KSearch, 0.75, true);
benchBoxBush(box_500_000,  repetitions, KSearch, 0.75, false);
benchBoxBush(box_500_000,  1, KSearch, 0.5, true);
benchBoxBush(box_500_000,  repetitions, KSearch, 0.5, false);
benchBoxBush(box_500_000,  1, KSearch, 0.25, true);
benchBoxBush(box_500_000,  repetitions, KSearch, 0.25, false);
benchBoxBush(box_500_000,  1, KSearch, 0.1, true);
benchBoxBush(box_500_000,  repetitions, KSearch, 0.1, false);
benchBoxBush(box_500_000,  1, KSearch, 0.01, true);
benchBoxBush(box_500_000,  repetitions, KSearch, 0.01, false);
benchBoxBush(box_500_000,  1, KSearch, 0.001, true);
benchBoxBush(box_500_000,  repetitions, KSearch, 0.001, false);
benchBoxBush(box_500_000,  1, KSearch, 0.0001, true);
benchBoxBush(box_500_000,  repetitions, KSearch, 0.0001, false);
benchBoxBush(box_500_000,  1, KSearch, 0.00001, true);
benchBoxBush(box_500_000,  repetitions, KSearch, 0.00001, false);
benchBoxBush(box_500_000,  1, KSearch, 0.000001, true);
benchBoxBush(box_500_000,  repetitions, KSearch, 0.000001, false);

console.log('\n250000 rectangles');
benchBoxBush(box_250_000,  repetitions, KSearch, 1.0, true);
benchBoxBush(box_250_000,  repetitions, KSearch, 1.0, false);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.75, true);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.75, false);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.5, true);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.5, false);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.25, true);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.25, false);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.1, true);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.1, false);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.01, true);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.01, false);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.001, true);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.001, false);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.0001, true);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.0001, false);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.00001, true);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.00001, false);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.000001, true);
benchBoxBush(box_250_000,  repetitions, KSearch, 0.000001, false);

console.log('\n100000 rectangles');
benchBoxBush(box_100_000,  repetitions, KSearch, 1.0, true);
benchBoxBush(box_100_000,  repetitions, KSearch, 1.0, false);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.75, true);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.75, false);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.5, true);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.5, false);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.25, true);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.25, false);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.1, true);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.1, false);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.01, true);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.01, false);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.001, true);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.001, false);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.0001, true);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.0001, false);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.00001, true);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.00001, false);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.000001, true);
benchBoxBush(box_100_000,  repetitions, KSearch, 0.000001, false);

console.log('\n50000 rectangles');
benchBoxBush(box_50_000,  repetitions, KSearch, 1.0, true);
benchBoxBush(box_50_000,  repetitions, KSearch, 1.0, false);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.75, true);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.75, false);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.5, true);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.5, false);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.25, true);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.25, false);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.1, true);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.1, false);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.01, true);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.01, false);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.001, true);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.001, false);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.0001, true);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.0001, false);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.00001, true);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.00001, false);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.000001, true);
benchBoxBush(box_50_000,  repetitions, KSearch, 0.000001, false);

muendlein · 2026-01-18T22:43:01Z

@mourner Is there anything else needed to get this merged?

mourner · 2026-03-13T10:57:25Z

@muendlein sorry for dropping the ball on this, it's been a very tough winter for me. I'll take another look soon — I've been hesitating because while the PR improved a lot with iterations, it's still a notable regression for the most common use case (arguably, most performance-sensitive apps using flatbush depend on many small queries rather than few big ones). Will take one last look into whether we can get it closer to original performance for that case.

muendlein · 2026-03-15T20:16:26Z

@mourner Thank you for coming back to me and hopefully things are getting better with spring around the corner!

I see the point with smaller queries being the more common use case. But for those cases isn't the number of items also large (like >1M)? So far the benchmarking indicates that the regression is mainly present for small queries on small datasets.

muendlein · 2026-03-15T21:48:39Z

I just repeated the benchmark (with more focus on smaller queries) and noticed that the most recent commit (function moved to class improved performance a bit). Regression is now only observable with <=250k items.

last commit							                            before this PR
                                                                
1000000 rectangles                                              1000000 rectangles
1000 searches 1%: 120.53 +- 2.07 ms                             1000 searches 1%: 209.84 +- 21.90 ms
1000 searches 0.1%: 26.89 +- 1.06 ms                            1000 searches 0.1%: 32.67 +- 0.24 ms
1000 searches 0.01%: 9.42 +- 0.47 ms                            1000 searches 0.01%: 10.15 +- 0.24 ms
1000 searches 0.001%: 5.10 +- 0.13 ms                           1000 searches 0.001%: 5.29 +- 0.11 ms
1000 searches 0.00009999999999999999%: 4.03 +- 0.06 ms          1000 searches 0.00009999999999999999%: 4.13 +- 0.11 ms
1000 searches 0.000009999999999999999%: 3.82 +- 0.17 ms         1000 searches 0.000009999999999999999%: 3.91 +- 0.15 ms
1000 searches 0.000001%: 3.70 +- 0.07 ms                        1000 searches 0.000001%: 3.84 +- 0.15 ms
1000 searches 1.0000000000000001e-7%: 3.70 +- 0.08 ms           1000 searches 1.0000000000000001e-7%: 3.68 +- 0.06 ms
1000 searches 1e-8%: 3.74 +- 0.13 ms                            1000 searches 1e-8%: 3.94 +- 0.11 ms
                                                                
500000 rectangles                                               500000 rectangles
1000 searches 1%: 63.38 +- 2.34 ms                              1000 searches 1%: 93.26 +- 3.28 ms
1000 searches 0.1%: 15.37 +- 0.13 ms                            1000 searches 0.1%: 17.05 +- 0.20 ms
1000 searches 0.01%: 5.21 +- 0.10 ms                            1000 searches 0.01%: 5.31 +- 0.12 ms
1000 searches 0.001%: 3.17 +- 0.08 ms                           1000 searches 0.001%: 3.18 +- 0.08 ms
1000 searches 0.00009999999999999999%: 2.62 +- 0.06 ms          1000 searches 0.00009999999999999999%: 2.74 +- 0.26 ms
1000 searches 0.000009999999999999999%: 2.46 +- 0.04 ms         1000 searches 0.000009999999999999999%: 2.43 +- 0.11 ms
1000 searches 0.000001%: 2.43 +- 0.06 ms                        1000 searches 0.000001%: 2.40 +- 0.08 ms
1000 searches 1.0000000000000001e-7%: 2.42 +- 0.03 ms           1000 searches 1.0000000000000001e-7%: 2.52 +- 0.17 ms
1000 searches 1e-8%: 2.49 +- 0.14 ms                            1000 searches 1e-8%: 2.46 +- 0.08 ms
                                                                
250000 rectangles                                               250000 rectangles
1000 searches 1%: 38.09 +- 1.50 ms                              1000 searches 1%: 42.72 +- 0.92 ms
1000 searches 0.1%: 8.63 +- 0.13 ms                             1000 searches 0.1%: 7.62 +- 0.16 ms
1000 searches 0.01%: 3.25 +- 0.06 ms                            1000 searches 0.01%: 2.81 +- 0.09 ms
1000 searches 0.001%: 2.23 +- 0.11 ms                           1000 searches 0.001%: 1.85 +- 0.11 ms
1000 searches 0.00009999999999999999%: 1.88 +- 0.14 ms          1000 searches 0.00009999999999999999%: 1.63 +- 0.03 ms
1000 searches 0.000009999999999999999%: 1.77 +- 0.05 ms         1000 searches 0.000009999999999999999%: 1.51 +- 0.03 ms
1000 searches 0.000001%: 1.69 +- 0.01 ms                        1000 searches 0.000001%: 1.46 +- 0.03 ms
1000 searches 1.0000000000000001e-7%: 1.74 +- 0.06 ms           1000 searches 1.0000000000000001e-7%: 1.44 +- 0.03 ms
1000 searches 1e-8%: 1.79 +- 0.02 ms                            1000 searches 1e-8%: 1.47 +- 0.12 ms
                                                                
100000 rectangles                                               100000 rectangles
1000 searches 1%: 15.63 +- 0.12 ms                              1000 searches 1%: 13.46 +- 0.17 ms
1000 searches 0.1%: 4.49 +- 0.21 ms                             1000 searches 0.1%: 3.31 +- 0.16 ms
1000 searches 0.01%: 1.90 +- 0.07 ms                            1000 searches 0.01%: 1.40 +- 0.01 ms
1000 searches 0.001%: 1.37 +- 0.01 ms                           1000 searches 0.001%: 1.06 +- 0.01 ms
1000 searches 0.00009999999999999999%: 1.28 +- 0.02 ms          1000 searches 0.00009999999999999999%: 1.02 +- 0.07 ms
1000 searches 0.000009999999999999999%: 1.19 +- 0.03 ms         1000 searches 0.000009999999999999999%: 0.92 +- 0.01 ms
1000 searches 0.000001%: 1.17 +- 0.02 ms                        1000 searches 0.000001%: 0.91 +- 0.03 ms
1000 searches 1.0000000000000001e-7%: 1.20 +- 0.03 ms           1000 searches 1.0000000000000001e-7%: 0.97 +- 0.05 ms
1000 searches 1e-8%: 1.17 +- 0.02 ms                            1000 searches 1e-8%: 0.93 +- 0.03 ms
                                                                
50000 rectangles                                                50000 rectangles
1000 searches 1%: 9.66 +- 0.16 ms                               1000 searches 1%: 8.11 +- 0.16 ms
1000 searches 0.1%: 2.71 +- 0.04 ms                             1000 searches 0.1%: 2.05 +- 0.05 ms
1000 searches 0.01%: 1.37 +- 0.03 ms                            1000 searches 0.01%: 1.09 +- 0.00 ms
1000 searches 0.001%: 1.06 +- 0.01 ms                           1000 searches 0.001%: 0.87 +- 0.05 ms
1000 searches 0.00009999999999999999%: 1.00 +- 0.04 ms          1000 searches 0.00009999999999999999%: 0.78 +- 0.01 ms
1000 searches 0.000009999999999999999%: 0.95 +- 0.01 ms         1000 searches 0.000009999999999999999%: 0.78 +- 0.01 ms
1000 searches 0.000001%: 0.95 +- 0.02 ms                        1000 searches 0.000001%: 0.75 +- 0.01 ms
1000 searches 1.0000000000000001e-7%: 0.94 +- 0.01 ms           1000 searches 1.0000000000000001e-7%: 0.75 +- 0.01 ms
1000 searches 1e-8%: 0.94 +- 0.01 ms                            1000 searches 1e-8%: 0.75 +- 0.01 ms

mourner · 2026-04-22T08:14:19Z

@muendlein apologies again for taking so long. I just fed the PR to Codex 5.4, and it suggested an interesting idea: only apply the full bbox optimization on higher leaf levels. It claims this eliminates most of the small query regression while keeping the large query win. Can you try it out? Here's the diff. Also let's rebase on main and resolve conflicts (there were some minor typing changes there).

diff --git a/index.js b/index.js
index fb8f602..dc3068d 100644
--- a/index.js
+++ b/index.js
@@ -223,7 +223,7 @@ export default class Flatbush {
         /** @type number[] | undefined */
         const results = [];
 
-        this._searchRecursive(minX, minY, maxX, maxY, results, nodeIndex, filterFn);
+        this._searchRecursive(minX, minY, maxX, maxY, results, nodeIndex, this._levelBounds.length - 1, filterFn);
         return results;
     }
 
@@ -235,10 +235,11 @@ export default class Flatbush {
      * @param {number} maxY
      * @param {number[]} results
      * @param {number} nodeIndex
+     * @param {number} level
      * @param {(index: number, x0: number, y0: number, x1: number, y1: number) => boolean} [filterFn] An optional function that is called on every found item; if supplied, only items for which this function returns true will be included in the results array.
      * @returns {void}
      */
-    _searchRecursive(minX, minY, maxX, maxY, results, nodeIndex, filterFn) {
+    _searchRecursive(minX, minY, maxX, maxY, results, nodeIndex, level, filterFn) {
         const end = Math.min(nodeIndex + this.nodeSize * 4, upperBound(nodeIndex, this._levelBounds));
 
         // search through child nodes
@@ -255,12 +256,12 @@ export default class Flatbush {
 
             const index = this._indices[pos >> 2] | 0;
 
-            if (nodeIndex >= this.numItems * 4) {
+            if (level > 0) {
                 // check if node bbox is completely inside query bbox
-                if (minX <= x0 && minY <= y0 && maxX >= x1 && maxY >= y1) {
+                if (level > 1 && minX <= x0 && minY <= y0 && maxX >= x1 && maxY >= y1) {
                     this._addAllLeavesOfNode(results, pos, filterFn);
                 } else {
-                    this._searchRecursive(minX, minY, maxX, maxY, results, index, filterFn);
+                    this._searchRecursive(minX, minY, maxX, maxY, results, index, level - 1, filterFn);
                 }
             } else if (filterFn === undefined || filterFn(index, x0, y0, x1, y1)) {
                 results.push(index); // leaf item

# Conflicts: # index.js

muendlein added 2 commits May 10, 2025 17:09

optimize search if node bbox is inside query box

eaf4e33

fixed naming

cbc136d

mourner approved these changes May 10, 2025

View reviewed changes

Comment thread index.js Outdated

kylebarron mentioned this pull request May 12, 2025

Upstream: Enhance search performance georust/geo-index#124

Open

reorder if statements

d00026b

split off functions

d1b015f

inline addLeafSegment

2e8995f

microoptimization

fb78a2e

mourner added 2 commits December 8, 2025 21:34

Merge branch 'main' into enhance_search_performance

465c18a

fixup

f54cd2b

muendlein added 2 commits December 11, 2025 22:25

apply filterFn to leaves

ae20359

clean up types

626b2cf

switch to recursive search

83033c1

move add leaves to class

796030d

clean up

5e3775c

Merge branch 'main' into enhance_search_performance

3565454

# Conflicts: # index.js

Conversation

muendlein commented May 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mourner left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mourner commented May 10, 2025

Uh oh!

muendlein commented May 11, 2025

Uh oh!

muendlein commented May 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mourner commented May 18, 2025

Uh oh!

muendlein commented May 18, 2025

Uh oh!

mourner commented May 18, 2025

Uh oh!

muendlein commented May 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mourner commented May 18, 2025

Uh oh!

muendlein commented May 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

muendlein commented May 19, 2025

Uh oh!

mourner commented May 20, 2025

Uh oh!

muendlein commented May 20, 2025

Uh oh!

mourner commented May 20, 2025

Uh oh!

muendlein commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mourner commented May 21, 2025

Uh oh!

muendlein commented May 21, 2025

Uh oh!

leeoniya commented May 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

muendlein commented May 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mourner commented May 22, 2025

Uh oh!

muendlein commented May 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mourner commented May 22, 2025

Uh oh!

muendlein commented Aug 10, 2025

Uh oh!

mourner commented Dec 8, 2025

Uh oh!

muendlein commented Dec 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mourner commented Dec 13, 2025

Uh oh!

muendlein commented Dec 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

muendlein commented Dec 21, 2025

Uh oh!

mourner commented Dec 21, 2025

Uh oh!

muendlein commented Dec 21, 2025

Uh oh!

muendlein commented Jan 18, 2026

Uh oh!

mourner commented Mar 13, 2026

Uh oh!

muendlein commented Mar 15, 2026

muendlein commented May 10, 2025 •

edited

Loading

muendlein commented May 13, 2025 •

edited

Loading

muendlein commented May 18, 2025 •

edited

Loading

muendlein commented May 18, 2025 •

edited

Loading

muendlein commented May 21, 2025 •

edited

Loading

leeoniya commented May 22, 2025 •

edited

Loading

muendlein commented May 22, 2025 •

edited

Loading

muendlein commented May 22, 2025 •

edited

Loading

muendlein commented Dec 13, 2025 •

edited

Loading

muendlein commented Dec 13, 2025 •

edited

Loading