Skip to content

feat: add periodic boundary condition queries for float trees#297

Open
helmutcarter wants to merge 5 commits intosdd:masterfrom
helmutcarter:master
Open

feat: add periodic boundary condition queries for float trees#297
helmutcarter wants to merge 5 commits intosdd:masterfrom
helmutcarter:master

Conversation

@helmutcarter
Copy link
Copy Markdown

@helmutcarter helmutcarter commented Mar 25, 2026

Resolves #34 by adding periodic boundary condition support for float-tree queries on KdTree and ImmutableKdTree.

New periodic APIs:

  • nearest_one_periodic
  • nearest_n_periodic
  • within_periodic
  • within_unsorted_periodic
  • nearest_n_within_periodic

It also adds:

  • runnable examples in examples/
  • appropriate entries in README and docs/
  • periodic query tests

Implementation

This is a correctness-first implementation. It is pretty slow.

Periodic queries work by evaluating wrapped query images, reusing the existing non-periodic query logic, then
merging duplicate hits and keeping the minimum distance per item.

Contract

Current assumptions:

  • box_size[i] > 0
  • points are stored in a principal cell
  • queries are also supplied in that same cell, e.g. this PR does not add automatic query wrapping.

Performance

The performance penalty is large, and grows quickly with increasing number of dimensions. With the current naive approach,
periodic query complexity is O(3^K). I think I know how to reduce this to O(2^K), but a pruning-aware periodic search may
obviate the need for optimizing the number of queries of periodic images.

Benchmark:

  • mutable 2D nearest_one: ~72x slower
  • immutable 2D nearest_one: ~80x slower
  • mutable 3D nearest_one: ~623x slower
  • immutable 3D nearest_one: ~630x slower

@sdd
Copy link
Copy Markdown
Owner

sdd commented Apr 13, 2026

Hi @helmutcarter. Thanks for this PR - the feature request / issue on which it is based has been open for a number of years and I know there are certainly others that are keen to see this functionality!

I've been travelling quite a bit these past couple of weeks so apologies for the delay in looking at this. It is quite a large PR so I'll need a bit of time to review. At first glance it looks as though there would be no impact to other existing users since the new PBC-related code does not touch the underlying data structures and is contained within the new query methods, which is good, but I'd also like to spend a bit of time this week to see if I can improve the performance.

Thanks again!

@helmutcarter
Copy link
Copy Markdown
Author

Hi @sdd thanks for taking a look! I was thinking more about the API this weekend and I was wondering if it would be better to have a periodic flag in existing functions, instead of creating new periodic versions of them. I don't know much about software design or best practices here, so please let me know what you think.

Regarding performance, this is definitely a naive first-pass implementation. I had a few thoughts on improving performance, but I'm curious to see what you come up with!

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 98.34213% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.56%. Comparing base (11870f8) to head (f064893).

Files with missing lines Patch % Lines
src/immutable/float/query/nearest_n_within.rs 96.29% 4 Missing and 2 partials ⚠️
src/float/query/nearest_n_within.rs 97.58% 4 Missing and 1 partial ⚠️
src/float/query/within_unsorted.rs 97.14% 4 Missing and 1 partial ⚠️
src/immutable/float/query/within_unsorted.rs 96.64% 4 Missing and 1 partial ⚠️
src/float/query/nearest_n.rs 99.35% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #297      +/-   ##
==========================================
+ Coverage   94.88%   95.56%   +0.68%     
==========================================
  Files          54       54              
  Lines        5705     7032    +1327     
  Branches     5705     7032    +1327     
==========================================
+ Hits         5413     6720    +1307     
- Misses        274      289      +15     
- Partials       18       23       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

.map(|(item, distance)| NearestNeighbour { distance, item })
.collect();

results.sort_by(|a, b| a.distance.partial_cmp(&b.distance).unwrap());
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get better performance here by only sorting the results as far as qty. We don't care if the items beyond qty are sorted - only that their distance is > the closest quantity items. Also the unstable sort variants are more appropriate here as we don't care about preserving order of items with the same distance from the raw unsorted results.

You can do this by using a variant of select_nth_unstable. This ensures that the item in position n in the result is in it's final sorted position.

Older versions of select_nth_unstable had the side-effect that the "closer" side of the result array was also sorted after select_nth_unstable exited. I'm not 100% sure that this is still the case with the new ipnsort algorithm that Rust's standard library uses, so we may need to sort the "closer" side again after applying select_nth_unstable.

Something like this should be an improvement:

results.select_nth_unstable_by(|a, b| a.distance.partial_cmp(&b.distance).unwrap());

results.truncate(qty);

// this step may not be needed if select_nth_unstable_by
// has the side-effect of sorting the closer side
results.sort_unstable_by(|a, b| a.distance.partial_cmp(&b.distance).unwrap());

results

{
if axis == K {
for candidate in self.nearest_n::<D>(wrapped_query, qty) {
best_by_item
Copy link
Copy Markdown
Owner

@sdd sdd Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that this HashMap based approach is causing some performance issues here.

An alternative we could try here would be:

  • perform all 3^K queries but keep them as-is, ie simple lists of NearestNeighbour objects
  • concatenate the lists and then sort unstable by item and then dist, so we get all results from repeated items adjacent to each other, with multiple results for the same item in ascending distance order
  • walk across the vec, copying later items on top of earlier items to eliminate the more distant images for the same item. Something like this (I've not tested this, or even tried to compile it!):
results.sort_unstable_by_key(|a|a.item); // EDIT: this closure needs to use sort_by, not sort_by_key, and to cmp by distance when a.item == b.item

let mut from_idx = 1;
let mut to_idx = 1;
let curr_item = result[0].item;
while(from_idx < result.len()) {
    if result[to_idx].item != curr_item {
        result[from_idx] = result[to_idx];
        curr_item = result[to_idx].item;
        to_idx += 1;
    }
    from_idx += 1;
}
results.truncate(to_idx);

Then try the same as the previous comment:

results.select_nth_unstable_by(|a, b| a.distance.partial_cmp(&b.distance).unwrap());

results.truncate(qty);

// this step may not be needed if select_nth_unstable_by
// has the side-effect of sorting the closer side
results.sort_unstable_by(|a, b| a.distance.partial_cmp(&b.distance).unwrap());

results

I don't know for certain but my hunch is that this would be faster than the hashmap-based approach. Certainly worth a try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Periodic Boundary Conditions (Third time's a charm)

2 participants