
Optimize inverse index mapping in groups.py (Fixes Issue #3387) #5252

Open
Ayush-Agarwal-G1THUB wants to merge 3 commits into MDAnalysis:develop from Ayush-Agarwal-G1THUB:issue3387-inverse-map-optimization

Ayush-Agarwal-G1THUB commented Feb 26, 2026

Fixes #3387

Summary

This PR replaces the O(n²) inverse index reconstruction logic in groups.py with an O(n) implementation written in Cython.

The previous implementation called np.where inside a loop over the unique indices. Each np.where call scans the whole array, so the cost grows as O(n·k) for n elements and k unique values, which becomes quadratic as k approaches n. This PR instead uses a dictionary-based lookup implemented in lib/_cutil.pyx: each unique value is first added to a map together with its position in the unique-value array. The self.ix array is then iterated over once, and for each element the map returns that element's position in the unique-value array.
Because self.ix is traversed in a single pass with (average) constant-time lookups, the overall time complexity is linear.
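A minimal pure-Python sketch of the two strategies described above, assuming `ix` is an integer array and `unique` holds each distinct value of `ix` exactly once, in arbitrary (unsorted) order. Both function names are hypothetical, and this is not the PR's Cython code; it only illustrates why one approach scales as O(n·k) and the other as O(n + k). Like the described approach, it assumes every value of `ix` appears in `unique`.

```python
import numpy as np


def inverse_index_where_loop(ix, unique):
    """Old-style approach (hypothetical name): one full scan of ``ix`` per unique value."""
    inverse = np.empty(len(ix), dtype=np.intp)
    for pos, value in enumerate(unique):
        # np.where scans all n elements of ix, so k unique values cost O(n * k)
        inverse[np.where(ix == value)[0]] = pos
    return inverse


def inverse_index_dict(ix, unique):
    """Dictionary approach (hypothetical name): build a map, then one pass over ``ix``."""
    # O(k): record the position of each unique value in the unique array
    position = {int(value): pos for pos, value in enumerate(unique)}
    inverse = np.empty(len(ix), dtype=np.intp)
    # O(n): a single pass over ix with average O(1) dictionary lookups
    for i, value in enumerate(ix):
        inverse[i] = position[int(value)]
    return inverse
```

For example, with `ix = np.array([5, 2, 5, 9, 2])` and `unique = np.array([5, 2, 9])`, both functions return `array([0, 1, 0, 2, 1])`, and `unique[inverse]` reproduces `ix`. In pure Python the per-element loop is slow; compiling it with Cython is what lets the linear approach pay off in practice.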

Changes made in this Pull Request:

  • Implemented a new Cython function inverse_int_index() in package/MDAnalysis/lib/_cutil.pyx
  • Replaced the previous O(n²) Python logic in groups.py with a call to the new Cython function
  • Benchmarked the performance improvement
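One way to regression-test the replacement is to compare any candidate implementation against an independent oracle on random, unsorted input. The sketch below assumes a pure-NumPy oracle built from `np.argsort` and `np.searchsorted` (an O((n + k) log k) alternative chosen only because it is easy to trust; it is not what this PR implements). The helper names are hypothetical; in the real test suite the candidate would be the Cython `inverse_int_index()`, whose call signature is not shown here.

```python
import numpy as np


def reference_inverse(ix, unique):
    """Independent oracle (hypothetical): position of each ix value in ``unique``."""
    order = np.argsort(unique)  # order[j] = original position of the j-th smallest value
    sorted_unique = unique[order]
    # every value of ix occurs in unique, so searchsorted lands on its exact slot
    return order[np.searchsorted(sorted_unique, ix)]


def matches_reference(candidate, n=10_000, k=500, seed=0):
    """Check ``candidate(ix, unique)`` against the oracle on random unsorted input."""
    rng = np.random.default_rng(seed)
    ix = rng.integers(0, k, size=n)
    # distinct values of ix in shuffled (unsorted) order
    unique = rng.permutation(np.unique(ix))
    result = np.asarray(candidate(ix, unique))
    return (
        np.array_equal(result, reference_inverse(ix, unique))
        and np.array_equal(unique[result], ix)  # round-trip invariant
    )
```

A test would then assert `matches_reference(candidate)` for a few seeds and sizes, including small `k` (many duplicates) and `k` close to `n` (the case where the old approach is slowest).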

LLM / AI generated code disclosure

LLMs or other AI-powered tools (beyond simple IDE use cases) were used in this contribution: no

Benchmark

For an array of 1,000,000 elements with ~5,000 unique values, the following timings were observed in local testing:
Python: ~0.41 s
Cython: ~0.11 s
Speedup: ~3.64×
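The timings above come from the author's local runs and are not reproduced here. A minimal harness for repeating a comparison of this kind might look like the sketch below. It assumes pure-Python stand-ins for both strategies and hypothetical helper names, so absolute numbers will differ from the Cython build.

```python
import time

import numpy as np


def make_input(n, k, seed=0):
    """n random indices drawn from range(k), plus their distinct values in shuffled order."""
    rng = np.random.default_rng(seed)
    ix = rng.integers(0, k, size=n)
    unique = rng.permutation(np.unique(ix))  # unsorted, each value exactly once
    return ix, unique


def where_loop(ix, unique):
    """Stand-in for the previous logic: one np.where scan per unique value."""
    inverse = np.empty(len(ix), dtype=np.intp)
    for pos, value in enumerate(unique):
        inverse[np.where(ix == value)[0]] = pos
    return inverse


def dict_lookup(ix, unique):
    """Pure-Python stand-in for the map-based approach (the PR implements it in Cython)."""
    position = {int(v): p for p, v in enumerate(unique)}
    return np.fromiter((position[int(v)] for v in ix), dtype=np.intp, count=len(ix))


def best_time(func, ix, unique, repeat=3):
    """Best wall-clock time in seconds of func(ix, unique) over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(ix, unique)
        best = min(best, time.perf_counter() - start)
    return best
```

For example, `ix, unique = make_input(1_000_000, 5_000)` builds input of the size described above, and `best_time(where_loop, ix, unique) / best_time(dict_lookup, ix, unique)` gives a speedup ratio. Because `dict_lookup` runs in the Python interpreter, that ratio will not match the Cython figures; to compare against the PR itself, time the compiled function instead.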

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added?
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)
  • LLM/AI disclosure was updated.

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5252.org.readthedocs.build/en/5252/


github-actions bot left a comment


Hello there, first-time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first-time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.


codecov bot commented Feb 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.85%. Comparing base (900d20b) to head (7724ba3).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5252      +/-   ##
===========================================
+ Coverage    93.83%   93.85%   +0.01%     
===========================================
  Files          180      180              
  Lines        22473    22471       -2     
  Branches      3189     3188       -1     
===========================================
+ Hits         21088    21090       +2     
+ Misses         923      920       -3     
+ Partials       462      461       -1     


Ayush-Agarwal-G1THUB (Author)

Hello to any reviewers reading this!

This is my first contribution at this scale, so I’d really appreciate feedback on both the implementation and whether this PR follows the contributor guidelines.

Thank you!



Development

Successfully merging this pull request may close these issues.

Fast cythonized inverse array of unsorted, unique indices

1 participant