Merge upstream master with metagraph compatibility #86
Closed
adamant-pwn wants to merge 13 commits into jermp:master from
Conversation
This merge brings in all upstream changes from jermp/sshash, including:

- Major refactoring with dictionary_builder class
- New parse_file.cpp, compute_minimizer_tuples.cpp, build_sparse_and_skew_index.cpp
- Updated benchmarks and scripts
- cityhash integration
- Many bug fixes and improvements

Notable preserved ratschlab changes:

- SSHASH_BUILD_EXECUTABLES option in CMakeLists.txt (commit 64d8894)

Updated pthash submodule to upstream version.
Force-pushed from 878dc13 to 1e8ba29
…patibility

Changes for metagraph integration:

1. Wrap cityhash in namespace to avoid typedef pollution
   - cityhash.hpp and cityhash.cpp: wrapped in 'cityhash' namespace
   - Prevents conflicts with the rollinghashcpp uint64 typedef
   - hash_util.hpp: use cityhash::CityHash128WithSeed instead of static CityMurmur
   - tools/query.cpp: changed uint64 to uint64_t (standard type)
2. Add public accessor methods to dictionary
   - Added kmer_type typedef for easier template parameter extraction
   - Added strings() accessor returning the m_spss.strings bit vector
   - Added strings_offsets() accessor returning m_spss.strings_offsets
   - Moved strings_offsets() next to string_offsets() for better organization
3. Build system changes
   - CMakeLists.txt: added cityhash.cpp to SSHASH_SOURCES
   - CMakeLists.txt: link the sshash executable against sshash_static (includes cityhash)
4. Update pthash submodule
   - Now tracking ratschlab/pthash master with the compute_empirical_entropy fix
   - .gitmodules: updated URL to point to ratschlab/pthash
Force-pushed from 1e8ba29 to 8815360
Fix two critical issues discovered during metagraph integration:
1. Sparse index encoding bug:
- The sparse index encoder uses list_id in control codewords for
buckets of size 2-64, but was allocating bits based on the offset
size (num_bits_per_offset + 1) instead of the list_id range.
- list_id counts buckets within each size category and, in the worst
case (all buckets the same size), can reach the total number of minimizers.
- Fixed by tracking max_buckets_per_size during statistics
collection and using: max(num_bits_per_offset + 1,
2 + 6 + ceil(log2(max_buckets_per_size + 1)))
- This provides a tight bound with minimal overhead (~2-3% in the worst case).
2. Verbose flag handling:
- build_stats.print() was called unconditionally in
dictionary_builder.hpp, ignoring build_config.verbose flag.
- Fixed by making print() conditional: if (build_config.verbose)
- This respects the library's own configuration mechanism.
Implementation:
- Added m_max_buckets_per_size member to buckets_statistics class
- Modified add_bucket_size() to track maximum incrementally
- Zero time overhead (uses std::max with existing data)
- Clean encapsulation within statistics class
Author (Contributor):

> Sorry, I think this PR was created accidentally; see #87 for the relevant changes.