Hublabel by electricEpilith · Pull Request #1 · electricEpilith/vg

electricEpilith · 2026-04-01T06:49:03Z

Changelog Entry

To be copied to the draft changelog by merger:

Whatsits now frobnicated

Description

Hub labeling pull request test

sonarqubecloud · 2026-05-02T07:01:56Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
4.5% Duplication on New Code

codacy-production

Pull Request Overview

This PR implements significant Hub Labeling features and refactors the SnarlDistanceIndex. However, the PR is currently not up to Codacy standards. Key concerns include a critical lack of unit testing for several new core features: the log-scaled gap scoring logic, the 'evaluation bonus' heuristic in chaining, and the index validation logic.

Technically, there are several high-severity issues regarding uninitialized POD members in the clustering logic which could lead to non-deterministic behavior. Additionally, the new make_temporary_distance_index function exhibits extreme cyclomatic complexity (CCN 100), making it virtually untestable in its current form. These issues, combined with the empty PR description and removal of legacy Protobuf-handling functions without explicit deprecation, should be addressed before merging.

About this PR

The PR description is empty. Given the scope of changes—including C++20 migration, Hub Labeling implementation, and scoring refactors—a detailed summary of changes and design decisions is required.
The Hub Labeling implementation introduces a dependency on 'bdsg/ch.hpp'. Verify that the build environment and CI pipelines are updated to include this header.

1 comment outside of the diff

src/cluster.hpp

_{line 606 🔴 HIGH RISK}
Suggestion: Mark this constructor as explicit to prevent implicit conversion from SnarlDistanceIndex to TipAnchoredMaxDistance.
    explicit TipAnchoredMaxDistance(SnarlDistanceIndex& distance_index);
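To illustrate what this suggestion guards against, here is a minimal sketch using stand-in types. `FakeIndex`, `ImplicitWrapper`, and `ExplicitWrapper` are hypothetical names for this example only, not vg classes. It shows that a single-argument constructor permits silent conversion unless it is marked `explicit`.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for SnarlDistanceIndex.
struct FakeIndex {};

// Single-argument constructor without explicit: FakeIndex& converts silently.
struct ImplicitWrapper {
    ImplicitWrapper(FakeIndex& index) : index(&index) {}
    FakeIndex* index;
};

// Same constructor marked explicit: the conversion must be spelled out.
struct ExplicitWrapper {
    explicit ExplicitWrapper(FakeIndex& index) : index(&index) {}
    FakeIndex* index;
};

static_assert(std::is_convertible_v<FakeIndex&, ImplicitWrapper>);
static_assert(!std::is_convertible_v<FakeIndex&, ExplicitWrapper>);
static_assert(std::is_constructible_v<ExplicitWrapper, FakeIndex&>);

// Usage: direct construction still works with the explicit constructor.
inline void demo_explicit() {
    FakeIndex idx;
    ExplicitWrapper wrapper(idx);
    assert(wrapper.index == &idx);
}
```

With the explicit version, passing a `FakeIndex` to a function that expects an `ExplicitWrapper` fails to compile instead of building a temporary wrapper behind the caller's back.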

Test suggestions

Verify Hub Labeling query accuracy in oversized snarls against Dijkstra ground truth.
Verify that snarl regularity checking correctly identifies simple, regular, and irregular snarls.
Test alignment rescoring logic with logged gap lengths for matches, mismatches, and various indel sizes.
Verify the 'evaluation bonus' heuristic in chaining preserves haplotype consistency compared to standard DP.
Ensure SnarlDecompositionFuzzer correctly replays and flips nested chain events.
Validate the index predates check in Giraffe correctly identifies stale indices.
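The first suggestion above (checking hub label queries against Dijkstra) could be sketched as follows. This is an assumption-heavy toy: it uses a plain adjacency list and a trivial, unpruned labeling in which every reachable vertex is a hub, not the bdsg/ch.hpp implementation or any vg API. All names here are hypothetical. It only shows the shape of the check, where a query takes the minimum of d(u, hub) + d(hub, v) over shared hubs and is compared with Dijkstra distances for every pair.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

using Dist = std::uint64_t;
constexpr Dist INF = std::numeric_limits<Dist>::max();
// Undirected weighted graph: adjacency lists of (neighbor, weight).
using Graph = std::vector<std::vector<std::pair<std::size_t, Dist>>>;
// A label is a list of (hub, distance to hub), sorted by hub.
using Label = std::vector<std::pair<std::size_t, Dist>>;

std::vector<Dist> dijkstra(const Graph& g, std::size_t source) {
    assert(source < g.size());
    std::vector<Dist> dist(g.size(), INF);
    using Item = std::pair<Dist, std::size_t>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> queue;
    dist[source] = 0;
    queue.push({0, source});
    while (!queue.empty()) {
        auto [d, u] = queue.top();
        queue.pop();
        if (d > dist[u]) continue;  // stale queue entry
        for (auto [v, w] : g[u]) {
            if (d + w < dist[v]) {
                dist[v] = d + w;
                queue.push({dist[v], v});
            }
        }
    }
    return dist;
}

// Trivial (unpruned) labeling: every reachable vertex becomes a hub.
std::vector<Label> build_trivial_labels(const Graph& g) {
    std::vector<Label> labels(g.size());
    for (std::size_t v = 0; v < g.size(); ++v) {
        std::vector<Dist> dist = dijkstra(g, v);
        for (std::size_t h = 0; h < g.size(); ++h) {
            if (dist[h] != INF) labels[v].push_back({h, dist[h]});
        }
    }
    return labels;
}

// Hub label query: merge two sorted labels and minimize over shared hubs.
Dist hub_query(const Label& a, const Label& b) {
    Dist best = INF;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].first < b[j].first) { ++i; }
        else if (a[i].first > b[j].first) { ++j; }
        else { best = std::min(best, a[i].second + b[j].second); ++i; ++j; }
    }
    return best;
}

// True when every pairwise hub query agrees with the Dijkstra ground truth.
bool labels_match_dijkstra(const Graph& g, const std::vector<Label>& labels) {
    for (std::size_t u = 0; u < g.size(); ++u) {
        std::vector<Dist> truth = dijkstra(g, u);
        for (std::size_t v = 0; v < g.size(); ++v) {
            if (hub_query(labels[u], labels[v]) != truth[v]) return false;
        }
    }
    return true;
}

// Small demo graph; vertex 4 is deliberately unreachable.
Graph make_demo_graph() {
    Graph g(5);
    auto add_edge = [&g](std::size_t a, std::size_t b, Dist w) {
        g[a].push_back({b, w});
        g[b].push_back({a, w});
    };
    add_edge(0, 1, 2);
    add_edge(1, 2, 3);
    add_edge(0, 2, 10);
    add_edge(2, 3, 1);
    return g;
}
```

A real test would replace the trivial labeling with the labels produced by the index under test and run the same all-pairs comparison on graphs that contain oversized snarls.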

Prompt proposal for missing tests

Consider implementing these tests if applicable:
1. Test alignment rescoring logic with logged gap lengths for matches, mismatches, and various indel sizes.
2. Verify the 'evaluation bonus' heuristic in chaining preserves haplotype consistency compared to standard DP.
3. Validate the index predates check in Giraffe correctly identifies stale indices.

Low confidence findings

Several sorting and cleaning functions in graph.cpp/hpp were removed. Ensure these are not part of the public API or that downstream external tools have been migrated to the HandleGraph equivalents.

codacy-production · 2026-05-02T19:26:45Z

+class MEMClusterer::HitEdge {
+public:
+    HitEdge(size_t to_idx, int32_t weight, int64_t distance) : to_idx(to_idx), weight(weight), distance(distance) {}
+    HitEdge() = default;


_{🔴 HIGH RISK}

The default constructor does not initialize the primitive members to_idx, weight, and distance. This can lead to garbage values being used during dynamic programming.

Suggested change

-    HitEdge() = default;
+    HitEdge() : to_idx(0), weight(0), distance(0) {}
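An alternative to the suggested constructor, sketched here under assumptions, is to give the fields default member initializers so that `= default` can stay. `HitEdgeSketch` is a hypothetical stand-in that mirrors only the three fields named in the comment. The real `MEMClusterer::HitEdge` may have other members or constraints.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in mirroring the three fields named in the review comment.
struct HitEdgeSketch {
    HitEdgeSketch() = default;
    HitEdgeSketch(std::size_t to_idx, std::int32_t weight, std::int64_t distance)
        : to_idx(to_idx), weight(weight), distance(distance) {}

    // Default member initializers apply to any constructor that does not set
    // the field itself, so the defaulted constructor no longer leaves them
    // with indeterminate values.
    std::size_t to_idx = 0;
    std::int32_t weight = 0;
    std::int64_t distance = 0;
};

// Usage: both construction paths yield well-defined fields.
inline void demo_hit_edge() {
    HitEdgeSketch blank;
    assert(blank.to_idx == 0 && blank.weight == 0 && blank.distance == 0);
    HitEdgeSketch edge(3, -2, 17);
    assert(edge.to_idx == 3 && edge.weight == -2 && edge.distance == 17);
}
```

Either form removes the indeterminate values. The member-initializer form keeps the defaults next to the declarations, which helps if more constructors are added later.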

codacy-production · 2026-05-02T19:26:45Z

+    finish_gap();
+}
+
+int score_alignment_with_logged_gaps(const size_t& matches, const size_t& mismatches, const std::vector<size_t>& gap_lengths) {


_{🟡 MEDIUM RISK}

The new scoring function 'score_alignment_with_logged_gaps' lacks corresponding unit tests in src/unittest/alignment.cpp to verify the log-scaling behavior against expected minimap2 scores for various indel sizes.
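As a rough idea of what such tests could pin down, here is a sketch built on an assumed scoring form. The function `sketch_score_with_logged_gaps`, the `SketchScoring` parameters, and the formula (a fixed open cost plus a scaled log2 of the gap length) are all hypothetical. They are not vg's `score_alignment_with_logged_gaps` or minimap2's actual constants. The point is the kind of property a unit test can assert: doubling a gap's length adds a constant penalty rather than doubling it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical scoring parameters; NOT the values vg or minimap2 use.
struct SketchScoring {
    int match = 1;
    int mismatch = 4;
    int gap_open = 6;
    double gap_log_scale = 2.0;
};

// Assumed form: each gap costs gap_open + floor(gap_log_scale * log2(length)).
int sketch_score_with_logged_gaps(std::size_t matches, std::size_t mismatches,
                                  const std::vector<std::size_t>& gap_lengths,
                                  const SketchScoring& s = SketchScoring()) {
    int score = static_cast<int>(matches) * s.match
              - static_cast<int>(mismatches) * s.mismatch;
    for (std::size_t len : gap_lengths) {
        assert(len > 0);  // a gap of length zero is not a gap
        double logged = std::log2(static_cast<double>(len));
        score -= s.gap_open + static_cast<int>(s.gap_log_scale * logged);
    }
    return score;
}
```

Tests against the real function would substitute expected scores taken from the reference scorer for the same match, mismatch, and gap inputs.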

codacy-production · 2026-05-02T19:26:45Z

+using namespace handlegraph;
+namespace vg {
+
+SnarlDistanceIndex::TemporaryDistanceIndex make_temporary_distance_index(


_{🟡 MEDIUM RISK}

This function is extremely complex (CCN 100). The logic for traversing the snarl decomposition and handling BEGIN/END events should be refactored into smaller, specialized private helper methods to improve testability and readability.
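One possible shape for that refactor, shown as a sketch only: a small dispatcher routes each event to a helper that does one thing. `Event`, `DecompositionWalkerSketch`, and its members are hypothetical names. The real BEGIN/END callbacks in vg carry handles and build index records, which this toy leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event type; vg's real decomposition callbacks differ.
enum class Event { BeginChain, EndChain, BeginSnarl, EndSnarl };

class DecompositionWalkerSketch {
public:
    // The dispatcher only routes; each case lives in its own small helper.
    void handle(Event event) {
        switch (event) {
            case Event::BeginChain: begin_chain(); break;
            case Event::EndChain:   end_chain();   break;
            case Event::BeginSnarl: begin_snarl(); break;
            case Event::EndSnarl:   end_snarl();   break;
        }
    }

    std::size_t chain_count() const { return chains; }
    std::size_t snarl_count() const { return snarls; }
    std::size_t max_depth() const { return deepest; }
    bool balanced() const { return open.empty(); }

private:
    enum class Frame { Chain, Snarl };

    void begin_chain() { push(Frame::Chain); ++chains; }
    void begin_snarl() { push(Frame::Snarl); ++snarls; }
    void end_chain() { pop(Frame::Chain); }
    void end_snarl() { pop(Frame::Snarl); }

    void push(Frame frame) {
        open.push_back(frame);
        if (open.size() > deepest) deepest = open.size();
    }

    // An END event must close the most recently opened frame of the same kind.
    void pop(Frame expected) {
        assert(!open.empty() && open.back() == expected);
        open.pop_back();
    }

    std::vector<Frame> open;
    std::size_t chains = 0, snarls = 0, deepest = 0;
};
```

Because each helper can be driven with a short event sequence, individual cases become testable without building a whole graph.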

codacy-production · 2026-05-02T19:26:45Z


    const handlegraph::HandleGraph* graph_ptr = (const handlegraph::HandleGraph*) &gbz.graph;

+    double total_zipcode_time = 0.0, total_decoder_time = 0.0;


_{⚪ LOW RISK}

The variables total_zipcode_time and total_decoder_time are assigned a value but never used. Either implement the missing timing logic or remove them to clean up the code.
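If the timing logic is meant to stay, one way to feed such accumulators is a small scoped timer. This is a sketch under assumptions: `ScopedTimer` is a hypothetical helper, not something the PR or vg provides, and the loop body stands in for whatever zipcode work would be measured.

```cpp
#include <cassert>
#include <chrono>

// Adds the wall-clock duration of its own lifetime, in seconds, to a total.
class ScopedTimer {
public:
    explicit ScopedTimer(double& total_seconds)
        : total(total_seconds), start(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        total += elapsed.count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& total;
    std::chrono::steady_clock::time_point start;
};

// Usage: accumulate time across several timed regions.
inline double demo_timing() {
    double total_zipcode_time = 0.0;
    for (int i = 0; i < 3; ++i) {
        ScopedTimer timer(total_zipcode_time);
        volatile int sink = 0;  // placeholder work so the region is not empty
        for (int j = 0; j < 1000; ++j) { sink = sink + j; }
    }
    assert(total_zipcode_time >= 0.0);
    return total_zipcode_time;
}
```

If nothing reads the totals afterwards, removing the variables remains the simpler fix.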

codacy-production · 2026-05-02T19:26:45Z

+            REQUIRE(distance_index.minimum_distance(node_id1, rev1, offset1, node_id2, rev2, offset2, false, &graph) == dijkstra_distance);
+        }
+
+        TEST_CASE( "Distance index can query out of a SNP with a reversing allele as an oversided snarl",


_{⚪ LOW RISK}

Nitpick: Typo in test case name: 'oversided' should be 'oversized'.

codacy-production · 2026-05-02T19:26:45Z

+    std::filesystem::file_time_type later_time = *std::max_element(later_times.begin(), later_times.end());
+
+    // Return if the earlier files are touched no later than the later files.
+    return earlier_time <= later_time; 


_{⚪ LOW RISK}

Nitpick: The timestamp comparison logic correctly implements the requirement to ensure downstream indices are newer than their dependencies.
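For reference, the comparison can be exercised without touching the filesystem by feeding it synthetic timestamps. The sketch below is an assumption about the surrounding code: the diff only shows how `later_time` is computed, so taking the maximum of the earlier timestamps as well is a guess. `sketch_index_predates` and `at_second` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <filesystem>
#include <vector>

using FileTime = std::filesystem::file_time_type;

// Helper for tests: a timestamp the given number of seconds after the epoch.
inline FileTime at_second(int seconds) {
    return FileTime{} + std::chrono::seconds(seconds);
}

// True when the newest "earlier" (upstream) file is no newer than the newest
// "later" (downstream) file. Both lists must be non-empty.
bool sketch_index_predates(const std::vector<FileTime>& earlier_times,
                           const std::vector<FileTime>& later_times) {
    assert(!earlier_times.empty() && !later_times.empty());
    FileTime earlier_time =
        *std::max_element(earlier_times.begin(), earlier_times.end());
    FileTime later_time =
        *std::max_element(later_times.begin(), later_times.end());
    return earlier_time <= later_time;
}
```

Equal timestamps count as acceptable here, matching the `<=` in the diff.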

codacy-production · 2026-05-02T19:26:45Z

 pair<int64_t, int64_t> aligned_interval(const Alignment& aln);

+/// Count the various types of edits in an Alignment, including individual gap lengths.
+void count_alignment_operations(const Alignment& aln, size_t& matches, size_t& mismatches, std::vector<size_t>& gaps_lengths);


_{⚪ LOW RISK}

Nitpick: The parameter name gaps_lengths in the header is inconsistent with its usage (gap_lengths) in the implementation file.

codacy-production · 2026-05-02T19:47:26Z

Not up to standards ⛔

🔴 Issues 1 critical · 4 high · 42 medium · 21 minor

Alerts:
⚠ 68 issues (≤ 0 issues of at least minor severity)

Results:
68 new issues

Category | Results
UnusedCode | 6 medium
ErrorProne | 4 high
Security | 1 critical
CodeStyle | 21 minor
Complexity | 36 medium

🟢 Metrics 2493 complexity · 1262 duplication

Metric | Results
Complexity | 2493
Duplication | 1262

electricEpilith and others added 30 commits November 12, 2025 14:32

some progress on hub label integration?

37848a6

hub labeling in (debugging not finished), also changes to deal with C++20 upgrade

297a11f

Point at compatible libbdsg and get build working on Mac

9d4c2e2

Use the new indexing types and accessors to avoid fetching nodes by the wrong thing

8c13cf3

Use accessors so we can build the Tiny oversized snarl test index and get a wrong answer

788224d

Try dumping hub label data for debugging

4985468

Add synthetic Boost graph dumping code, and missing semicolon, and libbdsg that makes labels that can fit

86e4e31

Merge remote-tracking branch 'origin/master' into hublabel

ee5bd54

Use libbdsg with slightly more implemented hub labeling integration

e84e657

Make sure NodeProp fields are not used before initialization

4f31496

Stop trying to look up removed trivial snarls

232a589

Add the debugging to subgraph finding that I needed to fix ChainRecord asserts

30e392a

Stop trying to interpret the root as a chain in debug prints

9639b68

Turn off debugging after passing existing snarl distance index tests

c0db406

Merge remote-tracking branch 'origin/master' into hublabel

ce1027f

Merge remote-tracking branch 'origin/hublabel' into hublabel

77c2ec2

Make randomized graph test actually exercise oversized snarls sometimes

163764f

Add function for loading a handlegraph from JSON

ddce5f4

Allow cactus-ifying all handle graphs

e56353f

Add synthetic fix for actually populating the unique_ptr right

4f66c25

Commit partial synthetic refactor to use new JSON load method

5436d73

Revert "Commit partial synthetic refactor to use new JSON load method"

2c3721d

This reverts commit 5436d73.

Replace string_to_graph with json2graph

695cff5

Remove a bunch of mostly unused functions for working with Protobuf Graph objects

8ede8df

Mostly-automatically convert tests to use vg::io::json2graph

e472799

Remove duplicative JSON to graph function

904f445

Set up tiny test that breaks oversized snarl logic

809a766

Remove unused cases

f2d4f08

Fill in the dustances through oversized snarls to pass more distance indexing test cases

caaa512

Add exhaustive test for small snarls

f15a5f9

faithokamoto and others added 27 commits April 23, 2026 11:26

Merge branch 'master' into missing-trans

1f1cfda

Add an error when minimizers/zipcodes seem older than the distance index

c45bd23

Actually compare times and not iterators

f1cb7e4

Merge remote-tracking branch 'origin/master' into score_seeds_fix_recombination

0d89ce6

Declare the minimap2-based scores to be right

78ad8bb

Clear explanations before making them to count

31db629

remove unnecessary return info

3cfc00e

Move minimap2-style scoring logic into functions we could use later, like for surject

d71dc9f

Stop protoc from trying to run multiple times at once and destroying our generated headers

0dd4050

Make sure the distance index is on disk with any pointer fixups before saving the min/zipcodes files

8a79278

unittests to detect possible future regressions

07248ad

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

split snarl_distance_index.cpp into four files

128b4b5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor populate_snarl_index to be less complex

c33c076

planned out by Claude Opus 4.7 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

reduce duplication with new SnarlChildGraph class

43c97c8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

break apart populate_distance_matrix_row

ef0a6c5

planned by Claude Opus 4.7 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge pull request vgteam#4885 from vgteam/rescue-bug

809463f

Fix potentially very large loop when looking for rescue graph

Merge remote-tracking branch 'origin/master' into score_seeds_fix_recombination

9e40f73

Stop worrying about negative recombination penalties

11e053e

Make the evaluation bonus system clearer

f665d80

Adjust comments

76791ab

Rename variables

e6d032a

Merge pull request vgteam#4887 from vgteam/score_seeds_fix_recombination

0655352

Revise recombination, chain, and alignment scoring

Merge pull request vgteam#4889 from vgteam/simplify-return

53ab106

remove unnecessary return info

Merge pull request vgteam#4886 from vgteam/index-age-error

27108c4

Make `vg giraffe` error when the minimizers or zipcodes are older than the distance index

Merge pull request vgteam#4884 from vgteam/missing-trans

cf99eb7

Missing transition checker

merge in changes from master

4bdcafb

remove hash from serialization test because of cross-platform compatibility issues

94c094f

electricEpilith wants to merge 118 commits into master from hublabel

5 participants
