Skip to content

Add RoundNumberHeuristics for change identification#48

Open
0xZaddyy wants to merge 4 commits intopayjoin:masterfrom
0xZaddyy:roundness
Open

Add RoundNumberHeuristics for change identification#48
0xZaddyy wants to merge 4 commits intopayjoin:masterfrom
0xZaddyy:roundness

Conversation

@0xZaddyy
Copy link
Copy Markdown
Contributor

@0xZaddyy 0xZaddyy commented Apr 6, 2026

This pr adds Round number heuristic for change detection based on the decimal Hamming weight of output values round-like amounts are classified as payments, while higher-weight values are treated as change.

@0xZaddyy
Copy link
Copy Markdown
Contributor Author

0xZaddyy commented Apr 7, 2026

@arminsabouri one edge case we should account for is payments that look high-precision in BTC but are actually round amounts in fiat. The current implementation only detects round values denominated in BTC, so we may miss these cases.

Comment thread src/crates/heuristics/src/change_identification.rs Outdated
Comment thread src/crates/heuristics/src/change_identification.rs Outdated
@arminsabouri
Copy link
Copy Markdown
Collaborator

@arminsabouri one edge case we should account for is payments that look high-precision in BTC but are actually round amounts in fiat. The current implementation only detects round values denominated in BTC, so we may miss these cases.

Thats fine lets open a ticket and resolve that later

@0xZaddyy 0xZaddyy force-pushed the roundness branch 3 times, most recently from c513640 to 7a16423 Compare April 7, 2026 16:03
@0xZaddyy 0xZaddyy requested a review from arminsabouri April 7, 2026 16:11
@arminsabouri
Copy link
Copy Markdown
Collaborator

@0xZaddyy Thanks for the last commit. Please let me know when this is ready for review (currently its drafted).
Couple things:

  • Lets try to seperate out the hamming weight calculation into its own file with tests. It will be useful for other things in the future. This should be its own commit
  • And lets write a test or two for the change classification.

Over all this is moving in the right direction.

@0xZaddyy 0xZaddyy marked this pull request as ready for review April 7, 2026 17:33
@0xZaddyy
Copy link
Copy Markdown
Contributor Author

0xZaddyy commented Apr 7, 2026

@arminsabouri you can have a look

Copy link
Copy Markdown
Collaborator

@arminsabouri arminsabouri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a couple questions and nits

Comment thread src/crates/heuristics/src/change_identification.rs Outdated
Comment thread src/crates/heuristics/src/change_identification.rs Outdated

// If there's only one output, it's not change
if outputs.len() == 1 {
return TxOutChangeAnnotation::NotChange;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future work that we should ticket out. We need a seperate variant (depending on how we define change), ProbablyNotChange. Or define this as a probability. If there is one output it could be a transfer, a consolidation or something else (rune, opreturn more generally). But depending on the other information maybe one conclusion is more likely.

For example, perhaps the one output is reusing an address from the inputs. This is likely change. However if its an opreturn we know it can't be change. Or what if we have seen the output spk in a different cluster belonging to a different wallet. Then we know this is likely not a change again.

Comment thread src/crates/heuristics/src/change_identification.rs Outdated
Comment thread src/crates/heuristics/src/change_identification.rs Outdated
Comment thread src/crates/heuristics/src/change_identification.rs Outdated
Comment thread src/crates/heuristics/src/change_identification.rs Outdated
Comment thread src/crates/heuristics/src/lib.rs Outdated
@0xZaddyy 0xZaddyy changed the title Add RoundNumberHeuristics for change identification Add RoundNumberHeuristics for change identification Apr 7, 2026
@0xZaddyy
Copy link
Copy Markdown
Contributor Author

0xZaddyy commented Apr 7, 2026

Hello @arminsabouri,many of the issues you pointed out were leftovers from my earlier cleanup of the implementation. I’ve now removed them.

@0xZaddyy 0xZaddyy requested a review from arminsabouri April 7, 2026 21:24
@arminsabouri
Copy link
Copy Markdown
Collaborator

@0xZaddyy This repo follows the same commit hygine rules at rust-payjoin. Perhaps its time we create a contributing.md. Can we please squash some of your commits. Mainly the last one and the ones that add tests to previous commits

Comment thread src/crates/heuristics/src/change_identification.rs
Comment thread src/crates/heuristics/src/change_identification.rs Outdated
@0xZaddyy
Copy link
Copy Markdown
Contributor Author

fe0eff9 introduces a contextual outlier heuristic aimed at reducing overly aggressive change classification(causing false positives).
The idea behind the current direction is:
An output is considered a candidate only if:
• it is a clear outlier (its weight exceeds all others by a minimum gap), and
• The remaining outputs are mostly low-weight / round-like.
Then force a constraint:
• if exactly one output satisfies the condition → classify as Change
• otherwise → return NotChange or Inconclusive (if there is not enough information or doesn't satisfy)
This idea better captures the idea that roundness is only useful when one output stands out against a round-looking background, rather than just selecting the maximum

So cases like
[1, 1, 6] -> this gives a strong signal
[5, 6, 7] -> this is weak(abstain)
[1, 1, 6, 7] -> ambiguous (abstain)

Copy link
Copy Markdown
Collaborator

@arminsabouri arminsabouri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConceptAck. This is moving in the right direction. I had a couple questoins about the consts chosen -- so I expect answers on those questions not just code changes please. Lastly I think there is a more concrete statistical model here that may remove the need for consts -- I dont have a rec rn i will look into it

Comment thread src/crates/heuristics/src/change_identification.rs Outdated
Comment thread src/crates/heuristics/src/change_identification.rs Outdated

let weights: Vec<u32> = outputs
.iter()
.map(|out| decimal_hamming_weight(out.value().to_sat()))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future we may want to write a convenience method on AbstractTransaction to get iter<item=HammingWeight>. I imagine other parts of the codebase may need this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can introduce this when more heuristics need it.

Comment thread src/crates/heuristics/src/change_identification.rs Outdated
TxOutChangeAnnotation::Change
);
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets also write a test to cover the core logic of is_candidate.

Comment thread src/crates/heuristics/src/change_identification.rs Outdated
const LOW_WEIGHT_THRESHOLD: u32 = 2;
/// Minimum gap between a candidate's weight and the next-highest weight
/// for the candidate to clearly stand out as change.
const MIN_OUTLIER_GAP: u32 = 2;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you breifly explain where you got these values?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, for LOW_WEIGHT_THRESHOLD, when I think about it, it roughly captures the boundary between the human-chosen amount (payment) and the computed leftover (change).
1.0 btc(100,000,000sats) looks very round
0.5 is roundish
0.014 still roundish
It starts getting irregular from above 2(0.0356)
<= 2 still looks plausibly human-chosen
for MIN_OUTLIER_GAP setting it to 2 also looks fair, setting it to 1 candidates [1, 2, 3] 3 is the highest, but can be misleading and no strong evidence. but with gap = 2 with candidates like [1, 1, 3] 3 clearly stands out while other looks round

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you take a look at what other projects set these consts to (blocksci)?

implement Hamming weight roundness score

Extract decimal Hamming weight into reusable module

Move the non-zero base-10 digit count out of change_identification
into its own hamming_weight module with unit tests, and import it
back where it's used

Add tests for RoundNumberHeuristics change classification

Remove obsolete round-amount checks

Clean up leftover logic from the previous implementation, including
methods like `is_round_amount`, which are no longer needed with the
Hamming weight-based heuristic.
Introduce an `Inconclusive` state to represent cases where a heuristic
does not have enough evidence to classify an output as change or not.
Require a sole high-weight outlier against a mostly-round background
before flagging change. Return `Inconclusive` when the signal is ambiguous.
Introduce clearer `Change` / `NotChange` / `Inconclusive` semantics,
add relative comparison for non-change detection, and remove
unnecessary allocations in candidate evaluation.
Copy link
Copy Markdown
Contributor

@Mshehu5 Mshehu5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cACK ,
Nice work!
This will be very helpful once paired with suggested changes in #49
requested changes are minor and are just stylistic
P.S Seems this might still have conflicts

Comment on lines 117 to 128
let outputs: Vec<_> = tx.outputs().collect();

if outputs.len() <= 1 {
return TxOutChangeAnnotation::Inconclusive;
return TxOutChangeAnnotation::NotChange;
}

debug_assert!(vout < outputs.len());

let weights: Vec<u32> = outputs
.iter()
.map(|out| decimal_hamming_weight(out.value().to_sat()))
.collect();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let outputs: Vec<_> = tx.outputs().collect();
if outputs.len() <= 1 {
return TxOutChangeAnnotation::Inconclusive;
return TxOutChangeAnnotation::NotChange;
}
debug_assert!(vout < outputs.len());
let weights: Vec<u32> = outputs
.iter()
.map(|out| decimal_hamming_weight(out.value().to_sat()))
.collect();
let weights: Vec<u32> = tx
.outputs()
.map(|out| decimal_hamming_weight(out.value().to_sat()))
.collect();
if weights.len() <= 1 {
return TxOutChangeAnnotation::NotChange;
}
debug_assert!(vout < weights.len());

This looks like we can drop the outputs allocation. We only use it to compute weights and check the output count so building weights directly from tx.outputs() should be a bit leaner.


let mut max_other = 0;
let mut low_weight_count = 0;
let mut other_count = 0;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be re-written as

Suggested change
let mut other_count = 0;
let mut other_count = weights.len() - 1;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you can also move the other_count check up:

if other_count == 0 {
            return false;
 }

const LOW_WEIGHT_THRESHOLD: u32 = 2;
/// Minimum gap between a candidate's weight and the next-highest weight
/// for the candidate to clearly stand out as change.
const MIN_OUTLIER_GAP: u32 = 2;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you take a look at what other projects set these consts to (blocksci)?


if Self::is_candidate(&weights, vout) {
TxOutChangeAnnotation::Change
} else if target_weight <= Self::LOW_WEIGHT_THRESHOLD
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this check come first? before the is_candidate check?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants