Add RoundNumberHeuristics for change identification#48
Add RoundNumberHeuristics for change identification#480xZaddyy wants to merge 4 commits intopayjoin:masterfrom
RoundNumberHeuristics for change identification#48Conversation
|
@arminsabouri one edge case we should account for is payments that look high-precision in BTC but are actually round amounts in fiat. The current implementation only detects round values denominated in BTC, so we may miss these cases. |
Thats fine lets open a ticket and resolve that later |
c513640 to
7a16423
Compare
|
@0xZaddyy Thanks for the last commit. Please let me know when this is ready for review (currently its drafted).
Over all this is moving in the right direction. |
|
@arminsabouri you can have a look |
arminsabouri
left a comment
There was a problem hiding this comment.
Had a couple questions and nits
|
|
||
| // If there's only one output, it's not change | ||
| if outputs.len() == 1 { | ||
| return TxOutChangeAnnotation::NotChange; |
There was a problem hiding this comment.
For future work that we should ticket out. We need a seperate variant (depending on how we define change), ProbablyNotChange. Or define this as a probability. If there is one output it could be a transfer, a consolidation or something else (rune, opreturn more generally). But depending on the other information maybe one conclusion is more likely.
For example, perhaps the one output is reusing an address from the inputs. This is likely change. However if its an opreturn we know it can't be change. Or what if we have seen the output spk in a different cluster belonging to a different wallet. Then we know this is likely not a change again.
RoundNumberHeuristics for change identification
|
Hello @arminsabouri,many of the issues you pointed out were leftovers from my earlier cleanup of the implementation. I’ve now removed them. |
|
@0xZaddyy This repo follows the same commit hygine rules at rust-payjoin. Perhaps its time we create a contributing.md. Can we please squash some of your commits. Mainly the last one and the ones that add tests to previous commits |
|
fe0eff9 introduces a contextual outlier heuristic aimed at reducing overly aggressive change classification(causing false positives). So cases like |
arminsabouri
left a comment
There was a problem hiding this comment.
ConceptAck. This is moving in the right direction. I had a couple questoins about the consts chosen -- so I expect answers on those questions not just code changes please. Lastly I think there is a more concrete statistical model here that may remove the need for consts -- I dont have a rec rn i will look into it
|
|
||
| let weights: Vec<u32> = outputs | ||
| .iter() | ||
| .map(|out| decimal_hamming_weight(out.value().to_sat())) |
There was a problem hiding this comment.
In the future we may want to write a convenience method on AbstractTransaction to get iter<item=HammingWeight>. I imagine other parts of the codebase may need this
There was a problem hiding this comment.
We can introduce this when more heuristics need it.
| TxOutChangeAnnotation::Change | ||
| ); | ||
| } | ||
|
|
There was a problem hiding this comment.
Lets also write a test to cover the core logic of is_candidate.
| const LOW_WEIGHT_THRESHOLD: u32 = 2; | ||
| /// Minimum gap between a candidate's weight and the next-highest weight | ||
| /// for the candidate to clearly stand out as change. | ||
| const MIN_OUTLIER_GAP: u32 = 2; |
There was a problem hiding this comment.
can you breifly explain where you got these values?
There was a problem hiding this comment.
So, for LOW_WEIGHT_THRESHOLD, when I think about it, it roughly captures the boundary between the human-chosen amount (payment) and the computed leftover (change).
1.0 btc(100,000,000sats) looks very round
0.5 is roundish
0.014 still roundish
It starts getting irregular from above 2(0.0356)
<= 2 still looks plausibly human-chosen
for MIN_OUTLIER_GAP setting it to 2 also looks fair, setting it to 1 candidates [1, 2, 3] 3 is the highest, but can be misleading and no strong evidence. but with gap = 2 with candidates like [1, 1, 3] 3 clearly stands out while other looks round
There was a problem hiding this comment.
Did you take a look at what other projects set these consts to (blocksci)?
implement Hamming weight roundness score Extract decimal Hamming weight into reusable module Move the non-zero base-10 digit count out of change_identification into its own hamming_weight module with unit tests, and import it back where it's used Add tests for RoundNumberHeuristics change classification Remove obsolete round-amount checks Clean up leftover logic from the previous implementation, including methods like `is_round_amount`, which are no longer needed with the Hamming weight-based heuristic.
Introduce an `Inconclusive` state to represent cases where a heuristic does not have enough evidence to classify an output as change or not.
Require a sole high-weight outlier against a mostly-round background before flagging change. Return `Inconclusive` when the signal is ambiguous.
Introduce clearer `Change` / `NotChange` / `Inconclusive` semantics, add relative comparison for non-change detection, and remove unnecessary allocations in candidate evaluation.
| let outputs: Vec<_> = tx.outputs().collect(); | ||
|
|
||
| if outputs.len() <= 1 { | ||
| return TxOutChangeAnnotation::Inconclusive; | ||
| return TxOutChangeAnnotation::NotChange; | ||
| } | ||
|
|
||
| debug_assert!(vout < outputs.len()); | ||
|
|
||
| let weights: Vec<u32> = outputs | ||
| .iter() | ||
| .map(|out| decimal_hamming_weight(out.value().to_sat())) | ||
| .collect(); |
There was a problem hiding this comment.
| let outputs: Vec<_> = tx.outputs().collect(); | |
| if outputs.len() <= 1 { | |
| return TxOutChangeAnnotation::Inconclusive; | |
| return TxOutChangeAnnotation::NotChange; | |
| } | |
| debug_assert!(vout < outputs.len()); | |
| let weights: Vec<u32> = outputs | |
| .iter() | |
| .map(|out| decimal_hamming_weight(out.value().to_sat())) | |
| .collect(); | |
| let weights: Vec<u32> = tx | |
| .outputs() | |
| .map(|out| decimal_hamming_weight(out.value().to_sat())) | |
| .collect(); | |
| if weights.len() <= 1 { | |
| return TxOutChangeAnnotation::NotChange; | |
| } | |
| debug_assert!(vout < weights.len()); |
This looks like we can drop the outputs allocation. We only use it to compute weights and check the output count so building weights directly from tx.outputs() should be a bit leaner.
|
|
||
| let mut max_other = 0; | ||
| let mut low_weight_count = 0; | ||
| let mut other_count = 0; |
There was a problem hiding this comment.
Can this be re-written as
| let mut other_count = 0; | |
| let mut other_count = weights.len() - 1; |
There was a problem hiding this comment.
Then you can also move the other_count check up:
if other_count == 0 {
return false;
}| const LOW_WEIGHT_THRESHOLD: u32 = 2; | ||
| /// Minimum gap between a candidate's weight and the next-highest weight | ||
| /// for the candidate to clearly stand out as change. | ||
| const MIN_OUTLIER_GAP: u32 = 2; |
There was a problem hiding this comment.
Did you take a look at what other projects set these consts to (blocksci)?
|
|
||
| if Self::is_candidate(&weights, vout) { | ||
| TxOutChangeAnnotation::Change | ||
| } else if target_weight <= Self::LOW_WEIGHT_THRESHOLD |
There was a problem hiding this comment.
Shouldn't this check come first? before the is_candidate check?
This pr adds Round number heuristic for change detection based on the decimal Hamming weight of output values round-like amounts are classified as payments, while higher-weight values are treated as change.