Container type + fingerprint summary by arminsabouri · Pull Request #36 · payjoin/tx-indexer

arminsabouri · 2026-03-31T19:45:21Z

The first commit creates a new value type (something that can be created by an AST node): a generic vec type. I imagine this may be useful in other analyses as well. The second commit normalizes the fingerprints of a tx set. The idea here is to get a sense of how sparse or dense the fingerprints are on chain. In the future we will also want to assign this normalized fingerprint vector to clusters.
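A minimal sketch of what a generic container value could look like, assuming AST nodes evaluate to a single `Value` enum. `Value`, its variants, and `as_container` are hypothetical names chosen for illustration; they are not the types introduced in this PR.

```rust
/// Hypothetical set of values an AST node might evaluate to
/// (illustrative only, not the crate's actual type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Int(i64),
    Float(f64),
    /// Generic container: a vec of other values, so an analysis such as a
    /// normalized fingerprint vector can be returned without a bespoke type.
    Container(Vec<Value>),
}

impl Value {
    /// Returns the contained items if this value is a container, else None.
    pub fn as_container(&self) -> Option<&[Value]> {
        match self {
            Value::Container(items) => Some(items.as_slice()),
            _ => None,
        }
    }
}
```

Because the container holds `Value`s, it can nest or mix scalar types; whether the real type is homogeneous or heterogeneous is not stated in the PR.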

cc @Mshehu5 @bc1cindy

cherry picked from #16

arminsabouri · 2026-03-31T19:54:28Z

+            f.extend(output_types);
+
+            // output_structure - sorted deduped discriminants
+            let structure_types = sorted_deduped(


It seems wrong that this is a vec. Can outputs be structured in more than one way?

I guess yes... but this isn't comparable across txs, because the vec has no fixed schema. The position of each field shifts depending on the tx's data: e.g. a tx with two output types pushes two values before output_structure, while a tx with one output type pushes only one, so output_structure ends up at a different index in each vec.

Seems like one-hot encoding could be useful here; it's a machine learning technique. Instead of putting the discriminant into the vec, we ask binary questions, so each possible characteristic gets its own fixed position with a 0/1 value, making the vecs comparable.

makes sense?
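A minimal sketch of the one-hot suggestion, assuming a small hypothetical `OutputFeature` enum that mirrors the DOUBLE/MULTI/BIP69 dimensions discussed here. `OutputFeature`, `NUM_FEATURES`, and `encode` are illustrative names, not the crate's API. Each feature owns one fixed slot, so slot `i` means the same thing for every tx regardless of how many features that tx has.

```rust
/// Number of fixed slots in the (hypothetical) output-feature schema.
pub const NUM_FEATURES: usize = 3;

/// Hypothetical output characteristics; not the crate's actual enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputFeature {
    Double,
    Multi,
    Bip69,
}

impl OutputFeature {
    /// Fixed schema: slot i always means ALL[i], for every tx.
    pub const ALL: [OutputFeature; NUM_FEATURES] =
        [OutputFeature::Double, OutputFeature::Multi, OutputFeature::Bip69];
}

/// Encode the features present in one tx as a fixed-length 0/1 vector.
pub fn encode(present: &[OutputFeature]) -> [u8; NUM_FEATURES] {
    let mut v = [0u8; NUM_FEATURES];
    for (i, feature) in OutputFeature::ALL.iter().enumerate() {
        if present.contains(feature) {
            v[i] = 1;
        }
    }
    v
}
```

Double and Multi are mutually exclusive, so their two slots form a true one-hot group; Bip69 is an independent flag, so strictly the whole vector is "multi-hot". Either way, the length never varies between txs, unlike pushing sorted, deduplicated discriminants.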

Yes to one-hot encoding.
Encoding aside, it's just odd to me that outputs can be organized in more than one way, e.g. first sort inputs by BIP69, then sort them again by age (?). We would only register both enum variants if the oldest coins happen to be in lexicographical order. Talking through it now, it seems wrong. For some context, this part of the fingerprinting code was ported from the Python library, so I might be missing something.

Checking the Python code, get_output_structure intentionally returns multiple variants: DOUBLE/MULTI describe the output count, while BIP69/CHANGE_LAST describe ordering, so different dimensions can coexist.

```python
def get_output_structure(tx):
    # Excerpt: vout, amounts, change_index and output_structure are set up
    # elsewhere in the original function (not shown here).
    if len(vout) == 2:
        output_structure.append(OutputStructureType.DOUBLE)
    else:
        output_structure.append(OutputStructureType.MULTI)
    if change_index == len(tx["vout"]) - 1:
        output_structure.append(OutputStructureType.CHANGE_LAST)
    if sorted(amounts) == amounts:
        output_structure.append(OutputStructureType.BIP69)
```
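A minimal Rust sketch of how the Python function's mixed list could be split into one field per dimension, so the count and ordering facts each get a fixed place. It assumes output amounts are available as `u64` satoshis and that some change heuristic supplies `change_index`; `OutputCount`, `OutputStructure`, and `output_structure` are hypothetical names, not the port's actual types.

```rust
/// Output-count dimension: exactly one variant applies to any tx.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputCount {
    Double,
    Multi,
}

/// Hypothetical struct: one field per dimension the Python list mixes together.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OutputStructure {
    pub count: OutputCount,
    /// The change output (if one was identified) is the last output.
    pub change_last: bool,
    /// Amounts are in non-decreasing order, i.e. `sorted(amounts) == amounts`.
    pub amounts_sorted: bool,
}

pub fn output_structure(amounts: &[u64], change_index: Option<usize>) -> OutputStructure {
    // Like the Python, anything other than exactly 2 outputs counts as Multi.
    let count = if amounts.len() == 2 { OutputCount::Double } else { OutputCount::Multi };
    let change_last = matches!(change_index, Some(i) if i + 1 == amounts.len());
    let amounts_sorted = amounts.windows(2).all(|w| w[0] <= w[1]);
    OutputStructure { count, change_last, amounts_sorted }
}
```

Keeping the dimensions as separate fields makes it explicit that a tx can be, say, both Double and amount-sorted without either fact shifting the other's position.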

The problem is the port to a flat fingerprint vec, where position = meaning.

also noticed CHANGE_LAST exists in python but not in the rust port


That was deliberate. The plan was to build support for change identification outside of the fingerprinting crate.

```python
if len(vout) == 2:
    output_structure.append(OutputStructureType.DOUBLE)
else:
    output_structure.append(OutputStructureType.MULTI)
```

Yeah. I think we should apply separation of concerns here: 1. one method, is_bip69; 2. another, output_structure -> [no_change, with_change, batch_payments, consolidation, unknown (coinjoin output decomposition would be in this category)].

One method is concerned with how the outputs are sorted. The other is concerned with the perceived semantics of the outputs (is it a batch payment vs. a consolidation vs. something else?).
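A minimal sketch of the proposed split, assuming a simplified stand-in `TxOut` type. `is_bip69` checks only the BIP69 output ordering (ascending amount, ties broken by lexicographic comparison of scriptPubKey bytes). The `OutputStructure` enum follows the variants listed above; the `output_structure` classifier body is a hypothetical placeholder heuristic, not the logic in 26f2363, and is there only to show that it never looks at ordering.

```rust
/// Simplified stand-in for a tx output; the indexer's real types differ.
pub struct TxOut {
    pub value_sats: u64,
    pub script_pubkey: Vec<u8>,
}

/// BIP69 output ordering: amount ascending, then scriptPubKey bytes
/// compared lexicographically. Answers only "how are outputs sorted".
pub fn is_bip69(outputs: &[TxOut]) -> bool {
    outputs.windows(2).all(|w| {
        (w[0].value_sats, &w[0].script_pubkey) <= (w[1].value_sats, &w[1].script_pubkey)
    })
}

/// Perceived semantics of the outputs, per the variants proposed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputStructure {
    NoChange,
    WithChange,
    BatchPayment,
    Consolidation,
    /// Anything not recognized, e.g. coinjoin output decomposition.
    Unknown,
}

/// Hypothetical placeholder heuristic: classifies only the clearest
/// single-output cases and leaves everything else as Unknown.
pub fn output_structure(n_inputs: usize, n_outputs: usize) -> OutputStructure {
    match (n_inputs, n_outputs) {
        (i, 1) if i > 1 => OutputStructure::Consolidation,
        (_, 1) => OutputStructure::NoChange,
        _ => OutputStructure::Unknown,
    }
}
```

The two functions take disjoint inputs, which is the point of the split: a tx can be BIP69-sorted under any `OutputStructure`, and each answer can get its own fixed slot in a fingerprint vector.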

implemented in 26f2363

looks great

bc1cindy

very useful

bc1cindy · 2026-04-01T21:28:54Z


Mshehu5 · 2026-04-18T22:08:25Z

I noticed this PR has some conflicts and is still marked as draft. Would you prefer it to be reviewed now, or should we wait until it's ready?

arminsabouri · 2026-04-20T13:07:14Z

Please wait. I will get this shaped up soon.

arminsabouri commented Mar 31, 2026


arminsabouri mentioned this pull request Apr 1, 2026

Wallet Fingerprint Summary #16

Closed

bc1cindy reviewed Apr 1, 2026


Introduce container type

4721b75

For storing vecs of generic data, such as a normalized vector of wallet fingerprints.

arminsabouri force-pushed the container branch from 826b0ca to c021452, April 2, 2026 14:39

arminsabouri added 3 commits April 2, 2026 10:41

Fingerprinting summary

d11b292

AST node for collecting a normalized vec of fingerprinting values. The loose version is not supported yet. Still need to work on getting actual txids supported.
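A minimal sketch of one way to normalize fingerprints over a tx set, assuming every tx has already been encoded into a fixed-length 0/1 vector with the same schema. Averaging per slot gives the fraction of txs exhibiting each feature, which speaks directly to how sparse or dense each fingerprint is on chain. `normalized_summary` is a hypothetical name, and averaging is one plausible reading of "normalized", not necessarily what this commit implements.

```rust
/// Hypothetical: average per-tx 0/1 fingerprint vectors (all N slots wide)
/// into per-slot frequencies in [0, 1]. Returns None for an empty tx set
/// instead of dividing by zero.
pub fn normalized_summary<const N: usize>(fingerprints: &[[u8; N]]) -> Option<[f64; N]> {
    if fingerprints.is_empty() {
        return None;
    }
    // Count, per slot, how many txs have that feature set.
    let mut sums = [0u64; N];
    for fp in fingerprints {
        for (slot, &bit) in sums.iter_mut().zip(fp.iter()) {
            *slot += u64::from(bit);
        }
    }
    let n = fingerprints.len() as f64;
    Some(sums.map(|s| s as f64 / n))
}
```

A slot near 0.0 marks a rare (sparse) fingerprint and one near 1.0 a near-universal one; the same function could later be applied per cluster instead of per tx set.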

Rename rbf.rs to fingerprint.rs

00d049c

Split up bip69 output method

26f2363

BIP69 is concerned only with how outputs are sorted, while output structure attempts to infer meaning. Currently it's not inferring much.

arminsabouri force-pushed the container branch from c021452 to 26f2363, April 2, 2026 14:51

arminsabouri wants to merge 4 commits into master from container


3 participants
