Container type + fingerprint summary #36

Draft
arminsabouri wants to merge 4 commits into master from container

Collaborator

@arminsabouri arminsabouri commented Mar 31, 2026

The first commit creates a new value type (something that can be created by an AST node). It's a generic vec type, and I imagine it may be useful in other analyses as well. The second commit normalizes the fingerprints of a tx set. The idea here is to get a sense of how sparse or dense the fingerprints are on chain. In the future we will also want to assign this normalized fingerprint vector to clusters.

cc @Mshehu5 @bc1cindy

cherry picked from #16
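
On the "sparse or dense" question, here is a rough Rust sketch of one possible summary. It assumes every tx fingerprint has already been laid out as a 0/1 vector of the same length; that layout and the `slot_density` helper are assumptions for illustration, not what this PR implements.

```rust
/// Fraction of transactions that set each slot.
/// Assumes every fingerprint is a 0/1 vector of the same length
/// (an assumption made for this sketch, not the PR's actual layout).
fn slot_density(fingerprints: &[Vec<u8>]) -> Option<Vec<f64>> {
    let width = fingerprints.first()?.len();
    // Vectors without a shared schema are not comparable, so bail out.
    if fingerprints.iter().any(|f| f.len() != width) {
        return None;
    }
    // Count, per slot, how many fingerprints have a non-zero value there.
    let mut counts = vec![0usize; width];
    for fingerprint in fingerprints {
        for (slot, &bit) in fingerprint.iter().enumerate() {
            if bit != 0 {
                counts[slot] += 1;
            }
        }
    }
    let total = fingerprints.len() as f64;
    Some(counts.into_iter().map(|c| c as f64 / total).collect())
}
```

Slots with a density near 0 or 1 carry little distinguishing signal across the set, while slots in between are the ones that separate txs. An empty set or mismatched lengths yields `None`.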

f.extend(output_types);

// output_structure - sorted deduped discriminants
let structure_types = sorted_deduped(
Collaborator Author

It seems wrong that this is a vec. Can outputs be structured in more than one way?

Contributor

I guess yes... but this isn't comparable across txs because the vec has no fixed schema. The position of each field shifts depending on the tx's data, e.g. a tx with two output types pushes two values before output_structure, while a tx with one output type pushes only one, so output_structure ends up at a different index in each vec.

Seems like one-hot encoding could be useful here; it's a machine learning technique. Instead of putting the discriminant into the vec, we ask binary questions, so each possible characteristic gets its own fixed position with a binary 0/1 value, making the vecs comparable.

makes sense?
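
A minimal Rust sketch of the fixed-position idea. The `OutputType` enum, its variants, and `encode_output_types` are hypothetical names for illustration; the crate's actual types may look different. Since a tx can set several slots at once, this is strictly a multi-hot variant of one-hot encoding.

```rust
// Hypothetical enum for illustration; the crate's real variants may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OutputType {
    P2pkh,
    P2sh,
    P2wpkh,
    P2tr,
}

impl OutputType {
    // Fixed schema: slot i of every encoded vec always refers to ALL[i].
    const ALL: [OutputType; 4] = [
        OutputType::P2pkh,
        OutputType::P2sh,
        OutputType::P2wpkh,
        OutputType::P2tr,
    ];
}

/// One slot per variant: 1 if the tx has an output of that type, else 0.
fn encode_output_types(present: &[OutputType]) -> Vec<u8> {
    OutputType::ALL
        .iter()
        .map(|variant| u8::from(present.contains(variant)))
        .collect()
}
```

With this layout, `encode_output_types(&[OutputType::P2wpkh, OutputType::P2pkh])` gives `[1, 0, 1, 0]` and `encode_output_types(&[OutputType::P2tr])` gives `[0, 0, 0, 1]`. Both vecs have the same length, so whatever field is appended next always starts at the same index.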

Collaborator Author

@arminsabouri arminsabouri Apr 2, 2026

Yes to one-hot encoding.
Encoding aside, it's just odd to me that outputs can be organized in more than one way, e.g. first sort inputs by BIP69, then sort them again by age (?). We would only register two enum variants if the oldest coins happen to be in lexicographical order. Talking through it now, it seems wrong. For some context, this part of the fingerprinting code was ported from the Python library, so I might be missing something.

Contributor

Checking the Python code, get_output_structure intentionally returns multiple variants: DOUBLE/MULTI describe output count, while BIP69/CHANGE_LAST describe ordering. Different dimensions can coexist.

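def get_output_structure(tx):
    # Excerpt: vout, amounts, change_index and the output_structure list
    # are set up elsewhere in the function (elided here).

    # Dimension 1, output count: exactly two outputs vs. anything else.
    if len(vout) == 2:
        output_structure.append(OutputStructureType.DOUBLE)
    else:
        output_structure.append(OutputStructureType.MULTI)

    # Dimension 2, ordering: is the change output in the last position?
    if change_index == len(tx["vout"]) - 1:
        output_structure.append(OutputStructureType.CHANGE_LAST)

    # Dimension 2, ordering: amounts already in ascending order are tagged BIP69.
    if sorted(amounts) == amounts:
        output_structure.append(OutputStructureType.BIP69)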

the problem is the port to a flat fingerprint vec where position = meaning

also noticed CHANGE_LAST exists in python but not in the rust port

Collaborator Author

> also noticed CHANGE_LAST exists in python but not in the rust port

That was deliberate. The plan was to build support for change identification outside of the fingerprinting crate.

Collaborator Author

> if len(vout) == 2:
>     output_structure.append(OutputStructureType.DOUBLE)
> else:
>     output_structure.append(OutputStructureType.MULTI)

Yeah. I think we should apply separation of concerns here: (1) one method, is_bip69, and (2) another, output_structure -> [no_change, with_change, batch_payments, consolidation, unknown] (coinjoin output decomposition would be in the unknown category).

One method is concerned with how the outputs are sorted. The other is concerned with the perceived semantics of the outputs (is it a batch payment vs. a consolidation vs. something else).
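
A rough Rust sketch of that split. The `TxOut` struct, the `OutputSummary` struct, and the `summarize` helper are hypothetical, and only the variant names come from the list above. The ordering check follows BIP69's output rule (amount ascending, then scriptPubKey bytes lexicographically) and deliberately ignores inputs. How `OutputStructure` gets inferred is left to the caller here.

```rust
/// Hypothetical minimal output representation; the crate's real type likely differs.
#[derive(Clone, Debug, PartialEq, Eq)]
struct TxOut {
    value_sats: u64,
    script_pubkey: Vec<u8>,
}

/// Ordering concern only: outputs sorted by amount, then by scriptPubKey bytes.
/// This covers only the output half of BIP69; inputs are not checked.
fn is_bip69(outputs: &[TxOut]) -> bool {
    outputs.windows(2).all(|pair| {
        (pair[0].value_sats, &pair[0].script_pubkey)
            <= (pair[1].value_sats, &pair[1].script_pubkey)
    })
}

/// Semantic concern only: exactly one variant per tx.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OutputStructure {
    NoChange,
    WithChange,
    BatchPayments,
    Consolidation,
    Unknown,
}

/// The two concerns live in separate fields instead of one multi-valued vec.
#[derive(Debug)]
struct OutputSummary {
    bip69_sorted: bool,
    structure: OutputStructure,
}

/// The caller supplies the inferred structure; this sketch does not classify.
fn summarize(outputs: &[TxOut], structure: OutputStructure) -> OutputSummary {
    OutputSummary {
        bip69_sorted: is_bip69(outputs),
        structure,
    }
}
```

Because `structure` holds a single variant and the sort check is a plain boolean, each tx contributes the same number of fields, which also fits the fixed-position encoding discussed earlier in this thread.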

Collaborator Author

implemented in 26f2363

Contributor

looks great

Contributor

@bc1cindy bc1cindy left a comment

very useful

Comment thread src/crates/fingerprints/src/types.rs Outdated
For storing vecs of generic data, such as a normalized vector of wallet fingerprints.
AST node for collecting a normalized vec of fingerprinting values.
Currently the loose version is not supported. Have to mess around with getting actual txids supported.
BIP69 is concerned only with how outputs are sorted, while output structure attempts to infer meaning. Currently it's not inferring much.

Contributor

Mshehu5 commented Apr 18, 2026

I noticed this PR has some conflicts and is still marked as draft. Would you prefer it to be reviewed now, or should we wait until it's ready?

@arminsabouri
Collaborator Author

Please wait. I will get this shaped up soon.
