oaphotodna.py computes PhotoDNA-like hashes for images (based on the reverse-engineered implementation available at https://github.com/ArcaneNibble/open-alleged-photodna), compares two images with normalized similarity scoring, and supports a local FAISS-backed nearest-neighbor index for fast lookup of visually similar images.
This version adds:
- a FAISS local vector index
- persistent on-disk metadata in meta.json
- exact L2 nearest-neighbor search
- similarity scores normalized to the same 0..1 scale as direct image comparison
- query-time filtering by minimum similarity or maximum Euclidean distance

Install dependencies:

    pip install pillow numpy faiss-cpu

The script supports four main workflows:
- Compute the hash of a single image.
- Compute hashes for every file in a directory and emit JSON.
- Compare two images using either Euclidean or Manhattan distance.
- Build and query a local FAISS index of previously hashed images.
The PhotoDNA-like hash is represented internally as a flat vector of 144 values. FAISS stores these vectors and searches for nearest neighbors using L2 distance.
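
To make the idea of exact L2 nearest-neighbor search concrete, here is a minimal sketch that does the same job with a brute-force scan in plain Python. It is not the script's code and does not use FAISS. The names `squared_l2` and `nearest_neighbors` are hypothetical, chosen only for this illustration.

```python
import heapq

HASH_DIMENSION = 144  # vector length stated in this README


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbors(query, vectors, top_k=5):
    """Return up to top_k (squared_distance, position) pairs, closest first.

    Every stored vector is compared with the query, so the result is exact.
    """
    scored = ((squared_l2(query, v), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(top_k, scored)


# Tiny demonstration with synthetic 144-value vectors.
stored = [
    [0] * HASH_DIMENSION,
    [1] * HASH_DIMENSION,
    [10] * HASH_DIMENSION,
]
query = [1] * HASH_DIMENSION
print(nearest_neighbors(query, stored, top_k=2))  # [(0, 1), (144, 0)]
```

A scan like this costs one distance computation per stored vector. That is fine for small collections, and a dedicated vector index is the usual choice once the collection grows.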
Print the top-level help:

    python bin/oaphotodna.py --help

The CLI uses traditional, flag-prefixed arguments (for example --hash, --compare, --faiss-query) rather than positional subcommands.
adulau@blakley:~/git/photodna/bin$ python3 oaphotodna.py
usage: oaphotodna.py [-h] (--hash IMAGE | --hash-dir DIRECTORY | --compare IMAGE1 IMAGE2 | --faiss-build ARG [ARG ...] | --faiss-add ARG [ARG ...] | --faiss-query ARG [ARG ...]) [--metric {euclidean,manhattan}]
[--min-similarity MIN_SIMILARITY] [--max-distance MAX_DISTANCE]
Compute and compare PhotoDNA-like hashes, with optional FAISS local indexing.
options:
-h, --help show this help message and exit
--hash IMAGE Compute the hash of one image
--hash-dir DIRECTORY Compute hashes for every file in a directory and output JSON
--compare IMAGE1 IMAGE2
Compare two images
--faiss-build ARG [ARG ...]
Create a new FAISS index: INDEX META IMAGE [IMAGE ...]
--faiss-add ARG [ARG ...]
Append images to an existing FAISS index: INDEX META IMAGE [IMAGE ...]
--faiss-query ARG [ARG ...]
Find closest indexed matches: INDEX META QUERY_IMAGE [TOP_K]
--metric {euclidean,manhattan}
Distance metric for --compare
--min-similarity MIN_SIMILARITY
With --faiss-query, filter results below this similarity threshold [0,1]
--max-distance MAX_DISTANCE
With --faiss-query, filter results above this Euclidean distance

Compute the hash of a single image:

    python bin/oaphotodna.py --hash image.jpg

Output:
73,71,74,32,...
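
If another program needs to consume this output, the line can be turned back into a list of integers. The sketch below assumes the output is a single line of comma-separated decimal values in the range 0..255, as the sample suggests. `parse_hash_line` is a hypothetical helper, not part of the script.

```python
def parse_hash_line(line):
    """Parse 'v1,v2,...' into a list of ints, checking the 0..255 range."""
    values = [int(part) for part in line.strip().split(",")]
    for value in values:
        if not 0 <= value <= 255:
            raise ValueError(f"hash value out of range: {value}")
    return values


# Synthetic 144-value line standing in for real --hash output.
sample = ",".join(str(i % 256) for i in range(144))
parsed = parse_hash_line(sample)
print(len(parsed))  # 144
```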
Compute hashes for every file in a directory:

    python bin/oaphotodna.py --hash-dir tests/monochrome

Example output:
[
{
"filename": "55147310088_42a69416d3_5k.jpg",
"path": "/full/path/to/tests/monochrome/55147310088_42a69416d3_5k.jpg",
"photodna": [73, 71, 74, 32]
}
]

Each JSON object includes the base filename, the absolute file path, and the 144-byte PhotoDNA-like vector. Files are processed in sorted filename order, and non-file directory entries are skipped.
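
The JSON emitted by --hash-dir can be loaded directly by other tools. As a minimal sketch, the hypothetical helper `load_hash_records` below maps each record's `path` to its `photodna` vector and rejects vectors of unexpected length. The field names come from the example output; everything else is an assumption for illustration.

```python
import json


def load_hash_records(json_text, expected_length=144):
    """Map each record's path to its hash vector.

    Raises ValueError if a vector does not have expected_length values.
    """
    records = json.loads(json_text)
    by_path = {}
    for record in records:
        vector = record["photodna"]
        if len(vector) != expected_length:
            raise ValueError(f"unexpected hash length for {record['path']}")
        by_path[record["path"]] = vector
    return by_path


# Synthetic input shaped like the --hash-dir example output.
sample = json.dumps([
    {
        "filename": "a.jpg",
        "path": "/tmp/example/a.jpg",
        "photodna": [0] * 144,
    }
])
hashes = load_hash_records(sample)
print(sorted(hashes))  # ['/tmp/example/a.jpg']
```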
Default metric is Euclidean:

    python bin/oaphotodna.py --compare image1.jpg image2.jpg

Use Manhattan distance instead:

    python bin/oaphotodna.py --compare image1.jpg image2.jpg --metric manhattan

Example output:
Distance (euclidean): 3.7417
Similarity: 0.998779
The script reports a normalized similarity value between 0 and 1.
- 1.0 means identical hashes
- values close to 1.0 mean very similar hashes
- values closer to 0.0 mean more distant hashes

For Euclidean distance, similarity is derived from the maximum possible distance for a 144-dimensional hash with values in the range 0..255:
similarity = 1 - (euclidean_distance / max_possible_distance)
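
The normalization can be sketched in a few lines. This is an illustration, not the script's code. It assumes the maximum possible distance is reached when all 144 components differ by 255, which gives sqrt(144 * 255^2) = 3060. The exact constant the script uses is not shown here, so computed values may differ in the last digits from the script's output. The function names are hypothetical, and only the Euclidean normalization is covered because the Manhattan one is not described.

```python
import math

HASH_DIMENSION = 144
MAX_VALUE = 255

# Assumed upper bound: every component differs by 255,
# so sqrt(144 * 255**2) = 12 * 255 = 3060.
MAX_EUCLIDEAN = math.sqrt(HASH_DIMENSION * MAX_VALUE ** 2)


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two hash vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute component differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_similarity(distance):
    """Map a Euclidean distance to a 0..1 similarity score."""
    return 1.0 - distance / MAX_EUCLIDEAN


all_zero = [0] * HASH_DIMENSION
all_max = [MAX_VALUE] * HASH_DIMENSION
print(euclidean_similarity(euclidean_distance(all_zero, all_zero)))  # 1.0
print(euclidean_similarity(euclidean_distance(all_zero, all_max)))   # 0.0
```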
The FAISS query path uses the same normalization so that the similarity reported by --faiss-query is directly comparable to the Similarity: line from --compare.
The local database consists of two files:
- index.faiss: the FAISS vector index
- meta.json: sidecar metadata used to map FAISS IDs back to files and hashes

meta.json stores information that FAISS does not store for you in an application-friendly way:
- dimension: vector length, normally 144
- metric: stored metric type
- next_id: next numeric ID to assign
- items: indexed records

Each item in items contains:
- id: numeric FAISS ID
- path: canonicalized file path
- hash: stored 144-element hash
- extra: optional metadata placeholder

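
To show how such a sidecar file can map numeric IDs back to files, here is a minimal sketch of the structure as a Python dictionary. The key names follow the lists above. The metric string `"l2"`, the starting ID of 0, and the helper names `new_meta`, `add_item`, and `find_item` are assumptions made for this example, since the values the script actually writes are not shown.

```python
import json


def new_meta(dimension=144, metric="l2"):
    # "l2" is a placeholder; the metric string the script stores is not shown.
    return {"dimension": dimension, "metric": metric, "next_id": 0, "items": []}


def add_item(meta, path, hash_values, extra=None):
    """Append one record and advance next_id; returns the assigned ID."""
    if len(hash_values) != meta["dimension"]:
        raise ValueError("hash length does not match meta dimension")
    item_id = meta["next_id"]
    meta["items"].append(
        {"id": item_id, "path": path, "hash": list(hash_values), "extra": extra}
    )
    meta["next_id"] = item_id + 1
    return item_id


def find_item(meta, item_id):
    """Return the record with the given ID, or None if it is absent."""
    for item in meta["items"]:
        if item["id"] == item_id:
            return item
    return None


meta = new_meta()
first_id = add_item(meta, "/data/images/img1.jpg", [0] * 144)
second_id = add_item(meta, "/data/images/img2.jpg", [1] * 144)

# Round-trip through JSON, as a file on disk would.
restored = json.loads(json.dumps(meta))
print(find_item(restored, second_id)["path"])  # /data/images/img2.jpg
```

Keeping `next_id` in the file means IDs stay unique across later appends, even if records are removed from `items`.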
Create a new index from a set of images:

    python bin/oaphotodna.py --faiss-build index.faiss meta.json img1.jpg img2.jpg img3.jpg

Expected output:
Indexed 3 file(s) into index.faiss
Append more images later:

    python bin/oaphotodna.py --faiss-add index.faiss meta.json img4.jpg img5.jpg

Expected output:
Added 2 file(s) into index.faiss
Search for the closest matches to a query image:

    python bin/oaphotodna.py --faiss-query index.faiss meta.json query.jpg

Specify the number of results to return:

    python bin/oaphotodna.py --faiss-query index.faiss meta.json query.jpg 20

Example output:
Query: query.jpg
Results: 3
[1] /data/images/img2.jpg
id=17
distance=3.7417
similarity=0.998779
distance_squared=14.0000
[2] /data/images/img7.jpg
id=42
distance=5.2915
similarity=0.998273
distance_squared=28.0000
Only return matches at or above a similarity threshold:

    python bin/oaphotodna.py --faiss-query index.faiss meta.json query.jpg 20 --min-similarity 0.95

Only return matches at or below a maximum Euclidean distance:

    python bin/oaphotodna.py --faiss-query index.faiss meta.json query.jpg 20 --max-distance 12

Combine both filters:

    python bin/oaphotodna.py --faiss-query index.faiss meta.json query.jpg 20 --min-similarity 0.98 --max-distance 8

FAISS returns squared L2 distance internally. The script converts that into:
- distance_squared: raw FAISS value
- distance: Euclidean distance (sqrt(distance_squared))
- similarity: normalized 0..1 score derived from Euclidean distance
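
The conversion and the two query-time filters can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: the normalization constant is assumed to be sqrt(144 * 255^2) = 3060, and `score` and `filter_results` are hypothetical names. Matches are kept when similarity is at or above the minimum and distance is at or below the maximum, matching the wording above.

```python
import math

# Assumed normalization constant: 144 components, each differing by 255.
MAX_EUCLIDEAN = math.sqrt(144 * 255 ** 2)  # 3060.0


def score(distance_squared):
    """Derive distance and similarity from a squared L2 distance."""
    distance = math.sqrt(distance_squared)
    return {
        "distance_squared": distance_squared,
        "distance": distance,
        "similarity": 1.0 - distance / MAX_EUCLIDEAN,
    }


def filter_results(squared_distances, min_similarity=None, max_distance=None):
    """Keep results that satisfy every threshold that was provided."""
    kept = []
    for d2 in squared_distances:
        result = score(d2)
        if min_similarity is not None and result["similarity"] < min_similarity:
            continue
        if max_distance is not None and result["distance"] > max_distance:
            continue
        kept.append(result)
    return kept


# Squared distances 14 and 28 pass a max distance of 12; 400 (distance 20) does not.
results = filter_results([14.0, 28.0, 400.0], max_distance=12)
print([round(r["distance"], 4) for r in results])  # [3.7417, 5.2915]
```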