What similarity measure is this hash function for?
Hamming distance
Describe the hash function
This is a single-bit hash function for bit vectors (i.e. BitArrays in Julia). It is extremely straightforward to compute: we choose a random index i into the vectors, and x[i] becomes the hash. The collision probability for a pair of vectors is their Hamming distance divided by the length of the vectors.
References
- Wikipedia
- Reference paper:
Indyk, Piotr.; Motwani, Rajeev. (1998). "Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality.". Proceedings of 30th Symposium on Theory of Computing.