Simhash implementation in JavaScript
A JavaScript implementation of Charikar's hash for identifying similar documents.
Consider two documents A and B that differ in just a single byte.
Cryptographic hash functions such as SHA-2 or MD5 map these two documents to completely different, unrelated hash values, so the Hamming distance between md5(A) and md5(B) would be large (on average, about half of the output bits differ). This is by design: cryptographic hash functions aim for an avalanche effect, in which any small change to the input changes the output unpredictably, which is part of what makes collisions hard to find.
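To make the point concrete, here is a minimal sketch that measures the Hamming distance between the MD5 digests of two strings differing in a single byte. It uses only Node's built-in `crypto` module; the helper names `md5Digest` and `hammingDistance` are hypothetical and are not part of simhash-js. For a 128-bit digest the distance is typically close to 64.

```javascript
const crypto = require('crypto');

// Hypothetical helper: MD5 digest of a string, returned as a 16-byte Buffer.
function md5Digest(text) {
  return crypto.createHash('md5').update(text, 'utf8').digest();
}

// Hypothetical helper: number of differing bits between two equal-length Buffers.
function hammingDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error('Buffers must have the same length');
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    let diff = a[i] ^ b[i]; // bits that differ in this byte
    while (diff) {
      distance += diff & 1;
      diff >>= 1;
    }
  }
  return distance;
}

// Two inputs that differ in exactly one byte ("System" vs "Systen").
const docA = 'This is a test of the Emergency Blogcast System';
const docB = 'This is a test of the Emergency Blogcast Systen';

const distance = hammingDistance(md5Digest(docA), md5Digest(docB));
console.log('MD5 Hamming distance (out of 128 bits):', distance);
```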
By contrast, Simhash will hash contents of A and B to similar hash values. The Hamming distance between simhash(A) and simhash(B) would be small.
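The idea behind Charikar's scheme can be sketched in a few lines. The code below is a toy illustration, not the simhash-js implementation: it assumes whitespace-separated words as features, equal weights, a 32-bit fingerprint, and FNV-1a (applied to UTF-16 code units) as the per-feature hash. The names `fnv1a32`, `toySimhash`, and `hammingDistance32` are hypothetical. Each feature votes +1 or -1 on every bit position, and the sign of each tally becomes one bit of the fingerprint, so documents sharing most features tend to agree on most bits.

```javascript
// Assumed per-feature hash: 32-bit FNV-1a over the string's UTF-16 code units.
function fnv1a32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Toy simhash: every token votes on each of the 32 bit positions.
function toySimhash(text) {
  const tally = new Array(32).fill(0);
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  for (const token of tokens) {
    const h = fnv1a32(token);
    for (let bit = 0; bit < 32; bit++) {
      tally[bit] += ((h >>> bit) & 1) ? 1 : -1;
    }
  }
  // A bit is set in the fingerprint when its tally is positive.
  let fingerprint = 0;
  for (let bit = 0; bit < 32; bit++) {
    if (tally[bit] > 0) fingerprint |= (1 << bit);
  }
  return fingerprint >>> 0; // force an unsigned 32-bit result
}

// Number of differing bits between two 32-bit fingerprints.
function hammingDistance32(x, y) {
  let diff = (x ^ y) >>> 0;
  let distance = 0;
  while (diff) {
    distance += diff & 1;
    diff >>>= 1;
  }
  return distance;
}

const first = toySimhash('This is a test of the Emergency Blogcast System');
const second = toySimhash('This is a second test of the Emergency Blogcast System');
console.log('Toy simhash Hamming distance (out of 32 bits):',
  hammingDistance32(first, second));
```

Because the tallies are sums over features, the fingerprint does not depend on word order; a real implementation would usually use overlapping shingles and a wider hash to capture more structure.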
// Load the library and create a hasher.
var sjs = require('simhash-js');
var simhash = new sjs.SimHash();
// Hash two near-identical strings.
var x = simhash.hash("This is a test of the Emergency Blogcast System");
var y = simhash.hash("This is a second test of the Emergency Blogcast System");
// Score how similar the two hashes are.
var s = sjs.Comparator.similarity(x, y);
Sincere thanks to: