A simple implementation of simhash algorithm by java.
A simple implementation of simhash algorithm by java.
run Main with inputfile and outputfile.
The format of inputfile(see src/test_in): one doc eachline with the utf8 charset.
The format of outputfile(see src/test_out):
start //start flag
first line // doc
sencode lien // doc1\tdist where dist is the hamming distance between doc and doc1
end //end flag