Similarity algorithm (computes the similarity between two files as a 0 to 1 score) with linear complexity, based on context triggered piecewise (fuzzy) hashes.
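
As a rough illustration of the general idea behind context triggered piecewise hashing, the sketch below splits data into blocks at content-defined boundaries, hashes each block, and scores two inputs by the fraction of shared block hashes. It is not this library's implementation: the class name, window size, factor, hash functions and similarity formula are all assumptions chosen to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; NOT the library's algorithm or API.
class CtphSketch {

    // Hypothetical parameters picked for the example.
    static final int WINDOW = 4; // bytes covered by the rolling sum
    static final int FACTOR = 7; // a boundary is triggered about once every FACTOR bytes

    // Single pass over the data: returns the set of hashes of its content-defined blocks.
    static Set<Integer> blockHashes(byte[] data) {
        Set<Integer> hashes = new HashSet<>();
        int rolling = 0;      // sum of the last WINDOW bytes
        int blockHash = 17;   // hash of the block currently being read
        boolean open = false; // true while the current block has unsaved bytes
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF;
            rolling += b;
            if (i >= WINDOW) {
                rolling -= data[i - WINDOW] & 0xFF;
            }
            blockHash = 31 * blockHash + b;
            open = true;
            // The boundary depends only on the last WINDOW bytes (the "context").
            if (rolling % FACTOR == FACTOR - 1) {
                hashes.add(blockHash);
                blockHash = 17;
                open = false;
            }
        }
        if (open) {
            hashes.add(blockHash); // trailing block without a trigger
        }
        return hashes;
    }

    // Fraction (0 to 1) of the distinct blocks of a that also appear in b.
    static double similarity(byte[] a, byte[] b) {
        Set<Integer> ha = blockHashes(a);
        Set<Integer> hb = blockHashes(b);
        if (ha.isEmpty()) {
            return hb.isEmpty() ? 1.0 : 0.0;
        }
        int common = 0;
        for (Integer h : ha) {
            if (hb.contains(h)) {
                common++;
            }
        }
        return (double) common / ha.size();
    }

    public static void main(String[] args) {
        byte[] x = "the quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.UTF_8);
        byte[] y = "the quick brown cat jumps over the lazy dog".getBytes(StandardCharsets.UTF_8);
        System.out.println(similarity(x, y));
    }
}
```

Each input is read once and block lookups go through a hash set, so the expected cost grows linearly with the input sizes. Repeated blocks are counted once in this sketch.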


6 years ago

1.8.4 release.

Java and libs update.

Added ToStringUtils unescapeCsv and splitCsv methods. ToStringUtils static methods are now public.
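
The behaviour of these two methods is not described here, so the following is only a sketch of what CSV splitting and unescaping commonly mean: fields are separated by commas, a field may be wrapped in double quotes, and a doubled quote inside a quoted field stands for one quote. The class name and the exact semantics are assumptions, not the ToStringUtils implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the real ToStringUtils methods may behave differently.
class CsvSketch {

    // Strips surrounding double quotes and collapses doubled quotes into one.
    static String unescapeCsv(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            return field.substring(1, field.length() - 1).replace("\"\"", "\"");
        }
        return field;
    }

    // Splits one CSV line on commas that are outside quotes and unescapes each field.
    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes; // a doubled quote toggles twice, so the state is preserved
            }
            if (c == ',' && !inQuotes) {
                fields.add(unescapeCsv(current.toString()));
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(unescapeCsv(current.toString()));
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitCsv("a,\"b,c\",\"say \"\"hi\"\"\""));
    }
}
```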

Updated Maven and JDK.

Multiple changes in the Java classes.

-Added the methods computeHashToHashesSimilarities (computes the similarities between a hash and a map of hashes and returns them as a map) and computeAllHashesSimilarities (computes the similarities between all the hashes in a map and returns them as a map) to the class

-Added the methods sortSimilarities (sorts a map of similarities between a hash and a map of hashes by a type of similarity) and sortMap (sorts any map in the same order as another sorted map).

-Hash statistics and characteristics removed because they were rarely used.

-Similarity cache removed from the class. It is no longer necessary, since caching can now be managed outside the class with the new methods computeHashToHashesSimilarities and computeAllHashesSimilarities.

-SimilarityTypes enum moved to the class from the class

-Added the methods similarity(hash, similarityType) (computes and returns a type of similarity between two hashes) and similarities (computes all the types of similarities between two hashes and returns them as a map) to the class

-All the methods using a map of names (string) -> objects in the class changed to a generic map of identifiers (any type) -> objects.

-Methods printHashToHashesSimilaritiesTable, saveHashToHashesSimilaritiesAsCsv, printAllHashesSimilaritiesTable and saveAllHashesSimilaritiesAsCsv in the class changed to receive a map of similarities.

-Added the methods collectionToMap (builds a map of identifiers -> objects from a collection of objects, identifying them by index), mapValuesToList (builds a list from the values in a map) and mapKeysToList (builds a list from the keys in a map) to the class

-Removed the methods using collections instead of maps in the class. They are no longer necessary thanks to the new methods collectionToMap, mapValuesToList, mapKeysToList and sortMap.
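
The map helpers named in this release can be pictured with the sketch below. It follows the descriptions given in the notes, but the class name, signatures, zero-based indexing and the handling of keys missing from the reference map are assumptions, not the library's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helpers for illustration; the library's real signatures may differ.
class MapHelpersSketch {

    // Builds an insertion-ordered map of index -> element (indices start at 0 here).
    static <T> Map<Integer, T> collectionToMap(Collection<T> items) {
        Map<Integer, T> map = new LinkedHashMap<>();
        int index = 0;
        for (T item : items) {
            map.put(index++, item);
        }
        return map;
    }

    // Copies the values of a map into a list, in the map's iteration order.
    static <K, V> List<V> mapValuesToList(Map<K, V> map) {
        return new ArrayList<>(map.values());
    }

    // Copies the keys of a map into a list, in the map's iteration order.
    static <K, V> List<K> mapKeysToList(Map<K, V> map) {
        return new ArrayList<>(map.keySet());
    }

    // Returns a copy of map whose entries follow the key order of sortedReference.
    // Keys that are absent from sortedReference are skipped in this sketch.
    static <K, V> Map<K, V> sortMap(Map<K, V> map, Map<K, ?> sortedReference) {
        Map<K, V> sorted = new LinkedHashMap<>();
        for (K key : sortedReference.keySet()) {
            if (map.containsKey(key)) {
                sorted.put(key, map.get(key));
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<Integer, String> names = collectionToMap(Arrays.asList("a.bin", "b.bin", "c.bin"));
        Map<Integer, Double> byScore = new LinkedHashMap<>();
        byScore.put(2, 0.9);
        byScore.put(0, 0.4);
        byScore.put(1, 0.1);
        System.out.println(mapValuesToList(sortMap(names, byScore)));
    }
}
```

With helpers of this kind, code that previously passed a collection can first convert it to a map keyed by index and convert results back to lists afterwards.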


6 years ago

1.7.1 release.

Hash rebuild from string representation optimized.

Hash string representation changed to base36 (alphanumeric). New look:


Hash ASCII representation removed.
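
For reference, base36 uses the digits 0-9 and the letters a-z. The sketch below shows the encoding with the JDK's built-in radix conversion. It is not the library's own hash encoder, and the class name and the sample value are made up for the example.

```java
// Illustrative only; shows base36 conversion with standard JDK calls.
class Base36Sketch {

    // Encodes a non-negative int as a lowercase base36 string.
    static String encode(int value) {
        return Integer.toString(value, 36);
    }

    // Parses a base36 string back into an int.
    static int decode(String text) {
        return Integer.parseInt(text, 36);
    }

    public static void main(String[] args) {
        int block = 123456; // made-up value standing in for one block of a hash
        String text = encode(block);
        System.out.println(text + " -> " + decode(text));
    }
}
```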

Similarity caching is now optional.

BlocksSet and SimilaritiesCache are now built lazily, only when they are first used.
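
Deferring construction until first use is usually done with a null check in the accessor. The sketch below shows that pattern under assumed names: the class, its fields and the build counter are hypothetical and are not taken from the library.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class showing lazy construction; not the library's hash class.
class LazyHashSketch {

    private final int[] blocks;
    private Set<Integer> blocksSet; // stays null until first requested
    int builds = 0;                 // counts how many times the set was built (for the demo)

    LazyHashSketch(int... blocks) {
        this.blocks = blocks.clone();
    }

    // Builds the set on the first call and reuses it afterwards.
    // Not thread-safe; callers sharing an instance would need synchronization.
    Set<Integer> blocksSet() {
        if (blocksSet == null) {
            Set<Integer> set = new HashSet<>();
            for (int b : blocks) {
                set.add(b);
            }
            blocksSet = set;
            builds++;
        }
        return blocksSet;
    }

    public static void main(String[] args) {
        LazyHashSketch hash = new LazyHashSketch(3, 5, 3);
        System.out.println(hash.builds);      // nothing built yet
        System.out.println(hash.blocksSet()); // built here
        System.out.println(hash.builds);
    }
}
```

Objects that are only stored or printed never pay for the set, which is the benefit this kind of change aims for.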


6 years ago

1.6.1 release.

Similarity tables to CSV.

Reading and writing hashes and CSV files line by line.


6 years ago

1.5.1 release.

MarkAbove and MarkBelow.

Released to Sonatype and Maven Central.

GroupId changed to com.github.s3curitybug. Main package changed to com.github.s3curitybug.similarityuniformfuzzyhash.

Pom: distribution management, Nexus staging plugin; source, javadoc and GPG Maven plugins; description, url, inceptionYear, developers, license, scm, issueManagement.

License.txt and Notice.txt added to META-INF.


6 years ago

1.4 release.

Hash ASCII representation. Factor must be odd.


7 years ago

1.3 release.

Command line help update. Multiple arguments for option --compareToAll.


7 years ago

1.2 release.

Similarity Types.


7 years ago

1.1 release.

New command line interface.


7 years ago

1.0 release.