Probabilistic Data Structures and Algorithms in Python
Count Sketch and Count–Min Sketch are simple space-efficient probabilistic data structures that are used to estimate frequencies of elements in data streams and can address the Heavy hitters problem.
Count Sketch was proposed by Moses Charikar, Kevin Chen, and Martin Farach-Colton in 2002. Count–Min Sketch was presented in 2003 by Graham Cormode and Shan Muthukrishnan and published in 2005.
setup.py
HyperLogLog algorithm was proposed by Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier in 2007. There is a number of modifications of the algorithm, but this implementation uses the original version with a 32-bit hash function.
In this pre-release implemented the following algorithms:
Cardinality problem
Rank problem
Additionally, the overall code has been improved, added tests and examples how to use the implemented data structures.
In this release first 2 classical data structures have been implemented:
In order to support memory-efficient representation we also developed the BitCounter and BitVector.