Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also lets users run data-cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
This major release brings a lot of improvements. Its primary focus is Desbordante’s core: we add several new primitives for pattern discovery.
Changes:
For all introduced primitives, we provide descriptive examples. All primitives are supported in the console version of Desbordante, with the help file containing references to papers in which these primitives are described.
Miscellaneous:
Key enhancements of this minor release concern Python bindings. Namely, we've organized our algorithms into intuitive Python submodules based on primitives and we've provided default algorithms for each one, simplifying usage.
The detailed changes are as follows:
Each primitive has its own submodule; for example, the UCC primitive is available as desbordante.ucc.UCC.
Algorithms are located in the .algorithms submodule of each primitive. For example, the UccVerifier algorithm may now be accessed as desbordante.ucc_verification.algorithms.UccVerifier. The same holds true for simple statistics: the algorithm that extracts them may be accessed as desbordante.statistics.algorithms.DataStats.
Each .algorithms submodule has a default algorithm for ease of use (example: desbordante.fd.algorithms.Default).
Instead of the load_data(path, separator, has_header, **kwargs) overload, use algo.load_data(table=(path, separator, has_header), …) or algo.load_data(table=dataframe, …).
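The submodule layout and the table= form of load_data described above can be sketched as follows. This is a minimal illustration, not an official example: the helpers table_arg, discover_fds and demo and the sample rows are hypothetical, and the execute() and get_fds() calls are assumptions about the bindings that these notes do not mention. The sketch falls back gracefully when the package is not installed.

```python
import csv
import os
import tempfile


def table_arg(path, separator=",", has_header=True):
    """Build the (path, separator, has_header) tuple passed as load_data(table=...)."""
    return (path, separator, has_header)


def discover_fds(path, separator=",", has_header=True):
    """Run the default FD algorithm on a CSV file and return the results as strings."""
    # Imported lazily so the helpers above stay usable without the package.
    import desbordante

    # Default algorithm of the fd primitive, as named in the release notes.
    algo = desbordante.fd.algorithms.Default()
    algo.load_data(table=table_arg(path, separator, has_header))
    # Assumption: execute() and get_fds() are the calls that run the
    # algorithm and fetch its results; they are not named in these notes.
    algo.execute()
    return [str(fd) for fd in algo.get_fds()]


def demo():
    """Write a tiny made-up table to a temporary CSV and profile it."""
    rows = [
        ["id", "city", "zip"],
        ["1", "Oslo", "0150"],
        ["2", "Oslo", "0150"],
        ["3", "Bergen", "5003"],
    ]
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", newline="", delete=False
    ) as handle:
        csv.writer(handle).writerows(rows)
        path = handle.name
    try:
        return discover_fds(path)
    except ImportError:
        # desbordante is not installed in this environment.
        return None
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

According to the notes above, a pandas DataFrame can be passed in place of the tuple, as in algo.load_data(table=dataframe, …).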
Key enhancements:
Desbordante can be installed with pip install desbordante.