Test-Driven Data Analysis Functions
The TDDA Python module provides command-line and Python API support for the overall process of data analysis, through the following tools:
Reference Testing: extensions to unittest and pytest for managing testing of data analysis pipelines, where the results are typically much larger, and more complex, than single numerical values.
Constraints: tools (and API) for discovery of constraints from data, for validation of constraints on new data, and for anomaly detection.
Finding Regular Expressions (Rexpy): tools (and API) for automatically inferring regular expressions from text data.
Automatic Test Generation (Gentest): TDDA can generate tests for more-or-less any command that can be run from a command line, whether it be Python code, R code, a shell script, a shell command, a Makefile or a multi-language pipeline involving compiled code. "Gentest writes tests, so you don't have to."™
The simplest way to install all of the TDDA Python modules is using pip:
pip install tdda
The full set of sources, including all examples, is downloadable from PyPI with:
pip download --no-binary :all: tdda
The sources are also publicly available from GitHub:
git clone git@github.com:tdda/tdda.git
Documentation is available at http://tdda.readthedocs.io.
If you clone the GitHub repo, use
python setup.py install
afterwards to install the command-line tools (tdda and rexpy).
The tdda.referencetest library is used to support the creation of reference tests, based on either unittest or pytest.
These are like other tests except:
For more details from a source distribution or checkout, see the README.md file and examples in the referencetest subdirectory.
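The idea behind a reference test can be illustrated with the standard library alone. The sketch below is not the tdda.referencetest API: summarise, diff_against_reference and TestSummary are hypothetical names, and the reference file is created in setUp only to keep the example self-contained (in a real project it would normally be checked in alongside the tests). It compares a multi-line result with a stored reference and reports a diff on failure.

```python
import difflib
import os
import tempfile
import unittest


def summarise(rows):
    """Hypothetical pipeline step: turn {name: [values]} into a CSV report."""
    lines = ["name,total"]
    for name, values in sorted(rows.items()):
        lines.append("%s,%d" % (name, sum(values)))
    return "\n".join(lines) + "\n"


def diff_against_reference(actual, ref_path):
    """Return unified-diff lines between the reference file and the
    actual output; the list is empty when the two are identical."""
    with open(ref_path) as f:
        expected = f.read()
    return list(difflib.unified_diff(
        expected.splitlines(), actual.splitlines(),
        fromfile=ref_path, tofile="actual", lineterm=""))


class TestSummary(unittest.TestCase):
    def setUp(self):
        # Stand-in for a reference file kept under version control
        fd, self.ref_path = tempfile.mkstemp(suffix=".csv")
        with os.fdopen(fd, "w") as f:
            f.write("name,total\na,3\nb,7\n")

    def tearDown(self):
        os.remove(self.ref_path)

    def test_summary_matches_reference(self):
        actual = summarise({"b": [3, 4], "a": [1, 2]})
        diff = diff_against_reference(actual, self.ref_path)
        # On failure, the message shows exactly which lines differ
        self.assertEqual(diff, [], "\n".join(diff))
```

Saved as a module, this can be run with python -m unittest. Showing the diff rather than a bare "not equal" matters most when the output is too large to inspect by eye.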
The tdda.constraints library is used to 'discover' constraints from a (Pandas) DataFrame, write them out as JSON, and verify that datasets meet the constraints in the constraints file.
For more details from a source distribution or checkout, see the README.md file and examples in the constraints subdirectory.
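The discover, write and verify cycle can be sketched in plain Python. This is a conceptual illustration only, not the tdda.constraints API or its file format: discover and verify are hypothetical helpers, the constraint names (max_nulls, type, min, max) are chosen for the example, and columns are represented as a dict of lists rather than a DataFrame so that the sketch needs nothing beyond the standard library.

```python
import json


def discover(columns):
    """Infer simple per-field constraints from {column name: list of values}."""
    constraints = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        field = {"max_nulls": len(values) - len(present)}
        if present:
            # Assumes each column holds values of a single type
            field["type"] = type(present[0]).__name__
            if field["type"] in ("int", "float"):
                field["min"] = min(present)
                field["max"] = max(present)
        constraints[name] = field
    return constraints


def verify(columns, constraints):
    """Return (field, constraint) pairs that the data fails to satisfy."""
    failures = []
    for name, field in sorted(constraints.items()):
        values = columns.get(name, [])
        present = [v for v in values if v is not None]
        if len(values) - len(present) > field["max_nulls"]:
            failures.append((name, "max_nulls"))
        if any(type(v).__name__ != field.get("type") for v in present):
            failures.append((name, "type"))
            continue  # range checks are meaningless on the wrong type
        if "min" in field and present:
            if min(present) < field["min"]:
                failures.append((name, "min"))
            if max(present) > field["max"]:
                failures.append((name, "max"))
    return failures


# Discover constraints on known-good data and serialise them as JSON
training = {"age": [23, 35, 61], "name": ["ann", "bob", None]}
text = json.dumps(discover(training), indent=2, sort_keys=True)

# Later: check new data against the stored constraints
new_data = {"age": [30, 75], "name": [None, None]}
failures = verify(new_data, json.loads(text))
```

Here failures contains ("age", "max"), because 75 exceeds the largest age seen during discovery, and ("name", "max_nulls"), because the new data has more missing names than the original.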
The tdda repository also includes rexpy, a tool for automatically inferring regular expressions from a single field of data examples.
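To give a flavour of what inferring a regular expression from examples involves, here is a deliberately simple sketch. It is not rexpy's algorithm or API: char_class, runs and infer_regex are hypothetical names, and the approach (split each string into runs of letters, digits or a repeated punctuation character, then generalise the run lengths) only works when every example has the same sequence of runs.

```python
import re
import string
from itertools import groupby


def char_class(c):
    """Map a character to a regex fragment describing its class."""
    if c in string.digits:
        return r"\d"
    if c in string.ascii_letters:
        return "[A-Za-z]"
    return re.escape(c)  # anything else must match literally


def runs(s):
    """Split a string into (class, run length) pairs."""
    return [(cls, len(list(group))) for cls, group in groupby(s, key=char_class)]


def infer_regex(examples):
    """Return one anchored regex matching every example (a non-empty list),
    or None if the examples do not share the same sequence of classes."""
    patterns = [runs(s) for s in examples]
    classes = [cls for cls, _ in patterns[0]]
    if any([cls for cls, _ in p] != classes for p in patterns):
        return None
    parts = []
    for i, cls in enumerate(classes):
        lengths = [p[i][1] for p in patterns]
        lo, hi = min(lengths), max(lengths)
        if lo == hi == 1:
            quantifier = ""
        elif lo == hi:
            quantifier = "{%d}" % lo
        else:
            quantifier = "{%d,%d}" % (lo, hi)
        parts.append(cls + quantifier)
    return "^" + "".join(parts) + "$"


pattern = infer_regex(["AB-123", "XYZ-45", "Q-6789"])
```

For these three examples the sketch produces a single pattern of one to three letters, a hyphen, then two to four digits. A practical tool has to cope with examples that do not share one structure, typically by returning several expressions, which this sketch makes no attempt to do.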
Resources on these topics include:
All examples, tests and code run under Python 2.7, Python 3.5 and Python 3.6.