Tools for test driven data-wrangling and data validation.
squint package.
pandas accessors.
skip(), skipIf(), skipUnless() (use unittest.skip(), etc. instead).
Selector and ProxyGroup.
allowed interface.
Select, Query, and Result API. Use squint instead:
get_reader() function. Use get-reader instead:
Fixed bug where ValidationErrors were crashing pytest-xdist workers.
Added tighter Pandas integration using Pandas' extension API.
After calling the new register_accessors() function, your existing DataFrame, Series, Index, and MultiIndex objects will have a validate() method that can be used instead of the validate() function:
import pandas as pd
import datatest as dt
dt.register_accessors() # <- Activate Pandas integration.
df = pd.DataFrame(...)
df[['A', 'B']].validate((str, int)) # <- New accessor method.
Changed Pandas validation behavior:
DataFrame and Series: These objects are treated as sequences when they use a RangeIndex index (this is the default type assigned when no index is specified). And they are treated as dictionaries when they use an index of any other type--the index values become the dictionary keys.
Index and MultiIndex: These objects are treated as sequences.
Changed repr behavior of Deviation to make timedeltas more readable.
Added Predicate matching support for NumPy types np.character, np.integer, np.floating, and np.complexfloating.
Added improved NaN handling:
accepted.keys(), accepted.args(), and validate.interval().
Added data handling support for squint.Select objects.
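As background for the NaN handling entry above, the sketch below shows why NaN needs special treatment: a NaN value never compares equal to another NaN, so ordinary equality checks miss it. The same_value() helper is hypothetical and written only for illustration; it is not datatest's implementation.

```python
import math

nan = float('nan')

# NaN is never equal to anything, including another NaN.
print(nan == float('nan'))  # False

def same_value(a, b):
    """Hypothetical helper: treat two NaN values as a match."""
    try:
        if math.isnan(a) and math.isnan(b):
            return True
    except TypeError:
        pass  # Non-numeric values cannot be NaN.
    return a == b

print(same_value(nan, float('nan')))  # True
print(same_value(1.5, 1.5))           # True
print(same_value('abc', nan))         # False
```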
Added deprecation warnings for soon-to-be-removed functions and classes:
Added DeprecationWarning to get_reader function. This function is now available from the get-reader package on PyPI:
Added DeprecationWarning to Select, Query, and Result classes. These classes will be deprecated in the next release but are now available from the squint package on PyPI:
Changed validate.subset() and validate.superset() behavior:
The semantics are now inverted. This behavior was flipped to more closely match user expectations. The previous semantics were used because they reflect the internal structure of datatest more precisely. But these are implementation details, and they are not as important as having a more intuitive API.
Added a temporary warning when using the new subset and superset methods to alert users to the new behavior. This warning will be removed from future versions of datatest.
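To make the direction of the comparison concrete, here is a minimal sketch that uses plain Python sets rather than datatest itself. It assumes the new meaning is "the data is a subset (or superset) of the requirement"; the helper names is_subset() and is_superset() are hypothetical and exist only for this illustration.

```python
def is_subset(data, requirement):
    """Hypothetical helper: every element of data is in requirement."""
    return set(data) <= set(requirement)

def is_superset(data, requirement):
    """Hypothetical helper: every member of requirement is in data."""
    return set(data) >= set(requirement)

data = ['A', 'B', 'C']

print(is_subset(data, {'A', 'B', 'C', 'D'}))  # True
print(is_subset(data, {'A', 'B'}))            # False
print(is_superset(data, {'A', 'B'}))          # True
print(is_superset(data, {'A', 'B', 'C', 'D'}))  # False
```

Code written for the old behavior would need its data and requirement roles reconsidered, which is what the temporary warning is meant to flag.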
Added Python 3.9 and 3.10 testing and support.
Removed Python 3.1 testing and support. If you were still using this version of Python, please email me--this is a story I need to hear.
Changed acceptance API to make it both less verbose and more expressive:
Consolidated specific-instance and class-based acceptances into a single interface.
Added a new accepted.tolerance() method that subsumes the behavior of accepted.deviation() by supporting Missing and Extra quantities in addition to Deviation objects.
Deprecated old methods:
Old Syntax | New Syntax |
---|---|
accepted.specific(...) | accepted(...) |
accepted.missing() | accepted(Missing) |
accepted.extra() | accepted(Extra) |
NO EQUIVALENT | accepted(CustomDifferenceClass) |
accepted.deviation(...) | accepted.tolerance(...) |
accepted.limit(...) | accepted.count(...) |
NO EQUIVALENT | accepted.count(..., scope='group') |
Other methods--accepted.args(), accepted.keys(), etc.--remain unchanged.
Changed validation to generate Deviation objects for a broader definition of quantitative values (like datetime objects)--not just for subclasses of numbers.Number.
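The sketch below illustrates why a broader definition is needed, using only the standard library. A datetime is not a numbers.Number, yet subtracting two datetimes still yields a measurable difference (a timedelta). The variable names are illustrative, and the code does not call datatest.

```python
import numbers
from datetime import datetime, timedelta

expected = datetime(2021, 3, 1, 12, 0)
actual = datetime(2021, 3, 2, 18, 0)

# datetime objects are not registered as numbers...
print(isinstance(actual, numbers.Number))  # False

# ...but the gap between two of them is still a quantity.
difference = actual - expected
print(difference == timedelta(days=1, hours=6))  # True
print(difference)  # 1 day, 6:00:00
```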
Changed handling for pandas.Series objects to treat them as sequences instead of mappings.
Added handling for DBAPI2 cursor objects to automatically unwrap single-value rows.
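For context, DBAPI2 cursors return every row as a tuple, even when only one column is selected. The sketch below uses the standard library's sqlite3 module to show the raw rows and a manual version of the unwrapping. The table name and the list comprehension are illustrative assumptions, not datatest's internal code.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE letters (name TEXT)')
conn.executemany('INSERT INTO letters VALUES (?)',
                 [('x',), ('y',), ('z',)])

cursor = conn.execute('SELECT name FROM letters ORDER BY name')
rows = cursor.fetchall()
print(rows)  # [('x',), ('y',), ('z',)]

# Manual equivalent of unwrapping single-value rows.
unwrapped = [row[0] if len(row) == 1 else row for row in rows]
print(unwrapped)  # ['x', 'y', 'z']

conn.close()
```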
Removed acceptance classes from datatest namespace--these were inadvertently added in a previous version but were never part of the documented API. They can still be referenced via the acceptances module:
from datatest.acceptances import ...
set members or as dict keys).
__slots__ to difference objects to reduce memory consumption.
Select (Selector now deprecated).
allowed API is now deprecated.
approx() method to require approximate numeric equality.
fuzzy() method to require strings by approximate match.
interval() method to require elements within a given interval.
set(), subset(), and superset() methods for explicit membership checking.
unique() method to require unique elements.
order() method to require elements by relative order.
Predicate class to formalize behavior--also provides inverse-matching with the inversion operator (~).
Query class:
unwrap() to remove single-element containers and return their unwrapped contents.
starmap() to unpack grouped arguments when applying a function to elements.
BaseRequirement). This gives users a cleaner way to implement custom validation behavior and makes the underlying codebase easier to maintain.
ProxyGroup to RepeatingContainer.
Improved data handling features and support for Python 3.7:
flatten() method to serialize dictionary results.
to_csv() method to quickly save results as a CSV file.
reduce() method to accept initializer_factory as an optional argument.
filter() method to support predicate matching.
True and False as predicates to support "truth value testing" on arbitrary objects (to match on truthy or falsy).
ProxyGroup class for performing the same operations on groups of objects at the same time (a common need when testing against reference data).
Selector class keyword filtering to support predicate matching.
get_reader() to support datatest's Selector and Result objects.
get_reader() bug that prevented encoding-fallback recovery when reading from StringIO buffers in Python 2.
mandatory marker to support incremental testing (stops session early when a mandatory test fails).
--ignore-mandatory option to continue tests even when a mandatory test fails.
allowed factory class to simplify allowance imports.