Python library for automated dataset normalization
AutoNormalize is a Python library for automated datatable normalization. It allows you to build an EntitySet
from a single denormalized table and generate features for machine learning using Featuretools.
pip install featuretools[autonormalize]
pip uninstall autonormalize
auto_entityset
auto_entityset(df, accuracy=0.98, index=None, name=None, time_index=None)
Creates a normalized entityset from a dataframe.
Arguments:
df
(pd.DataFrame) : the dataframe containing data
accuracy
(0 < float <= 1.00; default = 0.98) : the accuracy threshold required to conclude that a dependency holds (i.e. with accuracy = 0.98, at least 98% of the rows must satisfy the dependency LHS --> RHS)
index
(str, optional) : name of the column intended as the index of df
name
(str, optional) : the name of the created EntitySet
time_index
(str, optional) : name of the time column in the dataframe
Returns:
entityset
(ft.EntitySet) : created entity set
find_dependencies
find_dependencies(df, accuracy=0.98, index=None)
Finds dependencies within a dataframe using the DFD search algorithm.
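To make the accuracy threshold concrete, here is a minimal pure-Python sketch of checking a single dependency LHS --> RHS against a threshold. It is not the DFD search algorithm and does not call the library; the helper names `dependency_accuracy` and `holds` are hypothetical, and the way accuracy is measured here (the share of rows that agree with the most common RHS value for their LHS value) is one plausible reading, assumed for illustration only.

```python
from collections import Counter, defaultdict


def dependency_accuracy(rows, lhs, rhs):
    """Share of rows that agree with the most common rhs value for their lhs value.

    Hypothetical helper: an assumed way to measure how well lhs --> rhs holds.
    """
    if not rows:
        return 1.0
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[col] for col in lhs)
        groups[key][row[rhs]] += 1
    # For each lhs value, count the rows carrying the majority rhs value.
    consistent = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return consistent / len(rows)


def holds(rows, lhs, rhs, accuracy=0.98):
    """True if the dependency meets the accuracy threshold."""
    return dependency_accuracy(rows, lhs, rhs) >= accuracy


rows = [
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "02139", "city": "Cambrige"},  # misspelled entry breaks the dependency
    {"zip": "10001", "city": "New York"},
]

score = dependency_accuracy(rows, ["zip"], "city")
strict = holds(rows, ["zip"], "city")                   # default threshold 0.98
lenient = holds(rows, ["zip"], "city", accuracy=0.75)   # lowered threshold
```

In this sketch, three of the four rows agree with the majority city for their zip code, so the dependency is rejected at the default threshold and accepted only when the threshold is lowered to 0.75.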
Returns:
dependencies
(Dependencies) : the dependencies found in the data within the constraints provided
normalize_dataframe
normalize_dataframe(df, dependencies)
Normalizes dataframe based on the dependencies given. Keys for the newly created DataFrames can only be columns that are strings, ints, or categories. Keys are chosen according to the priority:
Returns:
new_dfs
(list[pd.DataFrame]) : list of new dataframes
make_entityset
make_entityset(df, dependencies, name=None, time_index=None)
Creates a normalized EntitySet from a dataframe based on the dependencies given. Keys are chosen in the same fashion as for normalize_dataframe, and a new index will be created if any key has more than a single attribute.
Returns:
entityset
(ft.EntitySet) : created EntitySet
normalize_entityset
normalize_entityset(es, accuracy=0.98)
Returns a new normalized EntitySet from an EntitySet with a single entity.
Arguments:
es
(ft.EntitySet) : EntitySet with a single entity to normalize
Returns:
new_es
(ft.EntitySet) : new normalized EntitySet
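As a rough illustration of what normalizing a table on a dependency means (the idea behind normalize_dataframe), the sketch below splits a denormalized list of rows into two tables once a dependency such as customer_id --> customer_city is known. It uses plain Python dictionaries rather than the library or pandas; `split_on_dependency` and the sample column names are hypothetical, and the library's actual key selection and output format may differ.

```python
def split_on_dependency(rows, key, dependents):
    """Move the `dependents` columns into a separate table keyed by `key`.

    Hypothetical helper: returns (parent_rows, child_rows), where child_rows
    holds one row per distinct key value (the first one seen).
    """
    child = {}
    for row in rows:
        if row[key] not in child:
            child[row[key]] = {key: row[key], **{c: row[c] for c in dependents}}
    # The parent keeps the key column as a reference into the child table.
    parent = [{c: v for c, v in row.items() if c not in dependents} for row in rows]
    return parent, list(child.values())


orders = [
    {"order_id": 1, "customer_id": "a", "customer_city": "Boston"},
    {"order_id": 2, "customer_id": "a", "customer_city": "Boston"},
    {"order_id": 3, "customer_id": "b", "customer_city": "Austin"},
]

order_rows, customer_rows = split_on_dependency(
    orders, key="customer_id", dependents=["customer_city"]
)
```

Here `order_rows` keeps one row per order with only `order_id` and `customer_id`, while `customer_rows` stores each customer's city once, which removes the repeated values from the original table.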