Data preparation for data science projects.
Data preparation is often estimated to account for about 80% of the work in a data science project. Let's take that number down. dataPreparation lets you do most of the painful data preparation for a data science project with a minimal amount of code.
This package builds on `data.table` and exponential search.
Before using any machine learning (ML) algorithm, one needs to prepare the data. Preparing a data set for a data science project can be long and tricky. The main steps, once the data is loaded (for example with `data.table::fread`), are correcting, transforming, filtering, pre-model manipulation, and shaping.
Here are the functions available in this package to tackle those issues:
Correct | Transform | Filter | Pre model manipulation | Shape |
---|---|---|---|---|
un_factor | generate_date_diffs | fast_filter_variables | fast_handle_na | shape_set |
find_and_transform_dates | generate_factor_from_date | which_are_constant | fast_discretization | same_shape |
find_and_transform_numerics | aggregate_by_key | which_are_in_double | fast_scale | set_as_numeric_matrix |
set_col_as_character | generate_from_factor | which_are_bijection | one_hot_encoder | |
set_col_as_numeric | generate_from_character | remove_sd_outlier | | |
set_col_as_date | fast_round | remove_rare_categorical | | |
set_col_as_factor | target_encode | remove_percentile_outlier | | |
All of these functions are integrated into the full pipeline function `prepare_set`.
For more details on how it works, see our tutorial.
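The functions in the table can also be called one at a time instead of through the full pipeline. The following is a minimal sketch, assuming the `messy_adult` data set shipped with the package and the default arguments of each function; the variable name `adult` is only illustrative. The data is copied first on the assumption that some functions may modify a `data.table` in place.

```r
library(dataPreparation)

# Load the toy data set shipped with the package
data(messy_adult)

# Work on a copy so the original stays available for comparison
# (assumption: some functions may modify a data.table by reference)
adult <- data.table::copy(messy_adult)

# Filter step: list the columns that hold a single value
constant_cols <- which_are_constant(adult)

# Correct step: detect columns that look like dates and convert them
adult <- find_and_transform_dates(adult)

# Filter step: drop columns that carry no extra information
adult <- fast_filter_variables(adult)

# Compare dimensions before and after
dim(messy_adult)
dim(adult)
```

Calling the steps individually makes it easier to inspect what each one changes before relying on the full pipeline.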
Install the package from CRAN:
install.packages("dataPreparation")
To get the latest features, install the package from GitHub:
library(devtools)
install_github("ELToulemonde/dataPreparation")
Load a toy data set
library(dataPreparation)
data(messy_adult)
head(messy_adult)
Run the full pipeline function
clean_adult <- prepare_set(messy_adult)
head(clean_adult)
That's it. For all functions, you can check out the documentation and/or the tutorial vignette.
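To see what the pipeline changed, one option is to compare the data set before and after the call. This is a small sketch using only base R on top of `prepare_set`; the helper `first_class` is a hypothetical name introduced here, not part of the package.

```r
library(dataPreparation)

data(messy_adult)
clean_adult <- prepare_set(messy_adult)

# Hypothetical helper: first class of each column, as a character vector
first_class <- function(data_set) {
  vapply(data_set, function(x) class(x)[1], character(1))
}

# Dimensions before and after the pipeline
dim(messy_adult)
dim(clean_adult)

# Count of columns per type before and after the pipeline
table(first_class(messy_adult))
table(first_class(clean_adult))
```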
dataPreparation has been developed and used by many active community members. Your help is very valuable to make it better for everyone.
For more details, please refer to CONTRIBUTING.