Aethos Versions

Automated Data Science and Machine Learning library to optimize workflow.

v2.0.1

4 years ago

Fixed repr memory issue

v2.0.0

4 years ago

Aethos 2.0 aims to improve the intuitiveness and usability of the package and its API, making it easier to use and understand. It also makes it possible to work with Pandas DataFrames side by side with Aethos.

Reduced the import time of the package by simplifying and decoupling the Aethos modules.
There is now only 1 object to analyze, visualize, transform, model and analyze results.
Can now specify the problem type (Classification, Regression or Unsupervised) and see only the models specific to that problem.
Removed the complexity of adding data to the underlying dataframes through Aethos objects. You can access the underlying dataframes with the x_train and x_test properties.
Removed reporting feature.
Introduced new objects to support new cases:
- Analysis: To analyze, visualize and run statistical analysis (t-test, ANOVA, etc.) on your data.
- Classification: To analyze, visualize, run statistical analysis, transform and impute your data to run classification models.
- Regression: To analyze, visualize, run statistical analysis, transform and impute your data to run regression models.
- Unsupervised: To analyze, visualize, run statistical analysis, transform and impute your data to run unsupervised models.
- ClassificationModelAnalysis: Interpret, analyze and visualize classification model results.
- RegressionModelAnalysis: Interpret, analyze and visualize regression model results.
- UnsupervisedModelAnalysis: Interpret, analyze and visualize unsupervised model results.
- TextModelAnalysis: Interpret, analyze and visualize text model results.
Removed dot notation when accessing DataFrame columns.
Can now chain methods together.
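
As a rough illustration of method chaining and of the x_train / x_test properties mentioned above, here is a toy sketch in plain Python. The class ToyPipeline and its methods are hypothetical names invented for this example and are not the Aethos API; the sketch only shows the common pattern of returning self from each method so that calls can be chained.

```python
class ToyPipeline:
    """Hypothetical stand-in for a chainable object; NOT the Aethos API."""

    def __init__(self, x_train, x_test=None):
        # Rows are plain dicts here; a real library would hold DataFrames.
        self._x_train = [dict(row) for row in x_train]
        self._x_test = [dict(row) for row in (x_test or [])]

    @property
    def x_train(self):
        """Read access to the underlying training rows."""
        return self._x_train

    @property
    def x_test(self):
        """Read access to the underlying test rows."""
        return self._x_test

    def drop(self, *columns):
        """Remove the given columns from both datasets."""
        for rows in (self._x_train, self._x_test):
            for row in rows:
                for column in columns:
                    row.pop(column, None)
        return self  # returning self is what makes chaining possible

    def fill_missing(self, column, value):
        """Replace None in one column with a constant (illustrative only)."""
        for rows in (self._x_train, self._x_test):
            for row in rows:
                if row.get(column) is None:
                    row[column] = value
        return self


pipeline = ToyPipeline([{"id": 1, "age": None}, {"id": 2, "age": 30}])
chained = pipeline.drop("id").fill_missing("age", 0)
```

Because every method returns the same object, the chained result and the original variable refer to one pipeline, and x_train reflects both steps.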

v0.7.0

4 years ago

General

Missing data property now displays both missing data tables vertically inline
Package imports have been optimized to make the package import faster
Project metrics are now available as an option: set a project metric and all models will be evaluated according to that metric.
Added itables as another interactive table option, with search functionality
Slightly reduced plot sizes

Reporting

Reporting V2 is released
When saving an image, it is also logged to the report
Can now set an option to write the report to a Word file.

v0.6.4

4 years ago

Release Notes

Added a user configuration file to give users more customization. The default location is $HOME/.pyautoml/config.yml
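
A minimal sketch of resolving that default location, using only the standard library. The function name default_config_path and its env parameter are assumptions made for this example; how the package itself locates or parses the file is not stated in these notes, and parsing YAML would need a third-party parser, so it is left out.

```python
import os
from pathlib import Path


def default_config_path(env=None):
    """Return <home>/.pyautoml/config.yml (hypothetical helper)."""
    env = os.environ if env is None else env
    # Prefer $HOME when it is set; otherwise fall back to Path.home().
    home = env.get("HOME") or str(Path.home())
    return Path(home) / ".pyautoml" / "config.yml"


example_path = default_config_path({"HOME": "/home/alice"})
```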

General

Added property to show all options of colors and color palettes for plots
Reworked the barchart api to make it easier to use and more intuitive
Added JSON Normalization for columns that have nested JSON, i.e. you can now expand that column into its own dataframe
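
To show what expanding a nested JSON column can look like, here is a small plain-Python sketch that flattens nested dictionaries into dotted keys, one flat record per row. The helper names flatten_record and normalize_json_column are assumptions for this example; these notes do not say how the package implements the feature (pandas itself offers json_normalize for the same purpose).

```python
def flatten_record(record, parent_key="", sep="."):
    """Flatten a nested dict into a single-level dict with joined keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key prefix along.
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def normalize_json_column(rows, column):
    """Expand one nested column into its own list of flat records."""
    return [flatten_record(row[column]) for row in rows]


rows = [{"id": 1, "meta": {"geo": {"city": "Oslo"}, "tags": 2}}]
expanded = normalize_json_column(rows, "meta")
```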

Visualizations

Added histograms

v0.6.3

4 years ago

Release Notes

Pandas functions can now be applied directly to the pyautoml objects if they have not been extended by pyautoml (e.g. describe, drop, etc.). The function will only be applied to the training dataset (or the only dataset if just one is provided) and will return the result of the function without altering the state of the object.
You can now set both datasets to variables using the function to_df.
You can now standardize column names
- Column names will all be lower case
- Spaces will be replaced with underscores
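
The two renaming rules above can be sketched in plain Python as follows. The function name standardize_column_names is an assumption for this example, not the package's own method, and it applies only the two stated rules (no trimming or other cleanup).

```python
def standardize_column_names(columns):
    """Lower-case each name and replace spaces with underscores."""
    return [name.lower().replace(" ", "_") for name in columns]


standardized = standardize_column_names(["First Name", "AGE", "zip code"])
```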

Cleaning

Added function to remove columns with 1 unique value and all missing values
Added function to remove columns with all unique values (e.g. id columns)
Changed any remove function calls to drop to be more in line with Pandas
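
A hedged sketch of the two column-dropping rules above, written against a plain dict of column name to list of values, with None standing in for a missing value. The function names and the exact treatment of missing values are assumptions for this example; the package's own behaviour may differ.

```python
def drop_constant_columns(table):
    """Drop columns with fewer than two distinct values.

    None counts as a value, so an all-missing column is dropped too.
    """
    return {
        name: values
        for name, values in table.items()
        if len(set(values)) > 1
    }


def drop_unique_columns(table):
    """Drop columns where every row has a different value (id-like)."""
    return {
        name: values
        for name, values in table.items()
        if not (len(values) > 1 and len(set(values)) == len(values))
    }


table = {
    "id": [1, 2, 3],
    "country": ["CA", "CA", "CA"],
    "empty": [None, None, None],
    "age": [30, 41, 30],
}
without_constant = drop_constant_columns(table)
without_unique = drop_unique_columns(table)
```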

v0.6.2

4 years ago

Release Notes

Introducing interactive filtering and sorting with QGrid. You can now enable the option to use interactive DataFrames when working with your data. See usage on how to enable.

Bug Fixes

Fixed a bug where dropping columns using regex and a keep list was causing the keep list to be disregarded.
.drop now validates that keep is a list.
Added explicit x and y args for almost every plot
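
The intended interaction between a regex and a keep list, as described in the fixes above, can be sketched like this. The function columns_to_drop and its parameter names are hypothetical and chosen for this example only; the sketch shows the rule that a column matching the pattern is still retained when it appears in keep, and that keep must be a list.

```python
import re


def columns_to_drop(columns, regexp, keep=None):
    """Return the columns matching regexp, excluding any listed in keep."""
    keep = [] if keep is None else keep
    if not isinstance(keep, list):
        # Mirrors the validation described above: keep must be a list.
        raise TypeError("keep must be a list of column names")
    pattern = re.compile(regexp)
    return [c for c in columns if pattern.search(c) and c not in keep]


to_drop = columns_to_drop(["tmp_a", "tmp_b", "price"], r"^tmp_", keep=["tmp_b"])
```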

General

Added options similar to Pandas. Credit to them for their robust option system.
Made Learning Curve and Score plots larger from Cross Validation
Added columns as a property

Visualizations

Added Correlation Matrix plot
Added Joint Plots
Added Pair Plots

Models

Kmeans now automatically finds the optimal number of clusters using the Elbow Plot with distortion as the metric.
Can now export your model as a .pkl file for deployment.
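
For readers unfamiliar with .pkl files, here is a minimal round trip using Python's standard pickle module. The helpers save_model and load_model are assumptions for this example, and a plain dict of parameters stands in for a trained model; the package's own export call is not shown in these notes.

```python
import os
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize any picklable object to the given path."""
    path = Path(path)
    with path.open("wb") as handle:
        pickle.dump(model, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_model(path):
    """Load a pickled object. Only unpickle files you trust:
    loading a pickle can execute arbitrary code."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


# Stand-in "model": a dict of fitted parameters.
model = {"coef": [0.5, -1.25], "intercept": 2.0}
with tempfile.TemporaryDirectory() as directory:
    saved = save_model(model, os.path.join(directory, "model.pkl"))
    restored = load_model(saved)
```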

v0.6.1

4 years ago

Release Notes

General

Python 3.5 is no longer supported
Fixed bug where the run parameter for running models was not being taken into account
Added visualizing clusters in 3d
Added properties for y_train and y_test

Models

Added Gridsearch support for unsupervised models
Added 2d and 3d Scatterplot for visualizing clusters

v0.6.0

4 years ago

Release Notes

Added Cleaning, Preprocessing and Feature Engineering techniques.

Added Regression, Classification, Text and Clustering models.

Some models include Agglomerative Hierarchical Clustering, doc2vec, word2vec, XGBoost Classification and Regression, etc. There are now over 35 automated and implemented models.

Can now view metrics and compare Classification and Regression models.

Can access model methods from the model name variable. For example: model.log_reg.get_params(), etc.

v0.5.0

4 years ago

Release Notes

General

Can now write to csv
Refactored variables
- data variable no longer exists
- train_data is now x_train to be more compliant with what you see in books/tutorials
- test_data is now x_test to be more compliant with what you see in books/tutorials
Environment is now automatically for reporting

Modelling

Added crossvalidation
- Stratified Kfold
- KFold
- Learning curve
Added Gridsearch
- GridsearchCV
Can now queue multiple models and run them in parallel on a local machine or one after the other if there are limited resources
Can now compare models across all metrics for a given problem (classification vs. regression)
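
For reference, the KFold and Stratified KFold splits listed above can be sketched in plain Python as index generators. This is a simplified illustration without shuffling, with function names chosen for this example; it is not the implementation the package uses.

```python
from collections import defaultdict


def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for plain K-fold, without shuffling."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    # The first n_samples % n_splits folds get one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


def stratified_kfold_indices(labels, n_splits):
    """Yield (train, test) index lists with roughly equal class mix per fold."""
    folds = [[] for _ in range(n_splits)]
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    # Deal each class's indices round-robin across the folds.
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    for fold in range(n_splits):
        test = sorted(folds[fold])
        train = sorted(
            i for other in range(n_splits) if other != fold for i in folds[other]
        )
        yield train, test


plain = list(kfold_indices(5, 2))
stratified = list(stratified_kfold_indices(["a", "a", "b", "b"], 2))
```

In the stratified case each test fold receives one sample of each class, which is the property that distinguishes it from the plain split.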

Model Results

Can now understand model results with both SHAP and Microsoft Interpret
Can now view all metrics for classification problems
Added ROC Curve and Confusion Matrix
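
As background for the confusion matrix and ROC curve entries above, this plain-Python sketch computes a confusion matrix and a single ROC point (false positive rate, true positive rate) at one threshold. The function names are assumptions for this example; a full ROC curve repeats the second computation over many thresholds.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [
        [counts[(actual, predicted)] for predicted in labels]
        for actual in labels
    ]


def roc_point(y_true, scores, threshold, positive=1):
    """Return (false positive rate, true positive rate) at one threshold."""
    tp = fp = fn = tn = 0
    for actual, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if actual == positive:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, tpr


y_true = [1, 0, 1, 0]
matrix = confusion_matrix(y_true, [1, 0, 0, 0], labels=[0, 1])
point = roc_point(y_true, [0.9, 0.2, 0.4, 0.6], threshold=0.5)
```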

v0.4.0

4 years ago

Release Notes

Bug Fixes

Fixed the data type of PoS tagging and split sentences when stored in a DataFrame; they are now correctly stored as lists

General

Added the ability to filter, run group-by analysis and gain descriptive statistical insights into grouped dataframes.
Added scatterplot
Added encoding label functionality
Added ability to search dataframe
Added pandas summary data report
Added word split
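
The label encoding entry above refers to mapping category values to integer codes. A minimal plain-Python sketch follows, assuming codes are assigned in sorted order of the distinct values; the function name encode_labels is hypothetical and the package's own method may order or name things differently.

```python
def encode_labels(values):
    """Map each distinct value to an integer code, in sorted order."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[value] for value in values], mapping


codes, mapping = encode_labels(["cat", "dog", "cat", "bird"])
```

Returning the mapping alongside the codes makes it possible to decode predictions back to the original labels later.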

Modelling

Introduced automated modelling for text models, classification models and clustering models
Added confusion matrix, and evaluation metrics