Aethos Versions

Automated Data Science and Machine Learning library to optimize workflow.

v2.0.1

4 years ago

Fixed repr memory issue

v2.0.0

4 years ago

Aethos 2.0 aims to make the package and its API more intuitive and easier to use and understand. It also makes it possible to work with Pandas DataFrames side by side with Aethos.

  • Reduced import time of the package by simplifying and decoupling the Aethos modules.

  • Only one object is needed to analyze, visualize, transform, model and analyze results.

  • Can now specify the problem type (Classification, Regression or Unsupervised) and see only the models specific to that problem type.

  • Removed the complexity of adding data to the underlying dataframes through Aethos objects. You can access the underlying dataframes with the x_train and x_test properties.

  • Removed reporting feature.

  • Introduced new objects to support new cases:

    • Analysis: To analyze, visualize and run statistical analysis (t-test, anova, etc.) on your data.

    • Classification: To analyze, visualize, run statistical analysis, transform and impute your data to run classification models.

    • Regression: To analyze, visualize, run statistical analysis, transform and impute your data to run regression models.

    • Unsupervised: To analyze, visualize, run statistical analysis, transform and impute your data to run unsupervised models.

    • ClassificationModelAnalysis: Interpret, analyze and visualize classification model results.

    • RegressionModelAnalysis: Interpret, analyze and visualize regression model results.

    • UnsupervisedModelAnalysis: Interpret, analyze and visualize unsupervised model results.

    • TextModelAnalysis: Interpret, analyze and visualize text model results.

  • Removed dot notation when accessing DataFrame columns.

  • Can now chain methods together.
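Method chaining usually works by having each transforming method return the object itself. The following is a minimal sketch of that pattern, not the Aethos API: the class `Dataset` and its methods `drop_columns` and `rename` are hypothetical names used only for illustration.

```python
class Dataset:
    """Hypothetical wrapper around a table stored as a dict of column -> values."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def drop_columns(self, *names):
        # Remove the named columns, then return self so calls can be chained.
        for name in names:
            self.columns.pop(name, None)
        return self

    def rename(self, old, new):
        # Rename one column (it moves to the end of the column order).
        self.columns[new] = self.columns.pop(old)
        return self


data = Dataset({"id": [1, 2], "Age": [31, 45], "city": ["Oslo", "Lima"]})
data.drop_columns("id").rename("Age", "age")
print(list(data.columns))
```

Because every method returns `self`, the two calls can be written on one line instead of as separate statements.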

v0.7.0

4 years ago

General

  • Missing data property now displays both missing data tables vertically inline
  • Package imports have been optimized to make the package import faster
  • Project metrics can now be enabled through an option: set a project metric and all models will be evaluated according to that metric.
  • Added itables as another interactive table option, with search functionality
  • Slightly reduced plot sizes

Reporting

  • Reporting V2 is released
  • When saving an image, it is also logged to the report
  • Can now set an option to write report to a word file.

v0.6.4

4 years ago

Release Notes

  • Added a user configuration file to give users more customization. The default location is $HOME/.pyautoml/config.yml

General

  • Added property to show all options of colors and color palettes for plots
  • Reworked the barchart api to make it easier to use and more intuitive
  • Added JSON Normalization for columns that contain nested JSON, i.e. you can now expand such a column into its own DataFrame
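To illustrate what expanding a nested JSON column involves, here is a minimal sketch in plain Python. It is not pyautoml's implementation: the function `flatten_json`, the dotted key separator and the sample rows are all assumptions made for illustration. (Pandas ships a comparable helper, `pandas.json_normalize`.)

```python
def flatten_json(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with joined keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key prefix along.
            flat.update(flatten_json(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical rows where the "meta" column holds nested JSON.
rows = [
    {"id": 1, "meta": {"source": "web", "geo": {"country": "CA"}}},
    {"id": 2, "meta": {"source": "app", "geo": {"country": "US"}}},
]

# Expand the nested column into its own table (one flat dict per row).
expanded = [flatten_json(row["meta"]) for row in rows]
print(expanded)
```

Each flat dict can then be treated as one row of a new table whose columns are `source` and `geo.country`.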

Visualizations

  • Added histograms

v0.6.3

4 years ago

Release Notes

  • Pandas functions that pyautoml has not extended (e.g. describe, drop, etc.) can now be applied directly to pyautoml objects. The function is applied only to the training dataset (or to the only dataset if just one is provided) and returns its result without altering the state of the object.
  • You can now set both datasets to variables using the function to_df.
  • You can now standardize column names
    • Column names will all be lower case
    • Spaces will be replaced with underscores
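The two standardization rules above can be sketched in a few lines. This is an illustration of the rules only, not pyautoml's code; the function name `standardize_column_names` is hypothetical.

```python
def standardize_column_names(names):
    """Lower-case each name and replace spaces with underscores."""
    return [name.lower().replace(" ", "_") for name in names]


print(standardize_column_names(["Passenger ID", "Fare Paid", "Age"]))
# ['passenger_id', 'fare_paid', 'age']
```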

Cleaning

  • Added function to remove columns with 1 unique value and all missing values
  • Added function to remove columns with all unique values (e.g. id columns)
  • Renamed all remove functions to drop to be more in line with Pandas
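As a rough sketch of the two cleaning rules, the example below works on a table stored as a plain dict of column name to values. It is not pyautoml's implementation: the function names are hypothetical, missing values are assumed to be `None`, and the first rule is interpreted here as "at most one distinct non-missing value", which covers both constant and entirely missing columns.

```python
def drop_constant_columns(table):
    """Drop columns with at most one distinct non-missing value."""
    return {
        name: values
        for name, values in table.items()
        if len({v for v in values if v is not None}) > 1
    }


def drop_all_unique_columns(table):
    """Drop columns where every row holds a different value (e.g. ids)."""
    return {
        name: values
        for name, values in table.items()
        if len(set(values)) < len(values)
    }


table = {
    "id": [101, 102, 103, 104],           # all unique -> dropped
    "country": ["CA", "CA", "CA", "CA"],  # one unique value -> dropped
    "notes": [None, None, None, None],    # all missing -> dropped
    "score": [0.5, 0.7, 0.5, 0.9],        # kept
}
cleaned = drop_all_unique_columns(drop_constant_columns(table))
print(list(cleaned))
```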

v0.6.2

4 years ago

Release Notes

Introducing interactive filtering and sorting with QGrid. You can now enable the option to use interactive DataFrames when working with your data. See usage on how to enable.

Bug Fixes

  • Fixed a bug where dropping columns using regex and a keep list was causing the keep list to be disregarded.
  • .drop now validates that keep is a list.
  • Added explicit x and y args for almost every plot

General

  • Added options similar to Pandas. Credit to them for their robust option system.
  • Made Learning Curve and Score plots larger from Cross Validation
  • Added columns as a property
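For readers unfamiliar with the Pandas option system, the general idea is a registry of named settings that can be read and changed at runtime, with unknown names rejected. The sketch below shows that idea only; the class `OptionRegistry` and the option names are hypothetical and are not the actual pyautoml options.

```python
class OptionRegistry:
    """Hypothetical registry of named options with validation on access."""

    def __init__(self, defaults):
        self._values = dict(defaults)

    def get_option(self, key):
        if key not in self._values:
            raise KeyError(f"No such option: {key!r}")
        return self._values[key]

    def set_option(self, key, value):
        # Reject unknown keys so typos fail loudly instead of being ignored.
        if key not in self._values:
            raise KeyError(f"No such option: {key!r}")
        self._values[key] = value


# Option names below are made up for illustration.
options = OptionRegistry({"interactive_df": False, "plot_width": 600})
options.set_option("interactive_df", True)
print(options.get_option("interactive_df"))
```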

Visualizations

  • Added Correlation Matrix plot
  • Added Joint Plots
  • Added Pair Plots

Models

  • Kmeans now automatically finds the optimal number of clusters using the Elbow Plot with distortion as the metric.
  • Can now export your model as a .pkl file for deployment.
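A .pkl file is produced by Python's standard `pickle` module. The sketch below shows the save and load round trip using the standard library only; it does not use the pyautoml export function, and the dict standing in for a fitted model, along with the file name, is an assumption for illustration.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a fitted model: a plain dict of learned parameters.
model = {"name": "log_reg", "weights": [0.4, -1.2], "bias": 0.1}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"

    # Serialize the model to disk.
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Load it back, as a deployment service would at start-up.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

print(restored == model)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.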

v0.6.1

4 years ago

Release Notes

General

  • Python 3.5 is no longer supported
  • Fixed bug where the run parameter for running models was not being taken into account
  • Added visualizing clusters in 3d
  • Added properties for y_train and y_test

Models

  • Added Gridsearch support for unsupervised models
  • Added 2d and 3d Scatterplot for visualizing clusters

v0.6.0

4 years ago

Release Notes

Added Cleaning, Preprocessing and Feature Engineering techniques.

Added Regression, Classification, Text and Clustering models.

Some models include Agglomerative Hierarchical Clustering, doc2vec, word2vec, XGBoost Classification and Regression, etc. There are now over 35 automated and implemented models.

Can now view metrics and compare Classification and Regression models.

Can access model methods from the model name variable. For example: model.log_reg.get_params(), etc.

v0.5.0

4 years ago

Release Notes

General

  • Can now write to csv
  • Refactored variables
    • data variable no longer exists
    • train_data is now x_train to be more compliant with what you see in books/tutorials
    • test_data is now x_test to be more compliant with what you see in books/tutorials
  • Environment is now automatically for reporting

Modelling

  • Added cross-validation

    • Stratified Kfold
    • KFold
    • Learning curve
  • Added Gridsearch

    • GridsearchCV
  • Can now queue multiple models and run them in parallel on a local machine or one after the other if there are limited resources

  • Can now compare models across all metrics for a given problem (classification vs. regression)
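To make the cross-validation entries above concrete, here is a minimal sketch of how plain K-fold splits a dataset into train and test indices. It is written from scratch for illustration and is not the code the package uses; the function name `k_fold_indices` is hypothetical. Stratified K-fold additionally keeps class proportions similar in every fold, which this sketch does not attempt.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for plain, unshuffled k-fold."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop


for train, test in k_fold_indices(n_samples=7, n_splits=3):
    print("test:", test, "train:", train)
```

Every sample lands in exactly one test fold, so each model is scored on data it was not trained on.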

Model Results

  • Can now understand model results with both SHAP and Microsoft Interpret
  • Can now view all metrics for classification problem
  • Added ROC Curve and Confusion Matrix

v0.4.0

4 years ago

Release Notes

Bug Fixes

  • Fixed the data type of PoS tagging and split sentences when stored in a DataFrame. They are now correctly stored as lists

General

  • Added the ability to filter, perform group-by analysis and gain descriptive statistical insights into grouped DataFrames.
  • Added scatterplot
  • Added encoding label functionality
  • Added ability to search dataframe
  • Added pandas summary data report
  • Added word split

Modelling

  • Introduced automated modelling for text models, classification models and clustering models
  • Added confusion matrix, and evaluation metrics
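For reference, a confusion matrix for a binary problem is just four counts, and common evaluation metrics are ratios of those counts. The sketch below computes them in plain Python; it is an illustration of the definitions, not the package's code, and the function names and sample labels are made up.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts


def evaluation_metrics(counts):
    """Derive accuracy, precision and recall from the four counts."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(dict(counts))
print(evaluation_metrics(counts))
```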