Yggdrasil Decision Forests

A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.

pydf_v0.4.1

2 weeks ago

Python API - Changelog

Fix

  • Resolve a dependency collision on YDF between PYDF and TF-DF. Previously, if TF-DF was installed after PYDF, importing YDF failed with a "has no attribute 'DType'" error.
  • Allow for training on cached TensorFlow dataset.

pydf_v0.4.0

3 weeks ago

Python API - 0.4.0 - 2024-04-10

Feature

  • Multi-dimensional features can be selected / configured with the features= training argument.
  • Programmatic access to partial dependence plots and variable importances.
  • Add model.to_tensorflow_function() function to convert a YDF model into a TensorFlow function that can be combined with other TensorFlow operations. This function is compatible with Keras 2 and Keras 3.
  • Add arguments servo_api=False and feed_example_proto=False for model.to_tensorflow_function(mode="tf") to export TensorFlow SavedModel following respectively the Servo API and consuming serialized TensorFlow Example protos.
  • Add pre_processing and post_processing arguments to the model.to_tensorflow_function function to pack pre/post processing operations in a TensorFlow SavedModel.

Tutorials

yggdrasil_decision_forests/port/python/v0.3.0

1 month ago

Python API 0.3.0 - 2024-03-15

Breaking

  • Custom losses must now provide the gradient instead of the negative of the gradient.
  • Clarified that YDF may modify numpy arrays returned by a custom loss function.
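To illustrate the sign change, here is a minimal sketch using a squared-error loss in plain Python. The function names are hypothetical and this is not the YDF custom-loss API; it only contrasts the gradient with its negative.

```python
def squared_error_gradient(labels, predictions):
    """Gradient of 0.5 * (prediction - label)^2 with respect to the prediction."""
    return [p - y for y, p in zip(labels, predictions)]


def squared_error_negative_gradient(labels, predictions):
    """Negative of the gradient above (the pseudo-residual)."""
    return [y - p for y, p in zip(labels, predictions)]


# Toy values chosen for illustration only.
labels = [1.0, 0.0, 2.0]
predictions = [0.5, 0.5, 2.5]

gradient = squared_error_gradient(labels, predictions)
negative_gradient = squared_error_negative_gradient(labels, predictions)
```

Per the note above, a custom loss written for 0.3.0 would return the first quantity, not the second.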

Features

  • Allow using Jax for custom loss definitions.
  • Allow setting may_trigger_gc on custom losses.
  • Add support for MHLD oblique decision trees.
  • Expose hyperparameter sparse_oblique_max_num_projections.
  • HTML plots for trees with model.plot_tree().
  • Pin protobuf version to 4.24.3 to fix some incompatibilities when using conda.
  • Allow listing compatible engines with model.list_compatible_engines().
  • Allow choosing a fast engine with model.force_engine(...).

Fix

  • Fix slow engine creation for some combinations of oblique splits.
  • Improve error message when feeding multi-dimensional labels.

Documentation

  • Clarified documentation of hyperparameters for oblique splits.
  • Fix plots, typos.

Release music

Doctor Gradus ad Parnassum from "Children's Corner" (L. 113). Claude Debussy

v1.9.0

1 month ago

1.9.0 - 2024-03-12

Feature

  • Add "parallel_trials" parameter in the hyper-parameter tuner to control the number of trials to run in parallel.
  • Add support for custom losses.
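As a rough illustration of what a cap on parallel trials means, the sketch below evaluates candidate hyper-parameters with a bounded thread pool. This is not the YDF tuner: `run_trials`, the search space, and the toy objective are all hypothetical, and only the name `parallel_trials` comes from the changelog.

```python
from concurrent.futures import ThreadPoolExecutor


def run_trials(candidates, evaluate, parallel_trials):
    """Evaluates every candidate, running at most `parallel_trials` at once."""
    with ThreadPoolExecutor(max_workers=parallel_trials) as pool:
        scores = list(pool.map(evaluate, candidates))
    # Keep the candidate with the lowest score.
    best_index = min(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index], scores[best_index]


def toy_objective(params):
    """Hypothetical objective: pretend that a depth of 6 is optimal."""
    return abs(params["max_depth"] - 6)


# Hypothetical search space, for illustration only.
candidates = [{"max_depth": depth} for depth in (2, 4, 6, 8)]
best_params, best_score = run_trials(candidates, toy_objective, parallel_trials=2)
```

A larger value finishes the search sooner when resources allow, at the cost of more concurrent training jobs.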

v1.9.0rc0

2 months ago

1.9.0rc0 - 2024-02-26

Feature

  • Add "parallel_trials" parameter in the hyper-parameter tuner to control the number of trials to run in parallel.
  • Add support for custom losses.

yggdrasil_decision_forests/port/python/v0.1.0

3 months ago

0.1.0 - 2024-01-25

Features

  • Added model validation evaluation (for GBTs) and OOB evaluation (for RFs).
  • Expose winner-takes-all for Random Forests.
  • Added model self evaluation.
  • Added ydf.from_tensorflow_decision_forests() for importing TF-DF models.
  • Allow feeding datasets as sequence of strings.
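The winner-takes-all option mentioned above changes how the trees of a Random Forest are combined. The sketch below contrasts the two aggregation schemes in plain Python, assuming that with winner-takes-all each tree casts a single vote and that otherwise the per-tree class distributions are averaged. The function names are hypothetical and this is not YDF's implementation.

```python
from collections import Counter


def winner_take_all(tree_distributions):
    """Each tree votes for its most likely class; returns the vote shares."""
    votes = Counter(max(dist, key=dist.get) for dist in tree_distributions)
    total = len(tree_distributions)
    return {label: votes.get(label, 0) / total for label in tree_distributions[0]}


def average_distributions(tree_distributions):
    """Averages the per-tree class distributions."""
    total = len(tree_distributions)
    return {
        label: sum(dist[label] for dist in tree_distributions) / total
        for label in tree_distributions[0]
    }


# Toy per-tree outputs: two trees lean weakly to "a", one leans strongly to "b".
trees = [
    {"a": 0.6, "b": 0.4},
    {"a": 0.6, "b": 0.4},
    {"a": 0.1, "b": 0.9},
]
wta = winner_take_all(trees)
avg = average_distributions(trees)
```

On this toy input the two schemes disagree: voting favors "a", while averaging favors "b".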

Fixes

  • Fix a plotting issue for GBTs without validation loss.

Release music

Flötenuhren von 1772 und 1793 - Vivace (Hob XIX:13). Joseph Haydn

v1.8.0

3 months ago

1.8.0 - 2023-11-17

Feature

  • Support for GBT distances.
  • Remove old snapshots automatically for GBT training.

Fix

  • Regression with Mean Squared Error loss and Mean Average error loss incorrectly clamped the gradients, leading to incorrect predictions.
  • Change dependency from boost to boost_math for faster builds.

Note

The commit associated with this release has a typo in its description.

1.7.0 - 2023-10-20

Feature

  • Add support for Mean average error (MAE) loss for GBT.
  • Add pairwise distance between examples.
  • By default, only keep the last three snapshots when training with a working cache to be resilient to training interruptions.
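The snapshot retention policy in the last item can be sketched as follows. This is an illustration under assumptions, not YDF's code: snapshots are represented by hypothetical integer indices, and `prune_snapshots` is an invented helper.

```python
def prune_snapshots(snapshot_indices, keep_last=3):
    """Splits snapshot indices into (kept, removed), keeping the most recent ones."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    ordered = sorted(snapshot_indices)
    kept = ordered[-keep_last:]
    removed = ordered[:-keep_last]
    return kept, removed


# Hypothetical snapshot indices (e.g. the number of trees at each snapshot).
kept, removed = prune_snapshots([10, 50, 20, 40, 30])
```

Keeping more than one snapshot means that training can still resume if the most recent snapshot is incomplete after an interruption.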

New interface

  • Check out the new Python interface in port/python! It's still experimental, but you can already install it from PyPI with pip install ydf.

v1.6.0

7 months ago

Breaking changes

  • The dependency on the distributed gradient boosted trees learner is renamed from //third_party/yggdrasil_decision_forests/learner/distributed_gradient_boosted_trees to //third_party/yggdrasil_decision_forests/learner/distributed_gradient_boosted_trees:dgbt. Note that in most cases, importing the learners with //third_party/yggdrasil_decision_forests/learner:all_learners is recommended.
  • The training configuration must contain a label. A missing label is no longer interpreted as the label being the input feature "".

Feature

  • Add support for monotonic constraints for gradient boosted trees.
  • Improve speed of dataset reading and writing.

Fix

  • Proper error message when using distributed training on more than 2^31 (i.e., ~2B) examples while compiling YDF with a 32-bit example index.
  • Fix Windows compilation with Visual Studio 2019.
  • Improved error messages for invalid training configuration
  • Replaced outdated dependencies
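The 2^31 limit in the first fix follows from the range of a 32-bit index. The check below is a minimal sketch of that arithmetic, assuming the index is a signed 32-bit integer. `fits_in_32bit_index` is a hypothetical helper, not a YDF function.

```python
# Number of distinct non-negative values of a signed 32-bit integer (0 .. 2^31 - 1).
MAX_EXAMPLES_32BIT = 2**31


def fits_in_32bit_index(num_examples):
    """Returns True if every example can be addressed by a signed 32-bit index."""
    return num_examples <= MAX_EXAMPLES_32BIT


small_dataset_ok = fits_in_32bit_index(1_000_000)
large_dataset_ok = fits_in_32bit_index(3_000_000_000)
```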

1.5.0

10 months ago

Feature

  • Rename experimental_analyze_model_and_dataset to analyze_model_and_dataset
  • Add new GBT loss function POISSON for Poisson log likelihood.
  • Go API: Categorical string values available for inspection.
  • Improved training speed for unit-weight datasets.
  • Support for MHLD oblique decision trees.
  • Multi-threaded RMSE computation.
  • Added Uint8 inference engine.
  • Added multi-task learning, where the outputs of models trained as "secondary" are used as inputs for the models trained as "primary".
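For readers unfamiliar with the Poisson log likelihood behind the POISSON loss listed above, here is a minimal sketch of the standard formulation. It assumes a log link (the model outputs the logarithm of the expected count) and drops the term that depends only on the label. Whether YDF uses exactly this parameterization is an assumption, and the function names are hypothetical.

```python
import math


def poisson_loss(label, raw_prediction):
    """Poisson negative log likelihood, up to a term that depends only on the label."""
    return math.exp(raw_prediction) - label * raw_prediction


def poisson_gradient(label, raw_prediction):
    """Derivative of poisson_loss with respect to raw_prediction."""
    return math.exp(raw_prediction) - label


# The loss is minimized when exp(raw_prediction) equals the label.
label = 3.0
best_raw = math.log(label)
gradient_at_best = poisson_gradient(label, best_raw)
```

The gradient vanishes when the predicted rate matches the observed count, which is the behavior a boosting step relies on.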

Fix

  • Go API: fixed typo on OutOfVocabulary constant.
  • Error messages for Uplift models.
  • Remove owner leakage in the model compiler.
  • Fix buggy restriction for SelGB sampling
  • Improve documentation.

1.4.0

1 year ago

Features

  • Speed-up the computation of PDP and CEP in the model analysis tool.
  • Add compilation of model into .h file.
  • [JS port] Add "prefix" argument to model loading method.
  • Rename logging function from LOG to YDF_LOG to limit risk of collision with TF or Absl.

Fix

  • [JS port] Fix memory leak. Release emscripten objects.