Optimal Sparse Decision Trees
This accompanies the paper, "Optimal Sparse Decision Trees" by Xiyang Hu, Cynthia Rudin, and Margo Seltzer.
It appeared in the 2019 NeurIPS conference
from osdt import OSDT
# initilize an OSDT object
model = OSDT()
# fit the model
model.fit(x_train, y_train)
# make prediction and get the prediction accuracy
prediction, accuracy = model.predict(x_test, y_test)
# make prediction only
prediction = model.predict(x_test)
All code are in the ./src
folder. The OSDT
class is in the osdt.py
file.
CLASS osdt.OSDT(lamb=0.1, prior_metric="curiosity", MAXDEPTH=float('Inf'), MAX_NLEAVES=float('Inf'), niter=float('Inf'),
logon=False, support=True, incre_support=True, accu_support=True, equiv_points=True,
lookahead=True, lenbound=True, R_c0=1, timelimit=float('Inf'), init_cart=True,
saveTree=False, readTree=False)
fit(x, y)
Fit the model with input data.
PARAMETERS:
RETURNS:
predict(x, y=None)
Predict if a particular sample is an outlier or not.
PARAMETERS:
RETURNS:
git clone https://github.com/xiyanghu/OSDT.git
cd OSDT
conda env create -f environment.yml
conda activate osdt
See data/preprocessed/
.
We used 7 datasets: Five of them are from the UCI Machine Learning Repository (tic-tac-toc, car evaluation, monk1, monk2, monk3).
The other two datasets are the ProPublica recidivism data set and the Fair Isaac (FICO) credit risk datasets.
We predict which individuals are arrested within two years of release ({N = 7,215}
) on the recidivism data set and whether an individual will default on a loan for the FICO dataset.
We provide our test code in test_accuracy.py
.
OSDT paper is published in Neural Information Processing Systems (NeurIPS) 2019. If you use OSDT in a scientific publication, we would appreciate citations to the following paper:
@inproceedings{NEURIPS2019_ac52c626,
author = {Hu, Xiyang and Rudin, Cynthia and Seltzer, Margo},
booktitle = {Advances in Neural Information Processing Systems},
editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
pages = {7265-7273},
publisher = {Curran Associates, Inc.},
title = {Optimal Sparse Decision Trees},
url = {https://proceedings.neurips.cc/paper_files/paper/2019/file/ac52c626afc10d4075708ac4c778ddfc-Paper.pdf},
volume = {32},
year = {2019}
}
or:
Hu, X., Rudin, C., and Seltzer, M. (2019). Optimal sparse decision trees. In Advances in Neural Information Processing Systems, pp. 7265–7273.