Official code for the ICML 2020 paper "Interpretations are useful: penalizing explanations to align neural networks with prior knowledge" (https://arxiv.org/abs/1909.13584). CDEP regularizes interpretations (computed via contextual decomposition) to align neural networks with prior knowledge, improving their predictions.
Note: this repo is actively maintained. For any questions please file an issue.
ISIC skin-cancer classification - using CDEP, we can teach a network to avoid relying on spurious patches present in the training set, improving test performance!
The segmentation maps of the patches can be downloaded here
ColorMNIST - penalizing the contributions of individual pixels allows us to teach a network to learn a digit's shape instead of its color, improving its test accuracy from 0.5% to 25.1%
Fixing text gender biases - CDEP can help a network avoid learning spurious biases present in a dataset, such as reliance on gendered words
Using CDEP requires two steps:

1. Compute contextual decomposition (CD) scores for your model (implemented in cd.py). If your architecture is not already supported, you may need to write a custom function that iterates through the layers of your network (for examples see cd.py).
2. Add a penalty on the CD scores of the features you want the network to ignore to your training loss.
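The two steps above can be sketched as a modified training loss. This is a hypothetical, simplified illustration, not the repo's actual API: the paper computes attributions with contextual decomposition (cd.py), but as a self-contained stand-in this sketch penalizes input-gradient attributions on a user-supplied mask of irrelevant features (e.g. the spurious patches in ISIC). The function and variable names here are invented for illustration.

```python
import torch
import torch.nn as nn

def cdep_style_loss(model, x, y, mask, lam=1.0):
    """Task loss plus a penalty on attributions to masked (irrelevant) inputs.

    mask: same shape as x; 1 marks features whose contribution is penalized.
    Note: a stand-in using input gradients; the paper uses CD scores instead.
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)
    task_loss = nn.functional.cross_entropy(logits, y)
    # attribution of each input feature to the output logits
    grads = torch.autograd.grad(logits.sum(), x, create_graph=True)[0]
    expl_penalty = (grads * mask).abs().sum()
    return task_loss + lam * expl_penalty

# toy usage with a hypothetical tiny model and random data
model = nn.Sequential(nn.Flatten(), nn.Linear(16, 2))
x = torch.randn(4, 1, 4, 4)
y = torch.randint(0, 2, (4,))
mask = torch.zeros_like(x)
mask[:, :, :, 0] = 1.0  # penalize contributions of the first pixel column
loss = cdep_style_loss(model, x, y, mask)
loss.backward()  # gradients flow through both terms
```

Because the penalty is differentiable (`create_graph=True`), the network is trained end-to-end to both fit the labels and assign low importance to the masked regions.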
@inproceedings{rieger2020interpretations,
title={Interpretations are useful: penalizing explanations to align neural networks with prior knowledge},
author={Rieger, Laura and Singh, Chandan and Murdoch, William and Yu, Bin},
booktitle={International Conference on Machine Learning},
pages={8116--8126},
year={2020},
organization={PMLR}
}