Captum Versions

Model interpretability and understanding for PyTorch

v0.7.0

4 months ago

The Captum 0.7.0 release adds new functionality for language model attribution and dataset-level attribution, along with a few improvements and bug fixes for existing methods.

Language Model Attribution

Captum 0.7.0 adds new APIs for language model attribution, making it substantially easier to define interpretable text features with corresponding baselines and masks. These new wrappers are compatible with most attribution methods in Captum and help users understand how different aspects of a prompt impact an LLM’s predicted response. More details can be found in our paper:

Using Captum to Explain Generative Language Models

Example:

from captum.attr import ShapleyValueSampling, LLMAttribution, TextTemplateInput

shapley_values = ShapleyValueSampling(model)
llm_attr = LLMAttribution(shapley_values, tokenizer)

inp = TextTemplateInput(
    # the text template
    "{} lives in {}, {} and is a {}. {} personal interests include", 
    # the values of the features
    ["Dave", "Palm Coast", "FL", "lawyer", "His"],
    # the reference baseline values of the features
    baselines=["Sarah", "Seattle", "WA", "doctor", "Her"],
)
res = llm_attr.attribute(inp)
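
The returned result can then be inspected or visualized. A hedged sketch, assuming the result object exposes the seq_attr tensor and plot_token_attr helper in this release:

# overall attribution of each input feature to the generated sequence
print(res.seq_attr)

# per-output-token heatmap of feature attributions
res.plot_token_attr(show=True)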

DataLoader Attribution

DataLoader Attribution is a new wrapper which provides an easy-to-use approach for obtaining attribution on a full dataset by providing a data loader rather than a single input (PR #1155, #1158).
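
A minimal sketch, assuming the wrapper is exposed as DataLoaderAttribution in captum.attr (the model and dataset below are placeholders):

import torch
from torch.utils.data import DataLoader, TensorDataset

from captum.attr import FeatureAblation
from captum.attr import DataLoaderAttribution  # assumed export location

dataset = TensorDataset(torch.randn(100, 10))  # placeholder dataset
dataloader = DataLoader(dataset, batch_size=16)

fa = FeatureAblation(model)  # model is a placeholder; wraps an existing attribution method
dl_attr = DataLoaderAttribution(fa)

# attribute over the whole dataset by passing the data loader instead of a single input
attributions = dl_attr.attribute(dataloader)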

Attribution Improvements

Captum 0.7.0 has added a few improvements to existing attribution methods including:

  • Multi-task attribution for Shapley Values and Shapley Value Sampling is now supported, allowing users to get attributions for multiple target outputs simultaneously (PR #1173)
  • LayerGradCam now supports returning attributions for each channel independently without summing across channels (PR #1086, thanks to @dzenanz for this contribution)

Bug Fixes

  • Visualization utilities were updated to use the new keyword argument visible to ensure compatibility with Matplotlib 3.7 (PR #1118)
  • The default visualization mode in visualize_timeseries_attr has been fixed to appropriately utilize overlay_individual (PR #1152, thanks to @teddykoker for this contribution)

v0.6.0

1 year ago

The Captum v0.6.0 release introduces a new feature, Stochastic Gates. This release also enhances Influential Examples and includes a series of other improvements and bug fixes.

Stochastic Gates

Stochastic Gates is a technique to enforce sparsity by approximating L0 regularization. It can be used for network pruning and feature selection. Since directly optimizing L0 is a non-differentiable combinatorial problem, Stochastic Gates approximates it by using continuous probability distributions (e.g., Concrete, Gaussian) as smoothed Bernoulli distributions, so the optimization can be reparameterized in terms of the distributions' parameters. See the original papers for more details.

Captum provides two Stochastic Gates implementations using different distributions as smoothed Bernoulli, BinaryConcreteStochasticGates and GaussianStochasticGates. They are available under captum.module, a new subpackage collecting neural network building blocks that are useful for model understanding. A usage example:

import torch
from captum.module import GaussianStochasticGates

n_gates = 5  # number of gates
stg = GaussianStochasticGates(n_gates, reg_weight=0.01)

inputs = torch.randn(3, n_gates)  # mock inputs with batch size of 3

gated_inputs, reg = stg(inputs)  # gate the inputs
loss = model(gated_inputs)  # use gated inputs in the downstream network

# optimize sparsity regularization together with the model loss
loss += reg 

...

# verify the learned gate values to see how model is using the inputs
print(stg.get_gate_values()) 

Influential Examples

Influential Examples is a new functional pillar introduced in the previous release. This release continues to build on it and introduces many improvements to the existing TracInCP family. Some of the changes are incompatible with the previous version; notable changes are listed below.

Notable Changes

v0.5.0

2 years ago

The Captum v0.5.0 release introduces a new functional pillar, Influential Examples, along with a few code improvements and bug fixes.

Influential Examples

Influential Examples implements the method TracInCP. It calculates the influence score of a given training example on a given test example, which approximately answers the question “if the given training example were removed from the training data, how much would the model's loss on the test example change?”. TracInCP can be used for:

  • identifying proponents/opponents, which are the training examples with the most positive/negative influence on a given test example
  • identifying mis-labelled data

Captum currently offers the following variant implementations of TracInCP:

  • TracInCP - Computes influence scores using gradients at all specified layers. Can be used for identifying proponents/opponents, and identifying mis-labelled data. Both computations take time linear in training data size.
  • TracInCPFast - Like TracInCP, but computes influence scores using only gradients in the last fully-connected layer, and is expedited using a computational trick.
  • TracInCPFastRandProj - Version of TracInCPFast which is specialized for computing proponents/opponents. In particular, pre-processing enables computation of proponents / opponents in constant time. The tradeoff is the linear time and memory required for pre-processing. Random projections can be used to reduce memory usage. This class should not be used for identifying mis-labelled data.

A tutorial demonstrating the usage is available at https://captum.ai/tutorials/TracInCP_Tutorial.
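
A hedged usage sketch (the model, datasets, checkpoint paths, and test batches are placeholders; the exact signatures evolved in later releases, as noted in the v0.6.0 section):

import torch.nn as nn
from captum.influence import TracInCP

# checkpoints saved during training (placeholder paths)
checkpoint_paths = ["checkpoints/epoch_1.pt", "checkpoints/epoch_2.pt"]

tracin = TracInCP(
    model,            # trained PyTorch model (placeholder)
    train_dataset,    # training dataset the influence is computed over (placeholder)
    checkpoint_paths,
    loss_fn=nn.CrossEntropyLoss(reduction="sum"),
    batch_size=128,
)

# influence score of every training example on the given test examples
scores = tracin.influence(test_inputs, test_labels, k=None)

# indices and scores of the top-10 proponents for the test examples
idx, vals = tracin.influence(test_inputs, test_labels, k=10, proponents=True)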

Notable Changes

  • The minimum required PyTorch version is now v1.6.0 (#876)
  • Enabled the argument model_id in TCAV and removed AV from the public concept module (PR #811)
  • Added a new configurable argument attribute_to_layer_input in TCAV, applied to both layer activation and attribution (#864)
  • Renamed the argument raw_input to raw_input_ids in the visualization util VisualizationDataRecord (PR #804)
  • Added support for a configurable eps argument in DeepLift (PR #835)
  • Captum now leverages register_full_backward_hook, introduced in PyTorch v1.8.0. Attribution to neuron output in NeuronDeepLift, NeuronGuidedBackprop, and NeuronDeconvolution is deprecated and will be removed in the next major release, v0.6.0 (PR #837)
  • Fixed an issue where Lime and KernelShap failed to handle empty tensor inputs such as tensor([[],[],[]]) (PR #812)
  • Fixed a bug where the visualization_transform of ImageFeature in Captum Insights was not applied (PR #871)

v0.4.1

2 years ago

The Captum v0.4.1 release includes three new tutorials, a few code improvements and bug fixes.

New Tutorials

Robustness tutorial:

  • Applying robustness attacks and metrics to CIFAR model and dataset

Concept tutorials:

  • TCAV for an image classification model (GoogLeNet)
  • TCAV for an NLP sentiment analysis model

Improvements

  • Reduced unnecessary reliance on NumPy across the codebase by replacing such usages with PyTorch equivalents where possible (PR #714, #755, #760)
  • Enhanced the error message for missing module rules in LRP (PR #727)
  • Switched the linter from black + isort to ufmt and reformatted the code accordingly (PR #739)
  • Generalized the implementation of captum._utils.av for use by TCAV and refactored TCAV to simplify the creation of the datasets used to train concept models (PR #747)

Bug Fixes

  • Fixed the device error when using TCAV on CUDA (Issues #719, #720, #721, PR #725)
  • Captum Insights now caches a subset of batches from the dataset for reuse, fixing the issue of no data being shown after all batches had been iterated (PR #728)
  • Corrected the loading of the reference word embedding in the tutorial “Interpreting Bert Part 1” (PR #743)
  • Renamed the save_div utility's argument default_value to default_denom and unified its behavior across denominator types (Issue #654, PR #751)

v0.4.0

2 years ago

The Captum 0.4.0 release adds new functionalities for concept-based interpretability, evaluating model robustness, new attribution methods including Layerwise Relevance Propagation (LRP), and improvements to existing attribution methods.

Concept-Based Interpretability

Captum 0.4.0 adds TCAV (Testing with Concept Activation Vectors), allowing users to identify the significance of user-defined concepts for a model’s prediction. TCAV has been implemented in a generic manner, allowing users to define custom concepts with example inputs for any modality, including vision and text.
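
A hedged usage sketch (the model, concept data iterators, example inputs, and target class index below are placeholders):

from captum.concept import TCAV, Concept

# concepts are defined by example inputs; striped_loader and random_loader are
# hypothetical data iterators yielding example tensors for each concept
striped = Concept(id=0, name="striped", data_iter=striped_loader)
random_examples = Concept(id=1, name="random", data_iter=random_loader)

tcav = TCAV(model, layers=["layer4"])  # model and layer name are placeholders

# significance of the "striped" concept (vs. random) for the target class
scores = tcav.interpret(
    zebra_inputs,
    experimental_sets=[[striped, random_examples]],
    target=zebra_class_idx,
)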

Robustness

Captum 0.4.0 also includes new tools to understand model robustness, including implementations of adversarial attacks (Fast Gradient Sign Method and Projected Gradient Descent) as well as robustness metrics to evaluate the impact of different attacks or perturbations on a model. The robustness metrics in this release include:

  • Attack Comparator - Allows users to quantify the impact of any input perturbation (such as torchvision transforms, text augmentation, etc.) or adversarial attack on a model and compare the impact of different attacks
  • Minimal Perturbation - Identifies the minimum perturbation needed to cause a model to misclassify the perturbed input

This robustness tooling enables model developers to identify potential model vulnerabilities and to analyze counterfactual examples in order to better understand a model’s decision boundary.
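
A hedged sketch of applying the FGSM and PGD attacks added in this release (model, inputs, and labels are placeholders; AttackComparator and Minimal Perturbation accept such attacks or other perturbations as inputs):

import torch
from captum.robust import FGSM, PGD

fgsm = FGSM(model, loss_func=torch.nn.CrossEntropyLoss())  # model is a placeholder
adv_inputs = fgsm.perturb(inputs, epsilon=0.05, target=labels)

pgd = PGD(model, loss_func=torch.nn.CrossEntropyLoss())
adv_inputs_pgd = pgd.perturb(
    inputs, radius=0.1, step_size=0.02, step_num=10, target=labels
)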

Layerwise Relevance Propagation (LRP)

We also add a new attribution method LRP (Layerwise Relevance Propagation) to Captum in the 0.4.0 release, as well as a layer attribution variant, Layer LRP. Layer-wise relevance propagation is based on a backward propagation mechanism applied sequentially to all layers of the model. The model output score represents the initial relevance which is decomposed into values for each neuron of the underlying layers. Thanks to @nanohanno for contributing this method to Captum and @rGure for providing feedback!
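
A minimal usage sketch (the model, inputs, target class, and chosen layer are placeholders):

from captum.attr import LRP, LayerLRP

lrp = LRP(model)  # model is a placeholder
attribution = lrp.attribute(inputs, target=pred_class)

# layer variant: relevance attributed to a specific layer's output
layer_lrp = LayerLRP(model, model.layer3)  # layer choice is illustrative
layer_attribution = layer_lrp.attribute(inputs, target=pred_class)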

New Tutorials

We have added new tutorials demonstrating Captum with BERT models, the usage of Lime, and DLRM recommender models.

Additionally, the following fixes and updates to existing tutorials have been added:

  • The IMDB tutorial has been updated with a new model (trained with a larger embedding and updated dependencies) for reproducibility.
  • Interpreting BERT Models (Part 1) has been updated to make use of LayerIntegratedGradients with multiple layers to obtain attributions simultaneously.

Attribution Improvements

Captum 0.4.0 has added improvements to existing attribution methods including:

  • Neuron conductance now supports a selector function (in addition to providing a neuron index) to select the target neuron for attribution, which enables support for layers with input / output as a tuple of tensors (PR #602).
  • Lime now supports a generator to be returned by the perturbation function, rather than only a single sample, to better support enumeration of perturbations for interpretable model training (PR #619).
  • KernelSHAP has been improved to perform weighted sampling of vectors for interpretable model training, rather than uniformly sampling vectors and weighting only when training. This change scales better with larger numbers of features, since weights for larger numbers of features were previously leading to arithmetic underflow (PR #619).
  • A new option show_progress has been added to all perturbation-based attribution methods, which shows a progress bar to help users track the progress of attribution computation (Issue #630, PR #581); a brief example follows this list.
  • A new normalize flag has been added to the infidelity evaluation metric, which normalizes and scales the infidelity score (Issue #613, PR #639).
  • All perturbation-based attribution methods now support boolean input tensors (PR #666).
  • Lime’s default regularization for Lasso regression has been reduced from 1.0 to 0.01 to avoid frequent issues with attribution results being 0 (Issue #679, PR #689).
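
As referenced above, a minimal sketch of the new show_progress option (model, inputs, and labels are placeholders):

from captum.attr import FeatureAblation

fa = FeatureAblation(model)  # model is a placeholder
# show_progress=True displays a progress bar while perturbed inputs are evaluated
attr = fa.attribute(inputs, target=labels, show_progress=True)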

Bug Fixes

  • Gradient-based attribution methods have been fixed to not zero previously stored grads, which avoids warnings related to accessing grad of non-leaf tensors (Issue #421, #491, PR #597).
  • Captum tests were previously included in Captum distributions unnecessarily; tests are no longer packaged with Captum releases (Issue #629 , PR #635).
  • Captum’s dependency on matplotlib in Conda environments has been changed to matplotlib-base, since pyqt is not used in Captum (Issue #644, PR #648).
  • Layer attribution methods now set gradient requirements only starting at the target layer rather than at the inputs, which ensures support for models with int or boolean input tensors (PR #647, #643).
  • Lime and Kernel SHAP int overflow issues (with sklearn interpretable model training) have been resolved, and all interpretable model inputs / outputs are converted to floats prior to training (PR #649).
  • The original parameter names that were renamed in v0.3 for NoiseTunnel, KernelShap, and Lime no longer trigger deprecation warnings; the deprecated names have been removed in 0.4.0 (PR #558).

v0.3.1

3 years ago

Captum v0.3.1 includes some improvements and minor fixes beyond the functionalities added in Captum v0.3.0.

Improvements

Captum v0.3.1 has added improvements to existing attribution methods including:

  • LayerIntegratedGradients now supports computing attributions for multiple layers simultaneously. (PR #532).
  • NoiseTunnel now supports an internal batch size to split noised inputs into batches and appropriately aggregate results (PR #555).
  • visualize_text now has an option return_html to export the visualization as HTML code (PR #548).
  • A utility wrapper was added to allow computing attributions for intermediate layers and inputs simultaneously (PR #534).

Captum Insights

  • Attributions for multiple models can be compared in Captum Insights (PR #551).
  • Various improvements to reduce package size of Captum Insights (PR #556 and #562).


Bug Fixes

  • Some parameter names were renamed in NoiseTunnel, Kernel Shap, and Lime to avoid conflicting names when combining Noise Tunnel or metrics with attribution methods. Deprecated arguments now raise warnings and will be removed in 0.4.0 (PR #558).
  • Feature Ablation now supports cases where the output may be on a different device than the input, which may occur in model-parallel setups (#528).
  • Lime (and KernelShap) were fixed to appropriately handle int or long input types (#570).

v0.3.0

3 years ago

The third release, v0.3.0, of Captum adds new attribution algorithms including Lime and KernelSHAP, metrics for assessing attribution results including infidelity and sensitivity, and improvements to existing attribution methods.

Metrics (Sensitivity and Infidelity)

Captum 0.3.0 adds metrics to estimate the trustworthiness of model explanations. Currently available metrics include Sensitivity-Max and Infidelity.

Infidelity measures the mean squared error between the explanation weighted by input perturbations and the corresponding changes in the predictor function. Sensitivity measures the degree to which explanations change under subtle input perturbations, using a Monte Carlo sampling-based approximation. These metrics are available in captum.metrics, and documentation can be found on the Captum website.
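
A hedged usage sketch of the new metrics (model, inputs, and labels are placeholders):

import torch
from captum.attr import Saliency
from captum.metrics import infidelity, sensitivity_max

saliency = Saliency(model)  # model is a placeholder
attribution = saliency.attribute(inputs, target=labels)

# perturbation function for infidelity: returns (perturbation, perturbed input)
def perturb_fn(inputs):
    noise = torch.randn_like(inputs) * 0.01
    return noise, inputs - noise

infid = infidelity(model, perturb_fn, inputs, attribution, target=labels)
sens = sensitivity_max(saliency.attribute, inputs, target=labels)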

Lime and KernelSHAP

In Captum 0.3.0, we also add surrogate-model interpretability methods including Lime and KernelSHAP. Lime is an interpretability method that samples points around a specified input example and uses model evaluations at these points to train a simpler, interpretable 'surrogate' model, such as a linear model.

We offer two implementation variants of this method, LimeBase and Lime. LimeBase provides a generic framework to train a surrogate interpretable model, while Lime provides a more specific implementation than LimeBase in order to expose a consistent API with other perturbation-based algorithms. KernelSHAP is a method that uses the Lime framework to compute Shapley Values.
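
A hedged usage sketch (model, inputs, and labels are placeholders; for image inputs a feature_mask grouping pixels into segments is typically also provided):

from captum.attr import Lime, KernelShap

lime = Lime(model)  # model is a placeholder
lime_attr = lime.attribute(inputs, target=labels, n_samples=200)

kernel_shap = KernelShap(model)
ks_attr = kernel_shap.attribute(inputs, target=labels, n_samples=200)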

New Tutorials

We have added new tutorials to demonstrate Captum with CV tasks such as segmentation as well as in distributed environments. These tutorials are:

  • Using Captum with torch.distributed
  • Interpreting a semantic segmentation model

Attribution Improvements

Captum 0.3.0 has added improvements to existing attribution methods including:

  • LayerActivation and LayerGradientXActivation now support computing attributions for multiple layers simultaneously. (PR #456).
  • Neuron attribution methods now support providing a callable to select or aggregate multiple neurons for attribution, as well as slices to select a range of neurons. (PR #490, #495). The parameter name neuron_index has been deprecated and is replaced by neuron_selector, which supports either indices or a callable.
  • Feature ablation and feature permutation now allow attribution with respect to multiple batch-aggregate scalars (e.g. loss) simultaneously (PR #425).
  • Most attribution methods now support a multiply_by_inputs argument. For attribution methods which include a multiplier of inputs or inputs - baselines, this argument selects whether these multipliers should be incorporated or left out to obtain marginal attributions. (PR #432)
  • Methods accepting internal batch size were updated to generate batches lazily rather than splitting an expanded input tensor, eliminating memory constraints when experimenting with a large number of steps. (PR #333).

Captum Insights

  • New attribution methods in Captum Insights:
    • Feature Ablation (PR #319)
    • Occlusion (PR #369)

Bug Fixes

  • Providing target as a list with inputs on CUDA devices now works appropriately. (Issue #316, PR #317)
  • DeepLift issues with DataParallel models, particularly when providing additional forward args or multiple targets, have been fixed. (PR #335)
  • Hooks added within an attribution method were previously not being removed if the attribution method encountered an exception before removing the hook. All hooks are now removed even if an exception is raised during attribution. (PR #340)
  • LayerDeepLift was fixed to avoid applying hooks on the target layer when attributing layer output, which caused incorrect results or errors with some non-linearities (Issue #382, PR #390, #415).
  • Non-leaf tensor gradient warning when using NoiseTunnel with Saliency has been fixed. (Issue #421, PR #426)
  • Text visualization helpers now have option to display legend. (Issue #401, PR #403)
  • Image visualization helpers fixed to normalize even if outlier threshold is close to 0 (Issue #393, PR #458).

v0.2.0

4 years ago

The second release, v0.2.0, of Captum adds a variety of new attribution algorithms as well as additional tutorials, type hints, and Google Colab support for Captum Insights.

New Attribution Algorithms

The following new attribution algorithms are provided, which can be applied to any type of PyTorch model, including DataParallel models. While the first release focused primarily on gradient-based attribution methods such as Integrated Gradients, the new algorithms include perturbation-based methods, marked by ^ below. We also add new attribution methods designed primarily for convolutional networks, denoted by * below. All attribution methods share a consistent API structure to make it easy to switch between them.

Attribution of model output with respect to the input features

1. Guided Backprop *
2. Deconvolution *
3. Guided GradCAM *
4. Feature Ablation ^
5. Feature Permutation ^
6. Occlusion ^
7. Shapley Value Sampling ^

Attribution of model output with respect to the layers of the model

1. Layer GradCAM
2. Layer Integrated Gradients
3. Layer DeepLIFT
4. Layer DeepLIFT SHAP
5. Layer Gradient SHAP
6. Layer Feature Ablation ^

Attribution of neurons with respect to the input features

1. Neuron DeepLIFT
2. Neuron DeepLIFT SHAP
3. Neuron Gradient SHAP
4. Neuron Guided Backprop *
5. Neuron Deconvolution *
6. Neuron Feature Ablation ^

^ Denotes Perturbation-Based Algorithm. These methods compute attribution by evaluating the model on perturbed versions of the input as opposed to using gradient information. * Denotes attribution method designed primarily for convolutional networks.
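
The perturbation-based methods above share the common attribute API. A hedged illustration with a placeholder model, image batch, and labels:

from captum.attr import Occlusion, FeatureAblation

occlusion = Occlusion(model)  # model is a placeholder
# occlude 3x15x15 patches of a CHW image batch, sliding with stride 8
occ_attr = occlusion.attribute(
    image_batch, sliding_window_shapes=(3, 15, 15), strides=(3, 8, 8), target=labels
)

ablation = FeatureAblation(model)
abl_attr = ablation.attribute(inputs, target=labels)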

New Tutorials

We have added new tutorials to demonstrate Captum on BERT models, regression cases, and using perturbation-based methods. These tutorials include:

  • Interpreting question answering with BERT
  • Interpreting regression models using Boston House Prices Dataset
  • Feature Ablation on Images


Type Hints

The Captum code base is now fully typed with Python type hints and type checked using mypy. Users can now accurately type-check code using Captum.

Bug Fixes and Minor Features

  • All Captum methods now support in-place modules and operations. (Issue #156)
  • Computing convergence delta was fixed to work appropriately on CUDA devices. (Issue #163)
  • A ReLU flag was added to Layer GradCAM to optionally apply a ReLU operation to the returned attributions. (Issue #179)
  • All layer and neuron attribution methods now support attribution with respect to either input or output of a module, based on the attribute_to_layer_input and attribute_to_neuron_input flags.
  • All layer attribution methods now support modules with multiple outputs.

Captum Insights

  • Captum Insights now works on Google Colab. (Issue #116)
  • Captum Insights can also be launched as a Jupyter Notebook widget.
  • New attribution methods in Captum Insights:
    • Deconvolution
    • Deep Lift
    • Guided Backprop
    • Input X Gradient
    • Saliency

v0.1.0

4 years ago

We just released our first version of the PyTorch Captum library for model interpretability!

Highlights

This first release, v0.1.0, supports a number of gradient-based attribution algorithms as well as Captum Insights, a visualization tool for model debugging and understanding.

Attribution Algorithms

The following general purpose gradient-based attribution algorithms are provided. These can be applied to any type of PyTorch model and input features, including image, text, and multimodal.

  1. Attribution of output of the model with respect to the input features

    1. Saliency
    2. InputXGradient
    3. IntegratedGradients
    4. DeepLift
    5. DeepLiftShap
    6. GradientShap
  2. Attribution of output of the model with respect to the layers of the model

    1. LayerActivation
    2. LayerGradientXActivation
    3. LayerConductance
    4. InternalInfluence
  3. Attribution of neurons with respect to the input features

    1. NeuronGradient
    2. NeuronIntegratedGradients
    3. NeuronConductance
  4. Attribution Algorithm + noisy sampling

    1. NoiseTunnel: helps to reduce the noise in the attributions assigned by attribution algorithms by using different noise tunnel techniques such as smoothgrad, smoothgrad_sq, and vargrad.
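
A minimal sketch of combining NoiseTunnel with a gradient-based method (model, inputs, and labels are placeholders; the parameter names below follow later releases, which renamed some NoiseTunnel arguments as noted in the v0.3.1 section):

from captum.attr import IntegratedGradients, NoiseTunnel

ig = IntegratedGradients(model)  # model is a placeholder
nt = NoiseTunnel(ig)

# average attributions over several noisy copies of the input (smoothgrad)
attribution = nt.attribute(inputs, nt_type="smoothgrad", nt_samples=10, stdevs=0.1, target=labels)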

Batch and Data Parallel Optimizations

Since some algorithms, such as Integrated Gradients, expand input tensors internally, we want to scale those tensors and the corresponding forward/backward computations efficiently. For that reason, attribute methods accept an internal_batch_size argument that chunks the expanded tensors into pieces; the library runs forward and backward passes for each chunk separately and combines the results after computing gradients.
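
For example (a hedged sketch with placeholder model, inputs, and labels):

from captum.attr import IntegratedGradients

ig = IntegratedGradients(model)  # model is a placeholder
# the expanded (n_steps x batch) tensor is processed in chunks of 32 examples,
# bounding peak memory even for a large number of steps
attribution = ig.attribute(inputs, target=labels, n_steps=200, internal_batch_size=32)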

The algorithms that support batched optimization are:

  1. IntegratedGradients
  2. LayerConductance
  3. InternalInfluence
  4. NeuronConductance

PyTorch data parallel models are also supported across all Captum algorithms, allowing users to take advantage of multiple GPUs when applying interpretability algorithms.

More details on these algorithms can be found on our website at captum.ai/docs/algorithms

Captum Insights

Captum Insights provides these algorithms in an interactive Jupyter notebook-based tool for model debugging and understanding. It can be used embedded within a notebook or run as a standalone application.

Features:

  1. Visualize attribution across sampled data for classification models
  2. Multimodal support for text, image, and general features in a single model
  3. Filtering and debugging specific sets of classes and misclassified examples
  4. Jupyter notebook support for easy model and dataset modification

Insights is built with standard web technologies including JavaScript, CSS, React, Yarn and Flask.