Detectors

This module provides a collection of different Out-of-Distribution Detectors.

API

Each detector implements a common API consisting of a predict and a fit method, where fit is optional. The object's __call__ method is delegated to the predict function, so you can use

detector = Detector(model)
detector.fit(data_loader)
scores = detector(x)
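
The delegation described above can be illustrated with a toy class. This is a minimal sketch in plain Python, not the library's implementation: ToyMaxLogitDetector and toy_model are hypothetical names, and inputs are plain lists rather than tensors.

```python
from typing import Callable, Iterable, List, Sequence


class ToyMaxLogitDetector:
    """Hypothetical detector that mimics the fit / predict / __call__ contract."""

    def __init__(self, model: Callable[[Sequence[float]], List[float]]):
        self.model = model

    def fit(self, data: Iterable) -> "ToyMaxLogitDetector":
        # This toy score has nothing to estimate, so fit only returns self,
        # which allows chaining: ToyMaxLogitDetector(model).fit(data).
        return self

    def predict(self, x: Sequence[Sequence[float]]) -> List[float]:
        # Inputs are passed through the model; the negative maximum logit
        # serves as the outlier score (higher means "more unusual").
        return [-max(self.model(sample)) for sample in x]

    def __call__(self, x: Sequence[Sequence[float]]) -> List[float]:
        # Calling the object is the same as calling predict.
        return self.predict(x)


def toy_model(sample: Sequence[float]) -> List[float]:
    # Hypothetical "network": two logits computed from a 2-d input.
    return [sample[0] + sample[1], sample[0] - sample[1]]


detector = ToyMaxLogitDetector(toy_model).fit([])
batch = [[1.0, 2.0], [0.5, -3.0]]
scores = detector(batch)
```

Returning self from fit is what makes the one-line construct-and-fit pattern possible.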

Feature-based Interface

Alternatively, you can use the fit_features and predict_features methods. In that case, inputs will not be passed through the model, which avoids repeated forward passes when fitting several detectors on the same data. Detectors that do not support this will raise an exception.

detector = Detector(model=None)
detector.fit_features(train_features, train_labels)
scores = detector.predict_features(test_features)

Some of the detectors support grid-like input, so that they can be used for anomaly segmentation without further adjustment.

class pytorch_ood.api.Detector[source]

Abstract Base Class for an Out-of-Distribution Detector

abstract fit(data_loader: DataLoader) → Self[source]

Fit the detector to a dataset. Some methods require this.

Parameters:: data_loader – dataset to fit on. This is usually the training dataset.
Raises:: ModelNotSetException – if model was not set

abstract fit_features(x: Tensor, y: Tensor) → Self[source]

Fit the detector directly on features. Some methods require this.

Parameters:

x – training features to use for fitting.
y – corresponding class labels.

abstract predict(x: Tensor) → Tensor[source]

Calculates outlier scores. Inputs will be passed through the model.

Parameters:

x – batch of data

Returns:

outlier scores for points

Raises:

RequiresFitException – if detector has to be fitted to some data
ModelNotSetException – if model was not set

abstract predict_features(x: Tensor) → Tensor[source]

Calculates outlier scores based on features.

Parameters:: x – batch of data
Returns:: outlier scores for points
Raises:: RequiresFitException – if detector has to be fitted to some data

Probability-based

Probability-based methods are based on the observation that OOD inputs tend to be assigned lower posteriors with higher entropy, i.e., the predicted distribution is often less concentrated on a single class.

Maximum Softmax (MSP)

class pytorch_ood.detector.MaxSoftmax(model: Module, t: float | None = 1.0)[source]

Implements the Maximum Softmax Probability (MSP) Thresholding baseline for OOD detection.

Optionally, implements temperature scaling, which divides the logits by a constant temperature \(T\) before calculating the softmax. The score is calculated as:

\[- \max_y \sigma_y(f(x) / T)\]

where \(\sigma\) is the softmax function and \(\sigma_y\) indicates the \(y^{th}\) value of the resulting probability vector.
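
To make the arithmetic of the score concrete, here is a minimal sketch in plain Python. It is not the library's tensor implementation; softmax and msp_score are hypothetical helper names, and the logits are made-up values.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], t: float = 1.0) -> List[float]:
    # Divide by the temperature, then subtract the maximum for numerical stability.
    scaled = [v / t for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits: Sequence[float], t: float = 1.0) -> float:
    # Negative maximum softmax probability: values closer to 0 are "more OOD".
    return -max(softmax(logits, t))


peaked = msp_score([8.0, 0.0, 0.0])              # confident prediction
flat = msp_score([1.0, 1.0, 1.0])                # uniform posterior
softened = msp_score([8.0, 0.0, 0.0], t=1000.0)  # same logits, high temperature
```

A large temperature flattens the posterior, so the same logits yield a score much closer to that of the uniform case.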

See Paper:

ArXiv

See Implementation:

GitHub

Parameters:

model – neural network to use
t – temperature value \(T\). Default is 1.

predict(x: Tensor) → Tensor[source]

Parameters:: x – input, will be passed through model

predict_features(logits: Tensor) → Tensor[source]

Parameters:: logits – logits given by the model

static score(logits: Tensor, t: float | None = 1.0) → Tensor[source]

Parameters:

logits – logits for samples
t – temperature value

Monte Carlo Dropout (MCD)

class pytorch_ood.detector.MCD(model: Module, samples: int = 30, mode: str = 'var', batch_norm: bool = True)[source]

From the paper Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. Forward-propagates the input through the model \(N\) times with activated dropout and averages the results.

In mean mode, the outlier score is calculated as

\[- \max_y \frac{1}{N} \sum_n^{N} \sigma_y(f_n(x))\]

where \(\sigma\) is the softmax function. In var mode, the scores are calculated as

\[\frac{1}{C} \sum_y^C \frac{1}{N} \sum_n^N ( \sigma_y(f_n(x)) - \mu_y )^2\]

where \(C\) is the number of classes and \(\mu_y\) is the class mean. This is the mean over the per class variance, which was used in Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding.
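
The two modes can be sketched as follows, assuming the \(N\) softmax outputs of the stochastic forward passes are already available. The dropout sampling itself is not simulated; mcd_scores and the example probabilities are hypothetical.

```python
from typing import Sequence, Tuple


def mcd_scores(probs: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """probs[n][y] is the softmax output for class y in the n-th stochastic pass."""
    n_passes = len(probs)
    n_classes = len(probs[0])
    # Class-wise mean over the N passes.
    mean = [sum(p[y] for p in probs) / n_passes for y in range(n_classes)]
    # Class-wise variance over the N passes.
    var = [
        sum((p[y] - mean[y]) ** 2 for p in probs) / n_passes
        for y in range(n_classes)
    ]
    mean_mode = -max(mean)            # "mean" mode score
    var_mode = sum(var) / n_classes   # "var" mode score
    return mean_mode, var_mode


stable = [[0.9, 0.1], [0.9, 0.1]]    # passes agree
unstable = [[0.9, 0.1], [0.1, 0.9]]  # passes disagree
stable_mean, stable_var = mcd_scores(stable)
unstable_mean, unstable_var = mcd_scores(unstable)
```

When the passes disagree, both scores increase: the averaged posterior is flatter and the per-class variance is larger.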

See MCD Paper:: ICML
See Bayesian SegNet:: ArXiv

Warning

This implementation puts the model into training mode so that dropout is active (except for variants of the BatchNorm layers, which stay in evaluation mode when batch_norm is set). This could also affect other modules.

Parameters:

model – the module to use for the forward pass. Should output logits.
samples – number of iterations
mode – can be one of var or mean
batch_norm – keep batch norm layers in evaluation mode

n_samples: number \(N\) of samples

predict(x: Tensor) → Tensor[source]

Parameters:: x – input
Returns:: outlier score

static run(model: Module, x: Tensor, samples: int, batch_norm=True) → Tuple[Tensor, Tensor][source]

Parameters:

model – neural network
x – input
samples – number of rounds
batch_norm – keep batch norm layers in evaluation mode

Returns:

mean and variance of softmax normalized model outputs

static run_mean(model: Module, x: Tensor, samples: int, batch_norm=True) → Tensor[source]

Assumes that the model outputs logits. More memory efficient implementation.

Parameters:

model – neural network
x – input
samples – number of rounds
batch_norm – keep batch norm layers in evaluation mode

Returns:

mean softmax output of the model

Temperature Scaling

class pytorch_ood.detector.TemperatureScaling(model: Module)[source]

Implements temperature scaling from the paper On Calibration of Modern Neural Networks.

The method uses an additional set of validation samples to determine the optimal temperature value \(T\) to calibrate the softmax output.

The score is calculated as:

\[- \max_y \sigma_y(f(x) / T)\]

where \(\sigma\) is the softmax function, \(T\) is the optimal temperature and \(\sigma_y\) indicates the \(y^{th}\) value of the resulting probability vector.
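
The idea of picking \(T\) on validation data can be sketched in plain Python. This sketch swaps in a simple grid search over candidate temperatures in place of a gradient-based optimizer, so it is an illustration of the objective (the negative log-likelihood), not of the library's fitting routine. All function names and the toy logits are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], t: float) -> List[float]:
    # Log of softmax(logits / t), computed with the log-sum-exp trick.
    scaled = [v / t for v in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in scaled))
    return [v - log_norm for v in scaled]


def nll(all_logits, labels, t: float) -> float:
    # Mean negative log-likelihood of the true labels at temperature t.
    total = sum(-log_softmax(row, t)[y] for row, y in zip(all_logits, labels))
    return total / len(labels)


def fit_temperature(all_logits, labels, candidates) -> float:
    # Grid search: return the candidate temperature with the lowest NLL.
    return min(candidates, key=lambda t: nll(all_logits, labels, t))


def score(logits: Sequence[float], t: float) -> float:
    # Negative maximum of the temperature-scaled softmax.
    return -max(math.exp(v) for v in log_softmax(logits, t))


# Overconfident toy classifier: very large logits, but one of four predictions is wrong.
val_logits = [[10.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 0.0]]
val_labels = [0, 0, 1, 1]
candidates = [0.5 * k for k in range(1, 41)]  # 0.5, 1.0, ..., 20.0
best_t = fit_temperature(val_logits, val_labels, candidates)
```

Because the toy classifier is right only three times out of four while being almost certain each time, the selected temperature is well above 1, which softens the posteriors.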

See Paper:: ArXiv
Parameters:: model – neural network to use

fit(loader: DataLoader, device: str = 'cpu') → Self[source]

Extracts features and optimizes the temperature.

Parameters:

loader – data loader
device – device used for extracting logits

fit_features(logits: Tensor, labels: Tensor) → Self[source]

Optimize temperature using L-BFGS. Ignores OOD inputs.

Parameters:

logits – logits
labels – labels for logits

predict(x: Tensor) → Tensor[source]

Parameters:: x – input, will be passed through model

predict_features(logits: Tensor) → Tensor[source]

Parameters:: logits – logits given by the model

KL-Matching

class pytorch_ood.detector.KLMatching(model: Module)[source]

Implements KL-Matching from the paper Scaling Out-of-Distribution Detection for Real-World Settings.

For each class, a typical posterior distribution \(d_y = \mathbb{E}_{x \sim \mathcal{X}_{val}}[p(y \vert x)]\) is estimated, where \(y\) is the class with the maximum posterior \(y = \arg\max_y p(y \vert x)\), as predicted by the model. Note that the method does not require class labels for the validation set. During evaluation, the KL-Divergence between the observed and the typical posterior \(D_{KL}[p(y \vert x) \Vert d_y]\) is used as outlier score.

This method can also be applied to multi-class settings.
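
The two steps described above (estimating typical posteriors per predicted class, then measuring the KL divergence to them) can be sketched in plain Python. The helper names and the validation posteriors are hypothetical, and the sketch assumes strictly positive posteriors, as produced by a softmax.

```python
import math
from typing import Dict, List, Sequence


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # Assumes q is strictly positive wherever p is positive.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def fit_typical_posteriors(posteriors) -> Dict[int, List[float]]:
    # Group validation posteriors by predicted class and average each group.
    groups: Dict[int, List[Sequence[float]]] = {}
    for p in posteriors:
        groups.setdefault(argmax(p), []).append(p)
    return {y: [sum(col) / len(ps) for col in zip(*ps)] for y, ps in groups.items()}


def kl_matching_score(p: Sequence[float], typical: Dict[int, List[float]]) -> float:
    # Compare the observed posterior with the typical one of its predicted class.
    # A class that was never predicted during fitting would raise a KeyError here.
    return kl_divergence(p, typical[argmax(p)])


val_posteriors = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.4, 0.6]]
typical = fit_typical_posteriors(val_posteriors)
typical_like = kl_matching_score([0.8, 0.2], typical)
atypical = kl_matching_score([0.55, 0.45], typical)
```

No labels are used in fit_typical_posteriors, which mirrors the remark that the validation set does not need class labels.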

See Paper:: ArXiv
Parameters:: model – neural network, is assumed to output logits.

dists: ParameterDict: Typical posteriors per class

fit(data_loader: DataLoader, device='cpu') → Self[source]

Estimates typical distributions for each class. Ignores OOD samples.

Parameters:

data_loader – validation data loader
device – device which should be used for calculations

fit_features(logits: Tensor, labels: Tensor, device='cpu') → Self[source]

Estimates typical distributions for each class. Ignores OOD samples.

Parameters:

logits – logits
labels – class labels
device – device which should be used for calculations

predict(x: Tensor) → Tensor[source]

Calculates KL-Divergence between predicted posteriors and typical posteriors.

Parameters:: x – input tensor, will be passed through model
Returns:: Outlier scores

predict_features(p: Tensor) → Tensor[source]

Parameters:: p – probabilities predicted by the model

Entropy

class pytorch_ood.detector.Entropy(model: Module)[source]

Implements Entropy-based OOD detection.

This method calculates the entropy based on the logits of a classifier. Higher entropy means more uniformly distributed posteriors, indicating larger uncertainty. Entropy is calculated as

\[H(x) = - \sum_i^C \sigma_i(f(x)) \log( \sigma_i(f(x)) )\]

where \(\sigma_i\) indicates the \(i^{th}\) softmax value and \(C\) is the number of classes.
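
As an illustration of the formula, the following plain-Python sketch computes the entropy from a list of logits. It is not the library's implementation, and entropy_score is a hypothetical name.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_score(logits: Sequence[float]) -> float:
    # H(x) = -sum_i p_i log p_i; terms with p_i == 0 contribute nothing.
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


uniform = entropy_score([0.0, 0.0, 0.0])  # maximal entropy for three classes
peaked = entropy_score([10.0, 0.0, 0.0])  # almost all mass on one class
```

For \(C\) classes the score is bounded above by \(\log C\), which is reached for a uniform posterior.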

Parameters:: model – the model \(f\)

predict(x: Tensor) → Tensor[source]

Calculate entropy for inputs

Parameters:: x – input tensor, will be passed through model
Returns:: Entropy score

predict_features(logits: Tensor) → Tensor[source]

Parameters:: logits – logits given by your model

static score(logits: Tensor) → Tensor[source]

Parameters:: logits – logits of input

Logit-based

Maximum Logit

class pytorch_ood.detector.MaxLogit(model: Module)[source]

Implements the Max Logit Method for OOD Detection as proposed in Scaling Out-of-Distribution Detection for Real-World Settings.

\[- \max_y f_y(x)\]

where \(f_y(x)\) indicates the \(y^{th}\) logits value predicted by \(f\).

See Paper:: ArXiv
Parameters:: model – neural network to use

predict(x: Tensor) → Tensor[source]

Parameters:: x – model inputs

predict_features(logits: Tensor) → Tensor[source]

Parameters:: logits – logits as given by the model

static score(logits: Tensor) → Tensor[source]

Parameters:: logits – logits for samples

OpenMax

class pytorch_ood.detector.OpenMax(model: Module, tailsize: int = 25, alpha: int = 10, euclid_weight: float = 1.0)[source]

Implementation of the OpenMax Layer as proposed in the paper Towards Open Set Deep Networks.

The method determines a center \(\mu_y\) for each class in the logits space of a model, and then creates a statistical model of the distances of correctly classified inputs. It uses extreme value theory to detect outliers by fitting a Weibull distribution to the tail of the distance distribution.

We use the pseudo-activation of the unknown class as outlier score.

See Paper:

ArXiv

See Implementation:

GitHub

Parameters:

model – neural network, assumed to output logits
tailsize – length of the tail to fit the distribution to
alpha – number of class activations to revise
euclid_weight – weight for the Euclidean distance.

fit(data_loader: DataLoader, device: str | None = 'cpu') → Self[source]

Determines the parameters of the Weibull distribution for each class.

Parameters:

data_loader – Data to use for fitting
device – Device used for calculations

fit_features(logits: Tensor, y: Tensor) → Self[source]

Determines the parameters of the Weibull distribution for each class.

Parameters:

logits – logits given by the model
y – class labels

predict(x: Tensor) → Tensor[source]

Parameters:: x – input, will be passed through the model to get logits

predict_features(logits: Tensor) → Tensor[source]

Parameters:: logits – logits given by model

Energy Based (EBO)

class pytorch_ood.detector.EnergyBased(model: Module, t: float | None = 1.0)[source]

Implements the Energy Score of Energy-based Out-of-distribution Detection.

This method calculates the negative energy for a vector of logits. This value can be used as outlier score.

\[E(x) = -T \log{\sum_i e^{f_i(x)/T}}\]

where \(f_i(x)\) indicates the \(i^{th}\) logit value predicted by \(f\).
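
The formula for \(E(x)\) can be evaluated in a numerically stable way with the log-sum-exp trick. The following is a plain-Python sketch of the arithmetic, not the library's implementation; energy_score is a hypothetical name.

```python
import math
from typing import Sequence


def energy_score(logits: Sequence[float], t: float = 1.0) -> float:
    # E(x) = -T * log(sum_i exp(f_i / T)); the maximum is factored out
    # before exponentiating to avoid overflow for large logits.
    peak = max(logits)
    log_sum = peak / t + math.log(sum(math.exp((v - peak) / t) for v in logits))
    return -t * log_sum


low_energy = energy_score([10.0, 0.0, 0.0])   # one dominant logit
high_energy = energy_score([1.0, 0.0, 0.0])   # small logits overall
```

Since the log-sum-exp is never smaller than the largest logit, \(E(x)\) at \(T = 1\) is never larger than the negative maximum logit.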

See Paper:: NeurIPS
See Implementation:: GitHub
Parameters:: t – Temperature value \(T\). Default is 1.

predict(x: Tensor) → Tensor[source]

Calculate negative energy for inputs

Parameters:: x – input tensor, will be passed through model
Returns:: Energy score

predict_features(logits: Tensor) → Tensor[source]

Parameters:: logits – logits given by the model

static score(logits: Tensor, t: float | None = 1.0) → Tensor[source]

Parameters:

logits – logits of input
t – temperature value

t: float: Temperature

Weighted Energy Based (WEBO)

class pytorch_ood.detector.WeightedEBO(model: Module, weights: Tensor)[source]

Implements the Weighted Energy Based Score of VOS: Learning what you don’t know by virtual outlier synthesis.

This method calculates the energy from the weighted logits. The negative energy can be used as outlier score. The weights can be obtained, for example, by training with pytorch_ood.loss.VOSRegLoss.

Overall, the score is defined as:

\[E(x) = - \log{\sum_i w_{i} e^{f_i(x)}}\]

where \(f_i(x)\) indicates the \(i^{th}\) logit value predicted by \(f\) and \(w\) indicates the weights.
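
The weighted formula can be sketched in plain Python as below. This is only an illustration of the arithmetic: the weights are given as a plain list and are assumed to be strictly positive so that the logarithm is defined, and weighted_energy_score is a hypothetical name.

```python
import math
from typing import Sequence


def weighted_energy_score(logits: Sequence[float], weights: Sequence[float]) -> float:
    # E(x) = -log(sum_i w_i * exp(f_i)), with the maximum logit factored out
    # for numerical stability. Assumes all weights are strictly positive.
    peak = max(logits)
    weighted_sum = sum(w * math.exp(f - peak) for f, w in zip(logits, weights))
    return -(peak + math.log(weighted_sum))


logits = [1.0, 2.0, 3.0]
unweighted = weighted_energy_score(logits, [1.0, 1.0, 1.0])
halved = weighted_energy_score(logits, [0.5, 0.5, 0.5])
```

With all weights equal to one, the score reduces to the unweighted energy at temperature 1; scaling all weights by a constant shifts the score by the negative logarithm of that constant.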

Example Code:

import torch
from pytorch_ood.detector import WeightedEBO

weights = torch.nn.Linear(num_classes, 1)
detector = WeightedEBO(model, weights)
scores = detector(images)

See Paper:

ArXiv

See Implementation:

GitHub

Parameters:

model – neural network \(f\) to use, is assumed to output logits
weights – weight vector with shape \(C \times 1\) where \(C\) is the number of classes

predict(x: Tensor) → Tensor[source]

Calculate weighted energy for inputs

Parameters:: x – input tensor, will be passed through model
Returns:: Weighted Energy score

predict_features(logits: Tensor) → Tensor[source]

Parameters:: logits – logits given by your model

static score(logits: Tensor, weights: Tensor) → Tensor[source]

Parameters:

logits – logits of input
weights – weights as torch.nn.Module

Feature-based

Mahalanobis Distance (MD)

class pytorch_ood.detector.Mahalanobis(model: Callable[[Tensor], Tensor], eps: float = 0.002, norm_std: List | None = None)[source]

Implements the Mahalanobis Method from the paper A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks.

This method calculates a class center \(\mu_y\) for each class, and a shared covariance matrix \(\Sigma\) from the data. The outlier scores are then calculated as

\[- \max_k \lbrace - (f(x) - \mu_k)^{\top} \Sigma^{-1} (f(x) - \mu_k) \rbrace\]

Also uses ODIN preprocessing.
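
The fitting and scoring steps can be sketched in plain Python for two-dimensional features. The sketch assumes the outlier score is the squared Mahalanobis distance to the closest class center, leaves out ODIN preprocessing, and hard-codes the 2x2 matrix inverse. All names and data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Matrix = List[List[float]]


def fit_gaussians(features: Sequence[Vector], labels: Sequence[int]) -> Tuple[Dict[int, List[float]], Matrix]:
    # Class centers mu_k.
    mu: Dict[int, List[float]] = {}
    for k in sorted(set(labels)):
        members = [z for z, y in zip(features, labels) if y == k]
        mu[k] = [sum(col) / len(members) for col in zip(*members)]
    # Shared covariance of the class-centered features (2-d only in this sketch).
    n = len(features)
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for z, y in zip(features, labels):
        d = [z[0] - mu[y][0], z[1] - mu[y][1]]
        for i in range(2):
            for j in range(2):
                cov[i][j] += d[i] * d[j] / n
    # Closed-form inverse of a 2x2 matrix gives the precision matrix.
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    precision = [
        [cov[1][1] / det, -cov[0][1] / det],
        [-cov[1][0] / det, cov[0][0] / det],
    ]
    return mu, precision


def mahalanobis_score(z: Vector, mu: Dict[int, List[float]], precision: Matrix) -> float:
    def squared_distance(center: Vector) -> float:
        d = [z[0] - center[0], z[1] - center[1]]
        return sum(d[i] * precision[i][j] * d[j] for i in range(2) for j in range(2))

    # Distance to the closest class center.
    return min(squared_distance(center) for center in mu.values())


features = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0],
            [10.0, 10.0], [12.0, 10.0], [10.0, 12.0], [12.0, 12.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
mu, precision = fit_gaussians(features, labels)
at_center = mahalanobis_score([1.0, 1.0], mu, precision)
between = mahalanobis_score([6.0, 6.0], mu, precision)
```

A point halfway between the two clusters is far from both centers and therefore receives a much larger score than a point at a class center.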

See Implementation:

GitHub

See Paper:

ArXiv

Parameters:

model – the Neural Network, should output features
eps – magnitude for gradient based input preprocessing
norm_std – Standard deviations for input normalization

cov: Tensor: Covariance Matrix

eps: float: epsilon

fit(data_loader: DataLoader, device: str | None = None) → Self[source]

Fit parameters of the multivariate Gaussian.

Parameters:

data_loader – dataset to fit on.
device – device to use

fit_features(z: Tensor, y: Tensor, device: str | None = None) → Self[source]

Fit parameters of the multivariate Gaussian.

Parameters:

z – features
y – class labels
device – device to use

mu: Tensor: Centers

property n_classes: Number of classes the model is fitted for

precision: Tensor: Precision Matrix

predict(x: Tensor) → Tensor[source]

Parameters:: x – input tensor

predict_features(z: Tensor) → Tensor[source]

Calculates Mahalanobis distance directly on features. ODIN preprocessing will not be applied.

Parameters:: z – features, as given by the model.

Multi-Layer Mahalanobis Distance (MD)

class pytorch_ood.detector.MultiMahalanobis(model: List[Module], alpha: List[float] | None = None)[source]

Implements the Mahalanobis Method from the paper A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks which supports several layers.

For each of the given \(i\) layers, the method calculates a class center \(\mu_{iy}\) for each class, and a shared covariance matrix \(\Sigma_i\) from the data. The per-layer outlier scores are calculated as

\[M_i(x) = - \max_k \lbrace - (f_i(x) - \mu_{ik})^{\top} \Sigma_i^{-1} (f_i(x) - \mu_{ik}) \rbrace\]

The final outlier score is the sum of all scores, weighted by \(\alpha\).

Example code is provided here

Note

This does not yet support ODIN preprocessing. Also, the \(\alpha\) values have to be determined manually.

See Implementation:

GitHub

See Paper:

ArXiv

Parameters:

model – the neural network layers \(f_1(\cdot),...,f_n(\cdot)\), output of one will be used as input to the next.
alpha – weighting of the individual layers. Defaults to uniform weighting.

alpha: Per-layer weighting factors

cov: List[Tensor]: Covariance Matrices

fit(data_loader: DataLoader, device: str | None = None) → Self[source]

Fit one Gaussian to the features of each layer. Will average over feature maps.

Parameters:

data_loader – dataset to fit on.
device – device to use

fit_features(zs: List[Tensor], y: Tensor, device: str | None = None) → Self[source]

Fit parameters of the multivariate Gaussians.

Parameters:

zs – list of features for each layer
y – class labels
device – device to use

mu: List[Tensor]: Centers

property n_classes: Number of classes the model is fitted for

precision: List[Tensor]: Precision Matrices

predict(x: Tensor) → Tensor[source]

Parameters:: x – input tensor

predict_features(zs: List[Tensor], device=None) → Tensor[source]

Calculates Mahalanobis distance directly on features. ODIN preprocessing will not be applied.

Parameters:

zs – list of per-layer features
device – device to use for computations

Relative Mahalanobis Distance (RMD)

class pytorch_ood.detector.RMD(model: Callable[[Tensor], Tensor])[source]

Implements the Relative Mahalanobis Distance (RMD) from the paper A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection.

This method calculates a class center \(\mu_y\) for each class, and a shared covariance matrix \(\Sigma\) from the data.

Additionally, it fits a background Gaussian with mean \(\mu_0\) and covariance matrix \(\Sigma_0\) to all of the features and calculates outlier scores as

\[\min_k \lbrace d_k(f(x)) - d_0(f(x)) \rbrace\]

where \(d_k\) is the Mahalanobis score for class \(k\) and \(d_0\) is the Mahalanobis score under the background Gaussian.
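
The relative score can be sketched in plain Python. To keep the example short, it assumes one-dimensional features, so that covariance matrices reduce to variances and \(d\) is the squared distance divided by the variance. The names fit_rmd and rmd_score and the data are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def fit_rmd(features: Sequence[float], labels: Sequence[int]) -> Tuple[Dict[int, float], float, float, float]:
    n = len(features)
    # Class centers and shared (class-centered) variance.
    mu: Dict[int, float] = {}
    for k in sorted(set(labels)):
        members = [z for z, y in zip(features, labels) if y == k]
        mu[k] = sum(members) / len(members)
    var = sum((z - mu[y]) ** 2 for z, y in zip(features, labels)) / n
    # Background Gaussian fitted to all features, ignoring the labels.
    mu_0 = sum(features) / n
    var_0 = sum((z - mu_0) ** 2 for z in features) / n
    return mu, var, mu_0, var_0


def rmd_score(z: float, mu: Dict[int, float], var: float, mu_0: float, var_0: float) -> float:
    d_0 = (z - mu_0) ** 2 / var_0
    # min_k { d_k(z) - d_0(z) }
    return min((z - m) ** 2 / var - d_0 for m in mu.values())


features = [0.0, 2.0, 10.0, 12.0]
labels = [0, 0, 1, 1]
params = fit_rmd(features, labels)
at_class_center = rmd_score(1.0, *params)
at_background_center = rmd_score(6.0, *params)
```

A point at the background mean but far from every class center gets a high score, because the background term cannot compensate for the large class distances.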

See Paper:: ArXiv
Parameters:: model – the Neural Network, should output features

fit(loader: DataLoader, device: str = 'cpu') → Self[source]

Fit parameters of the multivariate Gaussian for the given loader. Ignores OOD inputs.

fit_features(z: Tensor, y: Tensor, device: str | None = None) → Self[source]

Fit parameters of the multivariate Gaussian. Ignores OOD inputs.

Parameters:

z – features
y – class labels
device – device to use

predict(x: Tensor) → Tensor[source]

Parameters:: x – input tensor

Virtual Logit Matching (ViM)

class pytorch_ood.detector.ViM(model: Callable[[Tensor], Tensor], d: int, w: Tensor, b: Tensor)[source]

Implements Virtual Logit Matching (ViM) from the paper ViM: Out-Of-Distribution with Virtual-logit Matching.

See Paper:

ArXiv

See Implementation:

GitHub

Parameters:

model – neural network to use, is assumed to output features
d – dimensionality of the principal subspace
w – weights \(W\) of the last layer of the network
b – biases \(b\) of the last layer of the network

alpha: float: the computed \(\alpha\) value

fit(data_loader, device='cpu') → Self[source]

Extracts features and logits, computes principal subspace and alpha. Ignores OOD samples.

Parameters:

data_loader – dataset to fit on
device – device to use

fit_features(features: Tensor, labels: Tensor) → Self[source]

Extracts features and logits, computes principal subspace and alpha. Ignores OOD samples.

Parameters:

features – features
labels – class labels

predict(x: Tensor) → Tensor[source]

Parameters:: x – model input, will be passed through neural network

predict_features(x: Tensor) → Tensor[source]

Parameters:: x – features as given by the model

Nearest Neighbor (kNN)

class pytorch_ood.detector.KNN(model: Callable[[Tensor], Tensor], **knn_kwargs)[source]

Implements the detector from the paper Out-of-Distribution Detection with Deep Nearest Neighbors.

Fits a nearest neighbor model to the ID samples and uses the distance from the nearest neighbor as outlier score:

\[\min_{z \in \mathcal{D}} \lVert f(x) - f(z) \rVert_2\]

where \(\mathcal{D}\) is the dataset used to train the nearest neighbor model.

The original paper found that using contrastive pre-training could increase the performance.
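
The nearest-neighbor score defined above can be sketched with a brute-force search in plain Python. This stands in for a proper nearest-neighbor index and is only meant to show the computation; knn_score and the feature bank are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_score(z: Sequence[float], bank: Sequence[Sequence[float]]) -> float:
    # Distance from the query features to the closest stored ID feature vector.
    return min(euclidean(z, stored) for stored in bank)


# Features f(z) of the ID samples used for fitting (made-up values).
bank = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
seen = knn_score([0.0, 0.0], bank)
far = knn_score([3.0, 4.0], bank)
```

Brute-force search costs one distance computation per stored sample, which is why a dedicated nearest-neighbor structure is preferable for large feature banks.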

See PMLR:

arXiv

Parameters:

model – neural network to use
knn_kwargs – dict with keyword arguments that will be passed to scikit-learn's k-NN

fit(loader: DataLoader, device: str = 'cpu') → Self[source]

Extracts features and fits the kNN-Model

Parameters:

loader – data loader
device – device used for extracting features

fit_features(z: Tensor, labels: Tensor) → Self[source]

Fits nearest neighbor model. Ignores OOD inputs.

Parameters:

z – features
labels – labels for features

predict(x: Tensor) → Tensor[source]

Parameters:: x – inputs, will be passed through model

predict_features(z: Tensor) → Tensor[source]

Parameters:

z – features
k – number of neighbors

Simplified Hopfield Energy (SHE)

class pytorch_ood.detector.SHE(backbone: Callable[[Tensor], Tensor], head: Callable[[Tensor], Tensor])[source]

Implements Simplified Hopfield Energy from the paper Out-of-Distribution Detection based on In-Distribution Data Patterns Memorization with modern Hopfield Energy.

For each class, SHE estimates the mean feature vector \(S_i\) of correctly classified instances. For some new instances with predicted class \(\hat{y}\), SHE then uses the inner product \(f(x)^{\top} S_{\hat{y}}\) as outlier score.
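
The pattern estimation and the inner product can be sketched in plain Python. The sketch returns the raw inner product \(f(x)^{\top} S_{\hat{y}}\); whether an implementation negates it so that larger values mean "more OOD" is an assumption left open here. Function names and data are hypothetical.

```python
from typing import Dict, List, Sequence


def fit_patterns(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    predictions: Sequence[int],
) -> Dict[int, List[float]]:
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for z, y, y_hat in zip(features, labels, predictions):
        if y != y_hat:
            continue  # only correctly classified instances contribute
        if y not in sums:
            sums[y] = [0.0] * len(z)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], z)]
        counts[y] += 1
    # Mean feature vector S_i per class.
    return {k: [s / counts[k] for s in sums[k]] for k in sums}


def she_inner_product(z: Sequence[float], y_hat: int, patterns: Dict[int, List[float]]) -> float:
    # Inner product between the features and the stored pattern of the predicted class.
    return sum(a * b for a, b in zip(z, patterns[y_hat]))


features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
labels = [0, 0, 1, 1]
predictions = [0, 0, 1, 0]  # the last sample is misclassified and is skipped
patterns = fit_patterns(features, labels, predictions)
aligned = she_inner_product([2.0, 0.0], 0, patterns)
weak = she_inner_product([0.1, 0.1], 0, patterns)
```

Features that align with the stored pattern of their predicted class produce a large inner product, while weak or misaligned features produce a small one.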

See Paper:

OpenReview

Parameters:

backbone – feature extractor
head – maps feature vectors to logits

fit(loader: DataLoader, device: str = 'cpu') → Self[source]

Extracts features and calculates mean patterns.

Parameters:

loader – data to fit
device – device to use for computations. If the backbone is a nn.Module, it will be moved to this device.

fit_features(z: Tensor, y: Tensor, device: str = 'cpu', batch_size: int = 1024) → Self[source]

Calculates mean patterns per class.

Parameters:

z – features to fit
y – labels
device – device to use for computations
batch_size – how many samples we process at a time

predict(x: Tensor) → Tensor[source]

Parameters:: x – model inputs

predict_features(z: Tensor) → Tensor[source]

Parameters:: z – features as given by the model

Gram Matrices Based (GM)

class pytorch_ood.detector.Gram(head: Module, feature_layers: List[Module], num_classes: int, num_poles_list: List[int] | None = None)[source]

Implements the Gram-matrix-based method from the paper Detecting Out-of-Distribution Examples with In-distribution Examples and Gram Matrices.

The Gram detector identifies OOD examples by analyzing feature correlations within the layers of a neural network using Gram matrices, which are computed as:

\[G^p_l = \left(F_l^p F_l^{p \top}\right)^{\frac{1}{p}}\]

where \(F_l\) is the feature map in layer \(l\). The Gram matrices capture the pairwise correlations between feature maps, which can be seen as capturing the image style. For each layer, matrices for several values of \(p\), called "poles", are computed. During training, class-specific minimum and maximum bounds are calculated for each entry in the Gram matrices of the ID data in multiple layers of a neural network. For a test input \(x\), deviations are calculated layer-wise by comparing the Gram matrix values against the stored bounds. The total deviation across all layers \(l\) is normalized using the expected deviation for that layer:

\[\Delta(x) = \sum_{l} \frac{\delta_l(x)}{\mathbb{E}[\delta_l]}\]

See Implementation:

GitHub

See Paper:

ArXiv

Parameters:

head – the head of the model
feature_layers – the layers of the model to be used for feature extraction
num_classes – the number of classes in the dataset
num_poles_list – the list of poles to be used for higher-order Gram matrices

fit(data_loader: DataLoader, device: str | None = None) → Self[source]

Calculate the minimum and maximum values for the Gram matrices of the training data.

Parameters:

data_loader – data loader for training data
device – device to run the model on

Returns:

self

predict(x: Tensor) → Tensor[source]

Calculate deviation for inputs

Parameters:: x – input tensor, will be passed through model
Returns:: Gram based deviations

predict_features(logits: Tensor, feature_list: List[Tensor]) → Tensor[source]

Parameters:

logits – logits given by your model
feature_list – list of features extracted from the model

Returns:

Gram based Deviations

Neural Collapse Inspired (NCI)

class pytorch_ood.detector.NCI(encoder: Module, head: Linear, alpha: float = 0.0)[source]

Implements the Neural-Collapse Inspired OOD detector from the paper Detecting Out-of-distribution through the Lens of Neural Collapse.

Computes a global mean \(\mu_g\) of all features from the fitting set to center representations during inference. Let \(h\) be the representation of some input and \(z = h - \mu_g\) be the centered representation. The score is calculated as

\[- \frac{z \cdot w_c}{\lVert z \rVert_2} - \alpha \lVert h \rVert_1\]

where \(w_c\) is the weight vector for the class that the model predicted for the input, and \(\alpha\) is a hyper parameter that has to be determined manually.

The first term will penalize inputs whose representation does not align with the class vectors, while the second term penalizes inputs whose representation resides close to the origin.
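
The score can be sketched in plain Python for a linear classification head. The sketch assumes the predicted class is the argmax of the head's logits and that the centered representation \(z\) is non-zero; nci_score and all values are hypothetical.

```python
import math
from typing import Sequence


def nci_score(
    h: Sequence[float],
    mu_g: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    alpha: float = 0.0,
) -> float:
    # Predicted class c from the linear head: argmax of W h + b.
    logits = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
    c = max(range(len(logits)), key=lambda k: logits[k])
    # Centered representation z = h - mu_g.
    z = [v - m for v, m in zip(h, mu_g)]
    z_norm = math.sqrt(sum(v * v for v in z))  # assumed to be non-zero
    score = -sum(a * b for a, b in zip(z, weights[c])) / z_norm
    if alpha > 0:
        # Feature-norm term based on the l1 norm of the uncentered representation.
        score -= alpha * sum(abs(v) for v in h)
    return score


identity_head = [[1.0, 0.0], [0.0, 1.0]]
no_bias = [0.0, 0.0]
plain = nci_score([3.0, 4.0], [0.0, 0.0], identity_head, no_bias)
with_norm = nci_score([3.0, 4.0], [0.0, 0.0], identity_head, no_bias, alpha=0.1)
```

In this example the second class is predicted, so the first term is \(-4 / 5\); with \(\alpha = 0.1\) the \(\ell_1\) norm of 7 lowers the score by a further 0.7.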

See Paper:

CVPR

See Implementation:

GitHub

Parameters:

encoder – model mapping inputs to features
head – the classification head of the model
alpha – weight for feature norm penalty. Will be ignored if \(\leq 0\)

fit(data_loader) → Self[source]

Parameters:: data_loader – data loader used to compute \(\mu_g\)

fit_features(z: Tensor, *args, **kwargs) → Self[source]

Parameters:: z – input features used to compute \(\mu_g\)

predict(x: Tensor) → Tensor[source]

Calculate outlier score for inputs, which will be passed through the encoder.

Parameters:: x – input tensor, will be passed through model
Returns:: outlier score

predict_features(features: Tensor) → Tensor[source]

Compute outlier scores based on features (without passing through encoder).

Parameters:: features – features given by the model

Gradient-based

Gradient-based detectors are based on the observation that the gradients (w.r.t. the model parameters or the inputs) for ID and OOD data behave differently.

GradNorm

class pytorch_ood.detector.GradNorm(model: Module, param_filter: Callable[[str], bool] | None = None)[source]

Detector from the paper Gradients as a Measure of Uncertainty in Neural Networks.

For each input sample, computes the binary cross-entropy loss between logits and a “confounding label”, which is a vector of all ones. Then, for each set of parameters in the model (as given by model.named_parameters()), computes the squared \(\ell_2\)-norm of the gradients of the loss w.r.t. that parameter. The outlier score is the sum of these squared norms.

The idea is that higher gradient norms indicate that the model would require large parameter updates to accommodate the input, i.e., for such data, it is less familiar or more uncertain, and hence more likely to be OOD.
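
For a model that consists of a single linear layer, the gradients can be written down by hand, which makes the score easy to trace. The following plain-Python sketch assumes logits \(Wx + b\) and a binary cross-entropy that is summed over the classes (the reduction is an assumption); grad_norm_score is a hypothetical name.

```python
import math
from typing import Sequence


def sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def grad_norm_score(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> float:
    logits = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    # With an all-ones target, the loss is sum_c -log(sigmoid(logit_c)),
    # whose derivative w.r.t. logit_c is sigmoid(logit_c) - 1.
    g = [sigmoid(l) - 1.0 for l in logits]
    grad_w = [[g_c * x_j for x_j in x] for g_c in g]  # dL/dW = g x^T
    grad_b = g                                       # dL/db = g
    # Sum of the squared l2 norms over both parameter sets.
    return sum(v * v for row in grad_w for v in row) + sum(v * v for v in grad_b)


x = [1.0, 2.0]
unfamiliar = grad_norm_score(x, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
familiar = grad_norm_score(x, [[5.0, 5.0], [5.0, 5.0]], [0.0, 0.0])
```

With zero weights every logit is 0 and each entry of g is -0.5, giving a score of 3.0; with large positive logits the sigmoid saturates and the gradients nearly vanish.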

Note

OpenOOD uses only the gradients of the final classification head, which makes this computationally cheaper. You can achieve something similar by setting param_filter. Still, this method will compute gradients for all parameters unless you explicitly deactivate gradient calculation for parameters. For an example, see here

See Paper:

ICIP

Parameters:

model – A pre-trained classification model
param_filter – Function which indicates whether a named parameter should be included in the scoring. If none is given, all parameters will be used.

predict(x: Tensor) → Tensor[source]

Compute outlier scores from input batch.

We will use the device of the model parameters for computations.

Parameters:: x – input, will be passed through network
Returns:: vector of outlier scores

ODIN Preprocessing

class pytorch_ood.detector.ODIN(model: Module, criterion: Callable[[Tensor], Tensor] | None = None, eps: float = 0.05, temperature: float = 1000.0, norm_std: List[float] | None = None)[source]

Implements ODIN from the paper Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks.

ODIN is a preprocessing method for inputs that aims to increase the discriminability of the softmax outputs for ID and OOD data.

The operation requires two forward and one backward pass.

\[\hat{x} = x - \epsilon \ \text{sign}(\nabla_x \mathcal{L}(f(x) / T, \hat{y}))\]

where \(\hat{y}\) is the predicted class of the network.
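
The preprocessing step can be traced by hand for a linear softmax classifier, where the input gradient has a closed form. The plain-Python sketch below assumes \(f(x) = Wx\) and the negative log-likelihood as \(\mathcal{L}\), and it omits any normalization by standard deviations. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(values: Sequence[float]) -> List[float]:
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def odin_preprocess(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    eps: float = 0.05,
    temperature: float = 1000.0,
) -> List[float]:
    # Hypothetical linear model f(x) = W x; weights[c] is the row for class c.
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    probs = softmax([v / temperature for v in logits])
    y_hat = max(range(len(logits)), key=lambda c: logits[c])
    # Gradient of the NLL of class y_hat w.r.t. x: W^T (p - onehot(y_hat)) / T.
    residual = [p - (1.0 if c == y_hat else 0.0) for c, p in enumerate(probs)]
    grad = [
        sum(residual[c] * weights[c][j] for c in range(len(weights))) / temperature
        for j in range(len(x))
    ]
    sign = [(g > 0) - (g < 0) for g in grad]
    # x_hat = x - eps * sign(grad)
    return [x_j - eps * s for x_j, s in zip(x, sign)]


identity = [[1.0, 0.0], [0.0, 1.0]]
x = [2.0, 1.0]
x_hat = odin_preprocess(x, identity, eps=0.1)
```

Stepping against the gradient of the loss for the predicted class raises the confidence in that class. Here the first logit grows and the second shrinks.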

See Paper:

ArXiv

See Implementation:

GitHub

Parameters:

model – module to backpropagate through
criterion – loss function \(\mathcal{L}\) to use. If None is given, we will use negative log likelihood
eps – step size \(\epsilon\) of the gradient descent step
temperature – temperature \(T\) to use for scaling
norm_std – standard deviations used for normalization

criterion: criterion \(\mathcal{L}\)

eps: size \(\epsilon\) of the gradient step in the input space

predict(x: Tensor) → Tensor[source]

Calculates softmax outlier scores on ODIN pre-processed inputs.

Parameters:: x – input tensor
Returns:: outlier scores for each sample

predict_features(x: Tensor) → Tensor[source]

Since ODIN requires backpropagating through the model, this method cannot be used.

Raises:: Exception –

temperature: temperature value \(T\)

pytorch_ood.detector.odin_preprocessing(model: Module, x: Tensor, y: Tensor | None = None, criterion: Callable[[Tensor], Tensor] | None = None, eps: float = 0.05, temperature: float = 1000, norm_std: List[float] | None = None)[source]

Functional version of ODIN.

Parameters:

model – module to backpropagate through
x – sample to preprocess
y – the label \(\hat{y}\) which is used to evaluate the loss. If none is given, the model's prediction will be used
criterion – loss function \(\mathcal{L}\) to use. If none is given, we will use negative log likelihood
eps – step size \(\epsilon\) of the gradient descent step
temperature – temperature \(T\) to use for scaling
norm_std – standard deviations used during preprocessing

NAC-UE

Implements Neuron Activation Coverage from the paper Neuron Activation Coverage: Rethinking Out-of-Distribution Detection and Generalization.

See Paper:

ICLR

Parameters:

model – A classifier that returns logits of shape \((B, C)\), where \(B\) denotes the batch size and \(C\) the number of classes.
layers – Sequence of modules whose outputs \(z\) are used to compute NAC. For a ResNet-style architecture, e.g. [model.layer1, model.layer2, model.layer3, model.layer4].
m_bins – Number of histogram bins \(M\). Either a single value (shared across all layers) or one value per layer.
alpha – Sigmoid steepness parameter \(\alpha\). Either a single value (shared across all layers) or one value per layer.
o_star – Bin-filling parameter \(O^*\) (minimum count required for full coverage). Either a single value (shared across all layers) or one value per layer.
feature_reduce – Function mapping a layer output tensor to a 2D tensor of shape \((B, N)\), where \(B\) denotes the batch size and \(N\) the number of neurons. Defaults to: identity for tensors of shape \((B, N)\), spatial mean for tensors of shape \((B, C, H, W)\), otherwise flatten.
device – Optional device used during fitting and prediction.

fit(data_loader: DataLoader, device=None) → NACUE[source]

Fit the detector to a dataset. Some methods require this.

Parameters:: data_loader – dataset to fit on. This is usually the training dataset.
Raises:: ModelNotSetException – if model was not set

fit_features(x: Tensor, y: Tensor) → NACUE[source]

Fit the detector directly on features. Some methods require this.

Parameters:

x – training features to use for fitting.
y – corresponding class labels.

predict(x: Tensor) → Tensor[source]

Calculates outlier scores. Inputs will be passed through the model.

Parameters:

x – batch of data

Returns:

outlier scores for points

Raises:

RequiresFitException – if detector has to be fitted to some data
ModelNotSetException – if model was not set

predict_features(x: Tensor) → Tensor[source]

Calculates outlier scores based on features.

Parameters:

x – batch of data

Returns:

outlier scores for points

Raises:

RequiresFitException – if detector has to be fitted to some data

Activation Pruning

Activation pruning methods are based on the observation that OOD inputs cause unusual activations in the model, and that, by rectifying these unusual activations, we can often improve discriminability of ID and OOD samples.

Activation Shaping (ASH)

class pytorch_ood.detector.ASH(backbone: Callable[[Tensor], Tensor], head: Callable[[Tensor], Tensor], variant='ash-s', percentile: float = 0.65, detector: Callable[[Tensor], Tensor] | None = None)[source]

Implements ASH from the paper Extremely Simple Activation Shaping for Out-of-Distribution Detection.

ASH prunes the activations in some layer of the network (the output of backbone) by setting all activations below a given percentile to zero. The remaining activations are modified, depending on the variant selected, and propagated through the remainder of the network (head). By default, the resulting logits are scored with the energy-based outlier score. This approach has been shown to increase OOD detection rates while maintaining ID accuracy.

ASH-P: only prune, do not modify
ASH-B: binarize remaining activations
ASH-S: rescale remaining activations

The paper applies ASH after the last average pooling layer.
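To make the variants concrete, here is a minimal sketch of ASH-P and ASH-S as described in the paper, where the activations below the percentile are set to zero per sample and ASH-S rescales the survivors by exp(s1 / s2), with s1 and s2 the activation sums before and after pruning. The function names are hypothetical, and this is not necessarily how the library implements the operation.

```python
import torch


def ash_p_sketch(x: torch.Tensor, percentile: float = 0.65) -> torch.Tensor:
    """Zero out the lowest `percentile` fraction of activations per sample."""
    b = x.shape[0]
    flat = x.reshape(b, -1)
    n = flat.shape[1]
    k = n - int(round(n * percentile))  # number of activations to keep
    values, indices = torch.topk(flat, k, dim=1)
    pruned = torch.zeros_like(flat).scatter_(1, indices, values)
    return pruned.reshape(x.shape)


def ash_s_sketch(x: torch.Tensor, percentile: float = 0.65) -> torch.Tensor:
    """Prune as in ASH-P, then rescale the survivors by exp(s1 / s2)."""
    b = x.shape[0]
    s1 = x.reshape(b, -1).sum(dim=1)       # sum before pruning
    pruned = ash_p_sketch(x, percentile)
    s2 = pruned.reshape(b, -1).sum(dim=1)  # sum after pruning
    scale = torch.exp(s1 / s2)
    return pruned * scale.view(b, *([1] * (x.dim() - 1)))
```

The sketch assumes non-negative activations (for example, after a ReLU), so that s2 is positive whenever at least one activation survives.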

Example Code:

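from pytorch_ood.detector import ASH, EnergyBased
from pytorch_ood.model import WideResNet

model = WideResNet(num_classes=10)  # number of classes depends on your dataset
detector = ASH(
    backbone=model.features_before_pool,
    head=model.forward_from_before_pool,
    detector=EnergyBased.score,
)
scores = detector(images)  # images: batch of input tensors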

See Paper:

ICLR 2023

See Website:

github.io

Parameters:

variant – one of ash-p, ash-b, ash-s
backbone – first part of model to use, should output feature maps
head – second part of model used after applying ash, should output logits
percentile – fraction of activations to prune, between 0 and 1
detector – detector that maps model outputs to outlier scores. Default is Energy based.

predict(x: Tensor) → Tensor[source]

Parameters:

x – input, will be passed through network

ReAct

class pytorch_ood.detector.ReAct(backbone: Callable[[Tensor], Tensor], head: Callable[[Tensor], Tensor], threshold: float = 1.0, detector: Callable[[Tensor], Tensor] | None = None)[source]

Implements ReAct from the paper ReAct: Out-of-distribution Detection With Rectified Activations.

ReAct clips the activations in some layer of the network (backbone) and forward propagates the result through the remainder of the model (head). In the paper, ReAct is applied to the penultimate layer of the network.

The output of the network is then passed to an outlier detector that maps the output of the model to outlier scores.
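The two steps above (clip, then score) can be sketched in a few lines. This is an illustration under the assumption that clipping means bounding activations from above at threshold and that the score is the energy, i.e. the negative log-sum-exp of the logits; react_sketch is a hypothetical name, not part of the library.

```python
from typing import Callable

import torch


def react_sketch(
    backbone: Callable[[torch.Tensor], torch.Tensor],
    head: Callable[[torch.Tensor], torch.Tensor],
    x: torch.Tensor,
    threshold: float = 1.0,
) -> torch.Tensor:
    """Clip backbone activations at `threshold`, then compute an energy score."""
    features = backbone(x)
    clipped = features.clamp(max=threshold)  # rectify unusually large activations
    logits = head(clipped)
    # Energy score: higher values indicate more outlier-like inputs
    return -torch.logsumexp(logits, dim=1)
```

Because large activations are capped before reaching the head, inputs that produce unusually large features can no longer inflate the logits as much, which is what drives the improved separation reported in the paper.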

Example Code:

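from pytorch_ood.detector import EnergyBased, ReAct
from pytorch_ood.model import WideResNet

model = WideResNet(num_classes=10)  # number of classes depends on your dataset
detector = ReAct(
    backbone=model.features,
    head=model.fc,
    detector=EnergyBased.score,
)
scores = detector(images)  # images: batch of input tensors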

See Paper:

ArXiv

Parameters:

backbone – first part of model to use, should output feature maps
head – second part of model used after applying ReAct, should output logits
threshold – cutoff for activations
detector – detector that maps outputs to outlier scores. Default is energy based.

predict(x: Tensor) → Tensor[source]

Parameters:

x – input, will be passed through network

DICE

class pytorch_ood.detector.DICE(model: Callable[[Tensor], Tensor], w: Tensor, b: Tensor, p: float, detector: Callable[[Tensor], Tensor] | None = None)[source]

Implements DICE from the paper DICE: Leveraging Sparsification for Out-of-Distribution Detection.

See Paper:

ArXiv

Parameters:

model – feature extractor
w – weights of last layer
b – bias of last layer
p – percentile of weights to drop
detector – detector that maps model outputs to outlier scores

fit(loader: DataLoader, device: str = 'cpu') → Self[source]

Parameters:

loader – data loader to extract features from. OOD inputs will be ignored.
device – device to use for feature extraction

fit_features(z: Tensor, y: Tensor) → Self[source]

Calculates the masked weights. OOD Inputs will be ignored.

Parameters:

z – features
y – labels.
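As a rough sketch of what computing masked weights can look like, the snippet below follows the DICE paper's description: each weight's contribution is the weight times the mean training activation of its input unit, and the weights with the smallest contributions are zeroed. It assumes p is a fraction in [0, 1]; the function names are hypothetical, and the library's actual implementation may differ.

```python
import torch


def dice_masked_weights_sketch(z: torch.Tensor, w: torch.Tensor, p: float) -> torch.Tensor:
    """Zero the fraction `p` of last-layer weights with the smallest contribution.

    z: training features of shape (N, D); w: last-layer weights of shape (C, D).
    """
    mean_activation = z.mean(dim=0)      # (D,)
    contribution = w * mean_activation   # (C, D), broadcast over classes
    threshold = torch.quantile(contribution.flatten(), p)
    mask = contribution > threshold
    return w * mask


def dice_logits_sketch(z: torch.Tensor, masked_w: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Compute logits of shape (N, C) with the sparsified weights."""
    return z @ masked_w.T + b
```

The resulting logits would then be passed to an outlier scoring function, in the same way as for the other activation pruning methods in this section.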

predict(x: Tensor) → Tensor[source]

Parameters:

x – input, will be passed through network

predict_features(x: Tensor) → Tensor[source]

Parameters:

x – features