Training Objectives

Objective (or loss) functions for OOD detection usually aim to improve the discriminability of ID and OOD points in the output or latent space of the model.

All objective functions are implemented as torch.nn.Modules. Some of them have a set of trainable parameters and must be moved to the appropriate device.

Unsupervised

Unsupervised losses only use in-distribution data (or, equivalently, only examples from “known known” classes).

Therefore, all of these loss functions expect that the target labels are strictly \(\geq 0\).

Deep SVDD Loss

class pytorch_ood.loss.DeepSVDDLoss(n_dim: int, reduction: str | None = 'mean', radius: float = 0.0, center: Tensor | None = None)[source]

Deep Support Vector Data Description (SVDD) from the paper Deep One-Class Classification. It places a center \(\mu\) in the output space of the model and pulls ID samples towards a hypersphere with center \(\mu\) and radius \(r\) in order to learn the common factors of intra-class variance.

The loss is defined as follows:

\[\mathcal{L}(x) = \max \lbrace 0, \lVert f(x) - \mu \rVert_2^2 - r^2 \rbrace\]

The distance of a point to the center can be used as outlier score.

This is an implementation of the One-Class Deep SVDD objective, which implies that the radius is not considered trainable and should usually be set to zero.

In the original paper, the center is initialized with the mean of \(f(x)\) over the dataset before training.
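The objective above can be sketched in a few lines of plain PyTorch. This is a minimal illustration, not the library's code: the function name svdd_loss is our own, random tensors stand in for the model outputs \(f(x)\), and the center is initialized as the feature mean as described in the paper.

```python
import torch


def svdd_loss(z: torch.Tensor, center: torch.Tensor, radius: float = 0.0) -> torch.Tensor:
    """Per-sample loss max(0, ||z - mu||^2 - r^2) for a batch z of shape B x n."""
    sq_dist = ((z - center) ** 2).sum(dim=1)
    return torch.relu(sq_dist - radius**2)


torch.manual_seed(0)
# Random stand-in for the model outputs f(x) on a batch of ID samples
features = torch.randn(8, 4)
# Initialize the (fixed) center as the mean of f(x), as described in the paper
center = features.mean(dim=0)

loss = svdd_loss(features, center).mean()       # "mean" reduction over the batch
scores = ((features - center) ** 2).sum(dim=1)  # distance to the center as outlier score
```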

See Paper: ICML

Note

This module should be moved to the correct device before using forward()

Parameters:

n_dim – dimensionality \(n\) of the output space
reduction – reduction method to apply, one of mean, sum or none
radius – radius \(r\)
center – position of the center \(\mu \in \mathbb{R}^n\) where \(n\) is the dimensionality of the output space

property center: ClassCenters: The center \(\mu\)

distance(x: Tensor) → Tensor[source]

Returns: \(\lVert x - \mu \rVert^2 - r^2\)

forward(x: Tensor, y: Tensor | None = None) → Tensor[source]

Parameters:

x – features
y – target labels (either ID or OOD). If not given, all samples are assumed to be ID.

Returns:

\(\lVert x - \mu \rVert^2 - r^2\)

radius: radius \(r\) of the hypersphere

static svdd_loss(x: Tensor, center: ClassCenters, radius: Tensor = 0.0, y: Tensor | None = None) → Tensor[source]

Calculates the loss. Treats all ID samples equally and ignores all OOD samples. If no labels are given, all samples are assumed to be ID.

Parameters:

x – features
center – center of sphere
radius – radius of sphere
y – Optional labels.

Class Anchor Clustering Loss

class pytorch_ood.loss.CACLoss(n_classes: int, magnitude: float = 1.0, alpha: float = 1.0)[source]

Class Anchor Clustering Loss from the paper Class Anchor Clustering: a Distance-based Loss for Training Open Set Classifiers.

The authors place a class-conditional center (called an anchor) in the output space of the model and pull the representations of points of class \(y\) towards the corresponding center \(\mu_y\) during training. The centers are initialized as unit vectors scaled by a magnitude and are not trainable.

The authors also propose a distance-based outlier score, which is implemented in the CACLoss.score() method.

Example code is provided here

See Paper: WACV 2022
See Implementation: GitHub

Centers are initialized as unit vectors, scaled by the magnitude.
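A minimal sketch of the CAC training objective and rejection score in plain PyTorch, assuming the formulation of the CAC paper: a tuplet term \(\log(1 + \sum_{j \neq y} e^{d_y - d_j})\) plus \(\alpha\) times the distance to the anchor of the true class. The helper names are hypothetical, the distances here are plain Euclidean (whereas CACLoss.distance() is documented as returning squared distances), and taking the per-row minimum of the score matrix as the final outlier score is our assumption.

```python
import torch
import torch.nn.functional as F


def cac_anchors(n_classes: int, magnitude: float = 1.0) -> torch.Tensor:
    # One fixed anchor per class: a one-hot (unit) vector scaled by the magnitude
    return magnitude * torch.eye(n_classes)


def cac_loss(distances: torch.Tensor, target: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # Tuplet term log(1 + sum_{j != y} exp(d_y - d_j)) equals cross-entropy on -distances
    tuplet = F.cross_entropy(-distances, target)
    # Anchor term: distance to the anchor of the true class
    anchor = distances.gather(1, target.unsqueeze(1)).mean()
    return tuplet + alpha * anchor


def cac_score(distances: torch.Tensor) -> torch.Tensor:
    # Rejection score d * (1 - softmin(d)), reduced to one value per sample via the minimum
    return (distances * (1 - F.softmin(distances, dim=1))).min(dim=1).values


torch.manual_seed(0)
anchors = cac_anchors(n_classes=3, magnitude=10.0)
logits = torch.randn(5, 3)                # model outputs live in a C-dimensional space
distances = torch.cdist(logits, anchors)  # Euclidean distances, shape B x C
target = torch.tensor([0, 1, 2, 0, 1])
loss = cac_loss(distances, target)
scores = cac_score(distances)
```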

Parameters:

n_classes – number of classes \(C\)
magnitude – magnitude of class anchors
alpha – \(\alpha\) weight for anchor term

property centers: ClassCenters: The class centers \(\mu_y\).

distance(x: Tensor) → Tensor[source]

Parameters: x – input points
Returns: matrix with squared distances from each point to each center with shape \(B \times C\).

forward(distances: Tensor, target: Tensor) → Tensor[source]

Calculates the CAC loss, based on the given distance matrix and target labels. OOD inputs will be ignored.

Parameters:

distances – matrix of distances of each point to each center with shape \(B \times C\).
target – labels for samples

static score(distance: Tensor) → Tensor[source]

Rejection score proposed in the paper.

Parameters: distance – distance of instances to class centers
Returns: outlier scores

II Loss

class pytorch_ood.loss.IILoss(n_classes: int, n_embedding: int, alpha: float = 1.0)[source]

II Loss function from Learning a neural network based representation for open set recognition.

See Paper: ArXiv
See Implementation: GitHub

Warning

We added running centers for online estimation of the class centers. This is only an approximation, and results might differ if the centers are calculated as described in the paper. However, it enables a better estimate of the performance during training without having to calculate the centers over the entire dataset. Empirically, we found that these centers work well.
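The two terms of the II loss (intra-class spread minus inter-class separation, as defined in the II-loss paper) can be sketched as follows, together with one possible online center estimate. The cumulative-mean update in update_running_centers is a hypothetical illustration of the idea of running centers and is not necessarily how RunningCenters works; the \(\alpha\) weighting is omitted.

```python
import torch


def ii_loss(z: torch.Tensor, target: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    # Intra-class spread: mean squared distance of each embedding to its class center
    intra_spread = ((z - centers[target]) ** 2).sum(dim=1).mean()
    # Inter-class separation: smallest squared distance between two different centers
    pairwise = torch.cdist(centers, centers) ** 2
    same = torch.eye(centers.shape[0], dtype=torch.bool)
    inter_separation = pairwise.masked_fill(same, float("inf")).min()
    return intra_spread - inter_separation


@torch.no_grad()
def update_running_centers(centers, counts, z, target):
    # Hypothetical online estimate: fold the batch into per-class cumulative means
    for c in target.unique():
        mask = target == c
        total = counts[c] + mask.sum()
        centers[c] = (centers[c] * counts[c] + z[mask].sum(dim=0)) / total
        counts[c] = total


torch.manual_seed(0)
n_classes, n_embedding = 3, 2
centers = torch.zeros(n_classes, n_embedding)
counts = torch.zeros(n_classes)
z = torch.randn(12, n_embedding)
target = torch.arange(12) % n_classes
update_running_centers(centers, counts, z, target)
loss = ii_loss(z, target, centers)
```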

Parameters:

n_classes – number of classes
n_embedding – embedding dimensionality
alpha – weight for both loss terms

property centers: RunningCenters

Returns: current class center estimates

distance(x: Tensor) → Tensor[source]

Parameters: x – embeddings
Returns: matrix with the distances of each embedding to each class center in the output space

forward(x: Tensor, target: Tensor) → Tensor[source]

Calculates the loss and updates the running centers.

Parameters:

x – embeddings of samples
target – label of samples

predict(x: Tensor) → Tensor[source]

Predicts class membership probabilities.

Parameters: x – embeddings
Returns: class membership probabilities

Center Loss

class pytorch_ood.loss.CenterLoss(n_classes: int, n_dim: int, magnitude: float = 1.0, radius: float = 0.0, fixed: bool = False)[source]

Generalized version of the Center Loss from the paper A Discriminative Feature Learning Approach for Deep Face Recognition. For each class, this loss places a center \(\mu_y\) in the output space and draws representations of samples to their corresponding class centers, up to a radius \(r\).

Calculates

\[\mathcal{L}(x,y) = \max \lbrace d(f(x),\mu_y) - r , 0 \rbrace\]

where \(d\) is some measure of dissimilarity, like the squared distance.

With radius \(r=0\) and the squared euclidean distance as \(d(\cdot,\cdot)\), this is equivalent to the original center loss, which is also referred to as the soft-margin loss in some publications.
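A sketch of the generalized center loss on a precomputed distance matrix, using the squared Euclidean distance as \(d\) and ignoring OOD samples (negative labels). The function name and the random center initialization are illustrative assumptions, not the library's implementation.

```python
import torch


def center_loss(distmat: torch.Tensor, target: torch.Tensor, radius: float = 0.0) -> torch.Tensor:
    """max(d(f(x), mu_y) - r, 0), averaged over ID samples; OOD samples (target < 0) are ignored."""
    known = target >= 0
    if not known.any():
        return distmat.new_zeros(())
    d_y = distmat[known].gather(1, target[known].unsqueeze(1)).squeeze(1)
    return torch.relu(d_y - radius).mean()


torch.manual_seed(0)
n_classes, n_dim, magnitude = 3, 2, 1.0
centers = magnitude * torch.randn(n_classes, n_dim)  # hypothetical initialization scaled by lambda
z = torch.randn(6, n_dim)
distmat = torch.cdist(z, centers) ** 2               # squared Euclidean distance as d
target = torch.tensor([0, 1, 2, -1, 0, -1])          # -1 marks OOD samples
loss = center_loss(distmat, target)
```

With radius=0.0 this reduces to the mean squared distance of each ID sample to its class center, i.e. the original center loss.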

See Implementation:

GitHub

See Paper:

ECCV 2016

Parameters:

n_classes – number of classes \(C\)
n_dim – dimensionality of center space \(D\)
magnitude – scale \(\lambda\) used for center initialization
radius – radius \(r\) of spheres, lower bound for distance from center that is penalized
fixed – false if centers should be learnable

property centers: ClassCenters

Returns: the \(\mu\) for all classes

forward(distmat: Tensor, target: Tensor) → Tensor[source]

Calculates the loss. Ignores OOD inputs.

Parameters:

distmat – matrix of distances of each point to each center with shape \(B \times C\).
target – ground truth labels with shape (batch_size).

Returns:

the loss values

Cross-Entropy Loss

class pytorch_ood.loss.CrossEntropyLoss(reduction: str | None = 'mean')[source]

Standard Cross-entropy, but ignores OOD inputs.
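Ignoring OOD inputs amounts to masking out negative labels before calling the standard cross-entropy. A minimal sketch (the function name is hypothetical, and only mean reduction is shown):

```python
import torch
import torch.nn.functional as F


def cross_entropy_ignore_ood(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    known = targets >= 0  # ID samples have labels >= 0, OOD samples negative ones
    if not known.any():
        return logits.new_zeros(())  # nothing to learn from in this batch
    return F.cross_entropy(logits[known], targets[known])


torch.manual_seed(0)
logits = torch.randn(4, 3)
targets = torch.tensor([0, -1, 2, -5])
loss = cross_entropy_ignore_ood(logits, targets)
```

Masking is used instead of the ignore_index argument of torch.nn.functional.cross_entropy because ignore_index skips only one specific label value, while OOD samples may carry any negative label.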

Parameters: reduction – reduction method to apply, one of mean, sum or none

forward(logits: Tensor, targets: Tensor) → Tensor[source]

Calculates cross-entropy.

Parameters:

logits – logits
targets – labels

Confidence Loss

class pytorch_ood.loss.ConfidenceLoss(alpha: float = 1.0, eps: float = 1e-24)[source]

Loss proposed in Learning Confidence for Out-of-Distribution Detection in Neural Networks. The model learns to predict a confidence \(c\) in addition to the class membership.

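The loss minimizes the negative log-likelihood of the class membership prediction, where the predicted probabilities \(p_i\) are interpolated with the one-hot target \(y\) according to the confidence \(c\), plus a penalty on low confidence: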

\[ \begin{align}\begin{aligned}\mathcal{L}_{NLL} + \alpha \mathcal{L}_c = - \sum_{i=1}^{M} \log(p'_{i}) y_i - \alpha \log(c)\\\text{where} \quad p_i' = c \cdot p_i + (1-c) y_i\end{aligned}\end{align} \]

See Paper: ArXiv

Note

We implemented clipping for numerical stability.
This implementation uses mean reduction for batches.
The authors additionally used ODIN preprocessing
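A sketch of the objective in plain PyTorch, including clipping with \(\epsilon\) and mean reduction. It assumes that confidence already lies in \((0, 1)\) (for example the output of a sigmoid) and that all labels are ID; the function name is hypothetical and ODIN preprocessing is not included.

```python
import torch
import torch.nn.functional as F


def confidence_loss(logits, confidence, target, alpha=1.0, eps=1e-24):
    # Assumption: `confidence` has shape B and already lies in (0, 1)
    c = confidence.clamp(min=eps, max=1.0).unsqueeze(1)  # shape B x 1
    p = F.softmax(logits, dim=1)
    y = F.one_hot(target, num_classes=logits.shape[1]).to(p.dtype)
    p_prime = c * p + (1 - c) * y                         # interpolate towards the target
    nll = -(torch.log(p_prime.clamp(min=eps)) * y).sum(dim=1)
    confidence_penalty = -torch.log(c.squeeze(1))
    return (nll + alpha * confidence_penalty).mean()      # mean over the batch


torch.manual_seed(0)
logits = torch.randn(4, 3)
target = torch.tensor([0, 1, 2, 1])
confidence = torch.sigmoid(torch.randn(4))
loss = confidence_loss(logits, confidence, target)
```

With \(c = 1\) the interpolation has no effect and the penalty vanishes, so the loss reduces to the standard cross-entropy.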

Parameters:

alpha – \(\alpha\) used to balance terms
eps – Clipping value \(\epsilon\) used for numerical stability

forward(logits: Tensor, confidence: Tensor, target: Tensor) → Tensor[source]

Parameters:

logits – class logits for samples
confidence – predicted confidence for samples
target – labels for samples (not one-hot encoded)

LogitNorm Loss

class pytorch_ood.loss.LogitNorm(t=1.0, reduction='mean')[source]

LogitNorm from the paper Mitigating Neural Network Overconfidence with Logit Normalization.

Given a model \(f: \mathcal{X} \rightarrow \mathbb{R}^K\) that maps inputs to \(K\) logits, this method normalizes the logits before computing the negative log-likelihood as:

\[\mathcal{L}(x, y) = -\log \Big( \frac{ \exp( \frac{f(x)_y}{ \tau \lVert f(x) \rVert} )}{\sum_{i=1}^K \exp( \frac{ f(x)_i}{ \tau \lVert f(x) \rVert} ) } \Big)\]

where \(\tau\) is a temperature value.

Will ignore OOD inputs.
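A minimal sketch of the normalization in plain PyTorch. The small constant eps added to the norm is our own guard against all-zero logits, and the function name is hypothetical. Because the logits are divided by their norm, rescaling them by a positive factor leaves the loss essentially unchanged.

```python
import torch
import torch.nn.functional as F


def logitnorm_loss(logits, target, t=1.0, eps=1e-7):
    known = target >= 0  # OOD inputs are ignored
    if not known.any():
        return logits.new_zeros(())
    logits, target = logits[known], target[known]
    # Divide each logit vector by tau times its L2 norm (eps guards against zero vectors)
    norms = logits.norm(p=2, dim=1, keepdim=True) + eps
    return F.cross_entropy(logits / (t * norms), target)


torch.manual_seed(0)
logits = torch.randn(4, 5)
target = torch.tensor([0, 3, -1, 4])
loss = logitnorm_loss(logits, target, t=0.5)
# Nearly identical to `loss`: the normalization removes the scale of the logits
scaled = logitnorm_loss(10 * logits, target, t=0.5)
```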

See Paper:

ICML

Parameters:

t – temperature \(\tau\).
reduction – reduction method, one of mean, sum or none

forward(logits: Tensor, target: Tensor) → Tensor[source]

Parameters:

logits – logits as predicted by the model
target – labels

Supervised

Supervised losses make use of example out-of-distribution samples (or samples from “known unknown” classes). Thus, these losses can handle samples with target values \(< 0\).

Outlier Exposure Loss

class pytorch_ood.loss.OutlierExposureLoss(alpha: float = 0.5, reduction: str | None = 'mean')[source]

Loss from the paper Deep Anomaly Detection With Outlier Exposure. While the formulation in the original paper is very general, this module implements the exact loss that was used in the corresponding experiments.

The loss is defined as

\[\mathcal{L}(x, y) = \Biggl \lbrace { -\log \sigma_y(f(x)) \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \text{if } y \geq 0 \atop \alpha (\log(\sum_{c=1}^C e^{f(x)_c}) - \frac{1}{C} \sum_{c=1}^C f(x)_c) \quad \text{ otherwise } }\]

where \(C\) is the number of classes, \(\alpha\) is a hyperparameter, and \(\sigma_y\) denotes the \(y^{th}\) softmax output.
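A sketch of the per-sample loss in plain PyTorch. For OOD samples it uses the cross-entropy between the softmax output and the uniform distribution, \(\log \sum_c e^{f(x)_c} - \frac{1}{C}\sum_c f(x)_c\), which is the outlier term described in the Outlier Exposure paper; it is smallest when the prediction is uniform. The function name is hypothetical and only mean reduction is shown.

```python
import torch
import torch.nn.functional as F


def outlier_exposure_loss(logits, target, alpha=0.5):
    known = target >= 0
    # ID samples: standard cross-entropy (negative labels are clamped to 0 only to keep
    # the call valid; torch.where discards those values)
    ce = F.cross_entropy(logits, target.clamp(min=0), reduction="none")
    # OOD samples: cross-entropy to the uniform distribution = logsumexp(f) - mean(f)
    uniform_ce = torch.logsumexp(logits, dim=1) - logits.mean(dim=1)
    return torch.where(known, ce, alpha * uniform_ce).mean()


torch.manual_seed(0)
logits = torch.randn(6, 4)
target = torch.tensor([0, 1, 2, 3, -1, -1])
loss = outlier_exposure_loss(logits, target)
```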

See Paper:

ArXiv

See Implementation:

GitHub

Parameters:

alpha – weighting coefficient \(\alpha\)
reduction – reduction method, one of mean, sum or none

forward(logits: Tensor, target: Tensor) → Tensor[source]

Parameters:

logits – class logits for predictions
target – labels for predictions

Returns:

loss

Entropic Open-Set Loss

class pytorch_ood.loss.EntropicOpenSetLoss(reduction: str | None = 'mean')[source]

From the paper Reducing Network Agnostophobia. The loss aims to maximize the entropy of the predictions for OOD inputs.

A variant for segmentation was proposed in Entropy Maximization and Meta Classification for Out-Of-Distribution Detection in Semantic Segmentation.

The loss is calculated as

\[\mathcal{L}(x, y) = \Biggl \lbrace { -\log \sigma_y(f(x)) \quad \text{if } y \geq 0 \atop -\frac{1}{C} \sum_{c=1}^C \log \sigma_c(f(x)) \quad \text{ otherwise } }\]

where \(\sigma\) is the softmax function and \(C\) is the number of classes.
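A sketch of the entropic open-set loss in plain PyTorch (function name hypothetical). For unknown samples it uses \(-\frac{1}{C}\sum_c \log \sigma_c(f(x))\), which is at least \(\log C\) and reaches that minimum for a uniform softmax output, so minimizing it maximizes the entropy of the prediction.

```python
import torch
import torch.nn.functional as F


def entropic_open_set_loss(logits, target):
    log_probs = F.log_softmax(logits, dim=1)
    # Known samples: negative log-likelihood of the true class
    # (negative labels are clamped to 0 only so that gather is valid; torch.where discards those values)
    nll = -log_probs.gather(1, target.clamp(min=0).unsqueeze(1)).squeeze(1)
    # Unknown samples: -(1/C) * sum_c log softmax_c, minimal for a uniform prediction
    uniform = -log_probs.mean(dim=1)
    return torch.where(target >= 0, nll, uniform).mean()


torch.manual_seed(0)
logits = torch.randn(6, 5)
target = torch.tensor([0, 4, 2, -1, -1, 1])
loss = entropic_open_set_loss(logits, target)
```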

See Paper: NeurIPS
See Paper: ArXiv
Parameters: reduction – reduction method, one of mean, sum or none

forward(logits: Tensor, target: Tensor) → Tensor[source]

Parameters:

logits – class logits
target – target labels

Returns:

the loss

Objectosphere Loss

class pytorch_ood.loss.ObjectosphereLoss(alpha: float = 1.0, xi: float = 1.0, reduction: str | None = 'mean')[source]

From the paper Reducing Network Agnostophobia.

\[\mathcal{L}(x, y) = \mathcal{L}_E(x,y) + \alpha \Biggl \lbrace { \max \lbrace 0, \xi - \lVert F(x) \rVert \rbrace^2 \quad \text{if } y \geq 0 \atop \lVert F(x) \rVert_2^2 \hspace{3.7cm} \text{ otherwise } }\]

where \(F(x)\) are deep features in some layer of the model, and \(\mathcal{L}_E\) is the Entropic Open-Set Loss.
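A sketch of the combined objective in plain PyTorch, with the entropic open-set term computed inline so that the block is self-contained. The function name is hypothetical; features stands for the deep features \(F(x)\), target < 0 marks OOD samples, and mean reduction is assumed.

```python
import torch
import torch.nn.functional as F


def objectosphere_loss(logits, features, target, alpha=1.0, xi=1.0):
    known = target >= 0
    log_probs = F.log_softmax(logits, dim=1)
    # Entropic open-set term L_E: NLL for known samples, -(1/C) sum_c log softmax_c otherwise
    # (negative labels are clamped to 0 only so that gather is valid; torch.where discards those values)
    nll = -log_probs.gather(1, target.clamp(min=0).unsqueeze(1)).squeeze(1)
    entropic = torch.where(known, nll, -log_probs.mean(dim=1))
    # Feature-magnitude term: ID features should have norm >= xi, OOD features norm close to 0
    magnitude = features.norm(p=2, dim=1)
    feature_term = torch.where(known, torch.relu(xi - magnitude) ** 2, magnitude ** 2)
    return (entropic + alpha * feature_term).mean()


torch.manual_seed(0)
logits = torch.randn(4, 3)
features = torch.randn(4, 8)
target = torch.tensor([0, 2, -1, 1])
loss = objectosphere_loss(logits, features, target, alpha=0.5, xi=2.0)
```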

See Paper:

NeurIPS

Parameters:

alpha – weight coefficient
xi – minimum feature magnitude \(\xi\)
reduction – reduction method, one of mean, sum or none

forward(logits: Tensor, features: Tensor, target: Tensor) → Tensor[source]

Parameters:

logits – class logits \(f(x)\)
features – deep features \(F(x)\)
target – target labels \(y\)

Returns:

the loss

static score(logits: Tensor) → Tensor[source]

Outlier score used by the objectosphere loss.

Parameters: logits – instance logits
Returns: outlier scores

Energy-Bounded Learning Loss

class pytorch_ood.loss.EnergyRegularizedLoss(alpha: float = 1.0, margin_in: float = -1.0, margin_out: float = -1.0, reduction: str = 'mean')[source]

Augments the cross-entropy by a regularization term that aims to increase the energy gap between ID and OOD samples. This term is defined as

\[\mathcal{L}(x, y) = \alpha \Biggl \lbrace { \max(0, E(x) - m_{in})^2 \quad \quad \quad \quad \quad \quad \text{if } y \geq 0 \atop \max(0, m_{out} - E(x))^2 \quad \quad \quad \quad \quad \text{ otherwise } }\]

where \(E(x) = - \log(\sum_i e^{f_i(x)} )\) is the energy of \(x\).
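A sketch of the cross-entropy plus energy regularization in plain PyTorch. Following the energy-based OOD paper, the ID and OOD hinge terms are averaged separately over the ID and OOD samples of the batch; this reduction, the helper names, and the margin values in the example are assumptions.

```python
import torch
import torch.nn.functional as F


def energy(logits: torch.Tensor) -> torch.Tensor:
    # E(x) = -log sum_i exp(f_i(x))
    return -torch.logsumexp(logits, dim=1)


def _mean_or_zero(x: torch.Tensor) -> torch.Tensor:
    # Empty groups (e.g. a batch without OOD samples) contribute zero
    return x.mean() if x.numel() > 0 else x.new_zeros(())


def energy_regularized_loss(logits, target, alpha=1.0, margin_in=-1.0, margin_out=-1.0):
    known = target >= 0
    e = energy(logits)
    ce = _mean_or_zero(F.cross_entropy(logits[known], target[known], reduction="none"))
    reg_in = _mean_or_zero(torch.relu(e[known] - margin_in) ** 2)    # ID: energy below m_in
    reg_out = _mean_or_zero(torch.relu(margin_out - e[~known]) ** 2)  # OOD: energy above m_out
    return ce + alpha * (reg_in + reg_out)


torch.manual_seed(0)
logits = torch.randn(6, 3)
target = torch.tensor([0, 1, 2, 0, -1, -1])
loss = energy_regularized_loss(logits, target, alpha=0.1, margin_in=-5.0, margin_out=-1.0)
```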

See Paper:

NeurIPS

See Implementation:

GitHub

Parameters:

alpha – weighting parameter
margin_in – margin energy \(m_{in}\) for ID data
margin_out – margin energy \(m_{out}\) for OOD data
reduction – can be one of none, mean, sum

forward(logits: Tensor, targets: Tensor) → Tensor[source]

Calculates the weighted sum of the cross-entropy and the energy regularization term.

Parameters:

logits – logits
targets – labels

VOS Energy-Based Loss

class pytorch_ood.loss.VOSRegLoss(logistic_regression: Linear, weights_energy: Linear, alpha: float = 0.1, device: str = 'cpu', reduction: str = 'mean')[source]

Implements the loss function from VOS: Learning what you don’t know by virtual outlier synthesis, but without synthesizing virtual outliers. The loss adds a regularization term to the cross-entropy that aims to increase the (weighted) energy gap between ID and OOD samples.

The regularization term is defined as:

\[\mathcal{L} = \mathbb{E}_{v \sim V} \left[ -\text {log}\frac{1}{1+\text{exp}^{-\phi(E(v))}} \right] + \mathbb{E}_{x \sim D} \left[ -\text {log} \frac{\text{exp}^{-\phi(E(x))}}{1+ \text{exp}^{-\phi(E(x))}}\right]\]

where \(\phi\) is a possibly non-linear function, \(E\) is the weighted energy and \(V\) and \(D\) are the distributions of the (possibly virtual) outliers and the ID data respectively.
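The regularization term can be sketched as a logistic regression on the weighted energy. This is a minimal illustration under several assumptions: the weighted energy is taken as \(E(x) = -\log \sum_k \mathrm{relu}(w_k) e^{f_k(x)}\) with \(w\) the weight vector of weights_energy, \(\phi\) is a two-output linear layer fed with \(-E(x)\) and trained with softmax cross-entropy (equivalent to a sigmoid on the logit difference), and the loss is averaged over the concatenated batch rather than over each distribution separately. The function names are hypothetical.

```python
import torch
import torch.nn.functional as F


def weighted_neg_energy(logits: torch.Tensor, weights_energy: torch.nn.Linear) -> torch.Tensor:
    # -E(x) = log sum_k relu(w_k) * exp(f_k(x)), using the max trick for stability
    m = logits.max(dim=1, keepdim=True).values
    w = F.relu(weights_energy.weight)  # shape 1 x C, broadcast over the batch
    return m.squeeze(1) + torch.log((w * torch.exp(logits - m)).sum(dim=1))


def vos_uncertainty_loss(logits_id, logits_ood, phi, weights_energy):
    s_id = weighted_neg_energy(logits_id, weights_energy)
    s_ood = weighted_neg_energy(logits_ood, weights_energy)
    inputs = torch.cat([s_id, s_ood]).unsqueeze(1)  # phi sees one scalar per sample
    # Binary labels for the two-output phi: 1 = ID, 0 = (virtual) outlier
    labels = torch.cat([torch.ones_like(s_id), torch.zeros_like(s_ood)]).long()
    return F.cross_entropy(phi(inputs), labels)


torch.manual_seed(0)
num_classes = 10
phi = torch.nn.Linear(1, 2)
weights = torch.nn.Linear(num_classes, 1)
torch.nn.init.uniform_(weights.weight)
logits_id = torch.randn(8, num_classes)
logits_ood = torch.randn(8, num_classes)
loss = vos_uncertainty_loss(logits_id, logits_ood, phi, weights)
```

With all energy weights equal to one, the weighted energy reduces to the ordinary energy \(-\log \sum_k e^{f_k(x)}\).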

See Paper: ArXiv
See Implementation: GitHub

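\(\phi\) and the weights for the weighted energy can be initialized as follows:

import torch
from pytorch_ood.loss import VOSRegLoss

num_classes = 10  # number of ID classes (example value)
phi = torch.nn.Linear(1, 2)
weights = torch.nn.Linear(num_classes, 1)
torch.nn.init.uniform_(weights.weight)
criterion = VOSRegLoss(phi, weights)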

Note

This implementation does not generate synthetic outliers. For this feature, see pytorch_ood.loss.vos.VirtualOutlierSynthesizingRegLoss.

Parameters:

logistic_regression – \(\phi\) function. Can be for example a linear layer.
weights_energy – neural network layer with weights for the energy
alpha – weighting parameter \(\alpha\).
reduction – reduction method to apply, one of mean, sum or none
device – For example cpu or cuda:0

forward(logits: Tensor, y: Tensor) → Tensor[source]

Parameters:

logits – logits
y – labels

Virtual Outlier Synthesizing Loss

class pytorch_ood.loss.VirtualOutlierSynthesizingRegLoss(logistic_regression: Linear, weights_energy: Linear, device: str, num_classes: int, num_input_last_layer: int, fc: Linear, alpha: float = 0.1, reduction: str = 'mean', sample_number: int = 1000, select: int = 1, sample_from: int = 10000)[source]

Implements the loss function of VOS: Learning what you don’t know by virtual outlier synthesis with additional sampling of virtual outliers. These outliers are synthesized by fitting a Gaussian to the latent features and sampling from low-likelihood regions. This removes the need for real outliers during training.
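The synthesis step can be sketched with torch.distributions: fit a Gaussian to a set of latent features, draw sample_from candidates, and keep the select candidates with the lowest log-density as virtual outliers. The function name and the jitter added to the covariance are assumptions made for this illustration; the sketch fits one Gaussian to the given features and does not model the sample_number feature buffer.

```python
import torch
from torch.distributions import MultivariateNormal


def synthesize_virtual_outliers(features: torch.Tensor, sample_from: int = 10000,
                                select: int = 1, jitter: float = 1e-4) -> torch.Tensor:
    """Fit a Gaussian to `features` (N x D) and return the `select` lowest-density
    samples out of `sample_from` draws."""
    mean = features.mean(dim=0)
    centered = features - mean
    cov = centered.T @ centered / (features.shape[0] - 1)
    cov = cov + jitter * torch.eye(features.shape[1])  # keep the covariance positive definite
    dist = MultivariateNormal(mean, covariance_matrix=cov)
    candidates = dist.sample((sample_from,))
    log_density = dist.log_prob(candidates)
    # topk on the negated log-density picks the least likely candidates
    idx = torch.topk(-log_density, select).indices
    return candidates[idx]


torch.manual_seed(0)
features = torch.randn(500, 4)  # stand-in for latent features of one ID class
outliers = synthesize_virtual_outliers(features, sample_from=10000, select=5)
```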

For more information see VOS Energy-Based Loss.

See Paper:

ArXiv

See Implementation:

GitHub

Parameters:

logistic_regression – \(\phi\) function. Can be for example a linear layer.
weights_energy – neural network layer, with weights for the energy
device – For example cpu or cuda:0
num_classes – number of classes
num_input_last_layer – number of inputs in the last layer of the network
fc – fully connected last layer of the network
alpha – weighting parameter
reduction – reduction method to apply, one of mean, sum or none
sample_number – number of samples that are used for virtual outlier synthesis
select – number of lowest-density samples that are selected as virtual outliers
sample_from – number of samples that are used for sampling the probability distribution

forward(logits: Tensor, features: Tensor, y: Tensor)[source]

Parameters:

logits – logits
features – features
y – labels

MCHAD Loss

class pytorch_ood.loss.MCHADLoss(n_classes: int, n_dim: int, radius: float = 0, margin: float = 0, weight_center: float = 1.0, weight_nll: float = 1.0, weight_oe: float = 1.0)[source]

Implements the MCHAD loss from the paper Multi-Class Hypersphere Anomaly-Detection.

The Loss places a center \(\mu_y\) for each class \(y\) in the output space of the model and has three components:

\[ \begin{align}\begin{aligned}\mathcal{L}_{\Lambda}(x,y) = \max \lbrace 0, \Vert \mu_y - f(x) \Vert^2_2 - r^2 \rbrace\\\mathcal{L}_{\Delta}(x,y) = \log(1 + \sum_{i \neq y} e^{\Vert \mu_y - f(x) \Vert^2_2 - \Vert \mu_i - f(x) \Vert^2_2} )\\\mathcal{L}_{\Theta}(x) = \sum_i \max \lbrace 0, (r + m)^2 - \Vert \mu_i - f(x) \Vert^2_2 \rbrace\end{aligned}\end{align} \]

Intuitively, the first term forces the samples to cluster tightly in a sphere of radius \(r\) around the corresponding class centers. The second term ensures that the (learnable) class centers remain separable and do not collapse. The third term makes sure that OOD samples have at least a distance \(m\) to the surface of each hypersphere.
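A sketch of the three terms on a matrix of squared distances (shape \(B \times C\)), with negative labels marking OOD samples. The \(\mathcal{L}_{\Delta}\) term is computed as a cross-entropy over negative distances, which equals \(\log(1 + \sum_{i \neq y} e^{d_y - d_i})\). Function and argument names are hypothetical; averaging each term over its samples is our choice of reduction.

```python
import torch
import torch.nn.functional as F


def mchad_loss(distmat, y, radius=0.0, margin=0.0, w_center=1.0, w_nll=1.0, w_oe=1.0):
    """distmat: squared distances B x C to the class centers; y < 0 marks OOD samples."""
    known = y >= 0
    zero = distmat.new_zeros(())
    l_center, l_nll, l_oe = zero, zero, zero
    if known.any():
        d = distmat[known]
        t = y[known]
        d_y = d.gather(1, t.unsqueeze(1)).squeeze(1)
        # L_Lambda: pull ID samples inside the hypersphere of radius r
        l_center = torch.relu(d_y - radius**2).mean()
        # L_Delta: log(1 + sum_{i != y} exp(d_y - d_i)) == cross-entropy on -distances
        l_nll = F.cross_entropy(-d, t)
    if (~known).any():
        # L_Theta: push OOD samples at least (r + m) away from every center
        l_oe = torch.relu((radius + margin) ** 2 - distmat[~known]).sum(dim=1).mean()
    return w_center * l_center + w_nll * l_nll + w_oe * l_oe


torch.manual_seed(0)
centers = torch.randn(3, 2)  # learnable in the actual loss; random here
z = torch.randn(6, 2)
distmat = torch.cdist(z, centers) ** 2
y = torch.tensor([0, 1, 2, -1, 0, -1])
loss = mchad_loss(distmat, y, radius=0.5, margin=1.0)
```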

The loss can be used in a supervised as well as in an unsupervised manner.

See Implementation:

GitLab

See Paper:

ICPR

Parameters:

n_classes – number of classes \(C\)
n_dim – dimensionality of the output space \(D\)
radius – radius of the hyperspheres
margin – margin around hyperspheres
weight_center – weight \(\lambda_{\Lambda}\) for the center loss term
weight_nll – weight \(\lambda_{\Delta}\) for the maximum likelihood term
weight_oe – weight \(\lambda_{\Theta}\) for the outlier exposure term

property centers: ClassCenters: Class centers \(\mu_y\)

distance(z: Tensor) → Tensor[source]

Calculates the distance of each embedding to each center.

Parameters: z – embeddings of shape \(B \times D\).
Returns: distance matrix of shape \(B \times C\).

forward(distmat: Tensor, y: Tensor) → Tensor[source]

Parameters:

distmat – distance matrix of shape \(B \times C\).
y – labels

Returns:

loss values

Background Class Loss

class pytorch_ood.loss.BackgroundClassLoss(n_classes: int, reduction: str = 'mean')[source]

The idea of the background class is that OOD samples are mapped to a separate class during training. This implementation uses the standard cross-entropy but remaps the labels of background samples to a non-negative target label: when the ID target labels are \(\lbrace 0, 1, 2, ..., N - 1 \rbrace\), all entries with a target label \(<0\) are remapped to \(N\).

The network's output layer must have \(N+1\) outputs, so the logits have shape \(B \times (N + 1)\).
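The label remapping can be sketched as follows (function name hypothetical): every negative label becomes the background index \(N\), after which the standard cross-entropy applies to the \(N+1\) logits.

```python
import torch
import torch.nn.functional as F


def background_class_loss(logits: torch.Tensor, targets: torch.Tensor, n_classes: int) -> torch.Tensor:
    if logits.shape[1] != n_classes + 1:
        raise ValueError("expected N + 1 outputs (N classes plus background)")
    # Remap every OOD label (< 0) to the extra background class index N
    remapped = torch.where(targets < 0, torch.full_like(targets, n_classes), targets)
    return F.cross_entropy(logits, remapped)


torch.manual_seed(0)
n_classes = 3
logits = torch.randn(4, n_classes + 1)
targets = torch.tensor([0, 2, -1, -7])
loss = background_class_loss(logits, targets, n_classes)
```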

Parameters:

n_classes – number of classes \(N\) (not counting background class)
reduction – can be one of none, mean, sum

forward(logits: Tensor, targets: Tensor) → Tensor[source]

Parameters:

logits – class logits
targets – target labels

Returns:

Cross-Entropy for remapped samples

Energy Margin Loss (Scone)

class pytorch_ood.loss.EnergyMarginLoss(full_train_loss: floating, eta=1.0, false_alarm_cutoff=0.05, in_constraint_weight=1.0, ce_tol=2.0, ce_constraint_weight=1.0, out_constraint_weight=1.0, lr_lam=1.0, penalty_mult=1.5, constraint_tol=0.0)[source]

Loss from the paper Feed Two Birds with One Scone. It introduces a margin to further improve the performance of energy-based OOD detection, specifically for handling covariate-shifted data.

See Paper: ArXiv
See Implementation: GitHub
See Derivation: ArXiv

Constructor of EnergyMarginLoss

Parameters:

full_train_loss – average classification loss of pre-trained model
eta – margin between ID and OOD; covariate-shifted data should reside in between
false_alarm_cutoff – false alarm cutoff
in_constraint_weight – penalty parameter for in-distribution constraint
lam – lagrangian multiplier for in-distribution constraint
lam2 – lagrangian multiplier for multi-class model constraint
ce_tol – error threshold for the multi-class model
ce_constraint_weight – penalty parameter for multi-class model constraint
out_constraint_weight
lr_lam – learning rate of lagrangian multipliers
penalty_mult – penalty multiplier
constraint_tol – constraint tolerance

forward(logits: Tensor, targets: Tensor, logistic_regression: Callable[[Tensor], Tensor]) → Tensor[source]

Calculates the weighted sum of the cross-entropy and the energy regularization term, i.e., the classical augmented Lagrangian function.

Parameters:

logits – logits
targets – labels
logistic_regression – logistic regression layer

update_hyperparameters(model: Callable[[Tensor], Tensor], train_loader_in: DataLoader, logistic_regression: Callable[[Tensor], Tensor]) → None[source]

Updates the hyperparameters of the augmented Lagrangian function.

Parameters:

model – PyTorch model
train_loader_in – loader of in-distribution data
logistic_regression – logistic regression layer