Training Objectives

Objective (or loss) functions for OOD detection usually aim to improve the discriminability of ID and OOD points in the output or latent space of the model.

All objective functions are implemented as torch.nn.Modules. Some of them have a set of trainable parameters and must be moved to the appropriate device.

Unsupervised

Unsupervised losses use only in-distribution data (that is, only examples from “known known” classes).

Therefore, all of these loss functions expect that the target labels are strictly \(\geq 0\).

Deep SVDD Loss

class pytorch_ood.loss.DeepSVDDLoss(n_dim: int, reduction: str | None = 'mean', radius: float = 0.0, center: Tensor | None = None)

Deep Support Vector Data Description (SVDD) from the paper Deep One-Class Classification. It places a center \(\mu\) in the output space of the model and pulls ID samples into a sphere of radius \(r\) around it in order to learn the common factors of intra-class variance.

The loss is defined as follows:

\[\mathcal{L}(x) = \max \lbrace 0, \lVert f(x) - \mu \rVert_2^2 - r^2 \rbrace\]

The distance of a point to the center can be used as an outlier score.

This is an implementation of the One-Class Deep SVDD objective, which implies that the radius is not considered trainable and should usually be set to zero.

In the original paper, the center is initialized with the mean of \(f(x)\) over the dataset before training.
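The loss and the center initialization described above can be illustrated in plain Python. This is a dependency-free sketch of the formula, not the library's implementation (which operates on tensors); the function names svdd_loss and mean_center are hypothetical.

```python
def svdd_loss(z, center, radius=0.0):
    """max(0, ||z - center||^2 - radius^2) for a single embedding z."""
    sq_dist = sum((zi - ci) ** 2 for zi, ci in zip(z, center))
    return max(0.0, sq_dist - radius ** 2)


def mean_center(embeddings):
    """Initialize the center as the mean embedding over the dataset."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


embeddings = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]  # toy values of f(x)
center = mean_center(embeddings)
losses = [svdd_loss(z, center) for z in embeddings]
```

With the radius set to zero, the loss reduces to the squared distance to the center, which is also the quantity suggested as the outlier score.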

See Paper:

ICML

Note

This module should be moved to the correct device before using forward()

Parameters:
  • n_dim – dimensionality \(n\) of the output space

  • reduction – reduction method to apply, one of mean, sum or none

  • radius – radius \(r\)

  • center – position of the center \(\mu \in \mathbb{R}^n\) where \(n\) is the dimensionality of the output space

property center: ClassCenters

The center \(\mu\)

distance(x: Tensor) → Tensor
Returns:

\(\lVert x - \mu \rVert^2 - r^2\)

forward(x: Tensor, y: Tensor | None = None) → Tensor
Parameters:
  • x – features

  • y – target labels (either ID or OOD). If not given, all samples are assumed to be ID.

Returns:

\(\lVert x - \mu \rVert^2 - r^2\)

radius

radius \(r\) of the hypersphere

static svdd_loss(x: Tensor, center: ClassCenters, radius: Tensor = 0.0, y: Tensor | None = None) → Tensor

Calculates the loss. Treats all ID samples equally and ignores all OOD samples. If no labels are given, all samples are assumed to be ID.

Parameters:
  • x – features

  • center – center of sphere

  • radius – radius of sphere

  • y – Optional labels.

Class Anchor Clustering Loss

class pytorch_ood.loss.CACLoss(n_classes: int, magnitude: float = 1.0, alpha: float = 1.0)

Class Anchor Clustering Loss from the paper Class Anchor Clustering: a Distance-based Loss for Training Open Set Classifiers.

The authors place a class-conditional center (called an anchor) in the output space of the model and pull the representations of points of a class \(y\) towards the corresponding center \(\mu_y\) during training. The centers are initialized as unit vectors scaled by a magnitude and are not trainable.

They also propose an outlier score based on the distance, which is implemented in the CACLoss.score() method.

Example code is provided here

See Paper:

WACV 2022

See Implementation:

GitHub

Centers are initialized as unit vectors, scaled by the magnitude.

Parameters:
  • n_classes – number of classes \(C\)

  • magnitude – magnitude of class anchors

  • alpha – weight \(\alpha\) for the anchor term

property centers: ClassCenters

The class centers \(\mu_y\).

distance(x: Tensor) → Tensor
Parameters:

x – input points

Returns:

matrix with squared distances from each point to each center with shape \(B \times C\).
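The anchor initialization and the distance matrix can be sketched in plain Python. This is an illustration only, under the assumption that the output space has one dimension per class, so that each anchor is a scaled unit vector; make_anchors and squared_distances are hypothetical names, not library functions.

```python
def make_anchors(n_classes, magnitude=1.0):
    """One anchor per class: the class's unit vector scaled by `magnitude`."""
    return [[magnitude if i == j else 0.0 for j in range(n_classes)]
            for i in range(n_classes)]


def squared_distances(points, anchors):
    """B x C matrix of squared Euclidean distances from points to anchors."""
    return [[sum((p - a) ** 2 for p, a in zip(point, anchor))
             for anchor in anchors]
            for point in points]


anchors = make_anchors(n_classes=3, magnitude=5.0)
outputs = [[5.0, 0.0, 0.0], [1.0, 1.0, 1.0]]  # B = 2 toy model outputs
dist = squared_distances(outputs, anchors)
```

The first output sits exactly on the anchor of class 0, while the second is equally far from all anchors, which is the kind of point a distance-based score would flag as an outlier.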

forward(distances: Tensor, target: Tensor) → Tensor

Calculates the CAC loss, based on the given distance matrix and target labels. OOD inputs will be ignored.

Parameters:
  • distances – matrix of distances of each point to each center with shape \(B \times C\).

  • target – labels for samples

static score(distance: Tensor) → Tensor

Rejection score proposed in the paper.

Parameters:

distance – distance of instances to class centers

Returns:

outlier scores

II Loss

class pytorch_ood.loss.IILoss(n_classes: int, n_embedding: int, alpha: float = 1.0)

II Loss function from Learning a neural network based representation for open set recognition.

See Paper:

ArXiv

See Implementation:

GitHub

Warning

  • We added running centers for online class center estimation. This is only an approximation, and results might differ from those obtained when the centers are calculated as described in the paper. However, it enables better estimation of the performance during training without having to calculate the centers over the entire dataset. Empirically, we found that these centers work well.

Parameters:
  • n_classes – number of classes

  • n_embedding – embedding dimensionality

  • alpha – weight for both loss terms

property centers: RunningCenters
Returns:

current class center estimates

distance(x: Tensor) → Tensor
Parameters:

x – embeddings

Returns:

matrix of distances to the class centers in the output space

forward(x: Tensor, target: Tensor) → Tensor

Updates running centers

Parameters:
  • x – embeddings of samples

  • target – label of samples

predict(x: Tensor) → Tensor

Predict class membership probability

Parameters:

x – embeddings

Returns:

class membership probabilities

Center Loss

class pytorch_ood.loss.CenterLoss(n_classes: int, n_dim: int, magnitude: float = 1.0, radius: float = 0.0, fixed: bool = False)

Generalized version of the Center Loss from the paper A Discriminative Feature Learning Approach for Deep Face Recognition. For each class, this loss places a center \(\mu_y\) in the output space and draws representations of samples to their corresponding class centers, up to a radius \(r\).

Calculates

\[\mathcal{L}(x,y) = \max \lbrace d(f(x),\mu_y) - r , 0 \rbrace\]

where \(d\) is some measure of dissimilarity, like the squared distance.

With radius \(r=0\) and the squared Euclidean distance as \(d(\cdot,\cdot)\), this is equivalent to the original center loss, which is also referred to as the soft-margin loss in some publications.
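A minimal plain-Python sketch of this formula for a single sample, using the squared Euclidean distance as \(d\). The function name center_loss and the toy centers are hypothetical; the library itself works on a precomputed distance matrix.

```python
def center_loss(z, y, centers, radius=0.0):
    """max(d(z, mu_y) - radius, 0) with d the squared Euclidean distance."""
    sq_dist = sum((zi - ci) ** 2 for zi, ci in zip(z, centers[y]))
    return max(sq_dist - radius, 0.0)


centers = [[0.0, 0.0], [4.0, 0.0]]  # one center per class (C = 2, D = 2)
z = [1.0, 2.0]                      # toy embedding f(x)

loss_class0 = center_loss(z, 0, centers)              # 1 + 4
loss_class1 = center_loss(z, 1, centers)              # 9 + 4
loss_margin = center_loss(z, 0, centers, radius=6.0)  # within the radius
```

Distances below the radius are not penalized, so the last call returns zero.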

See Implementation:

GitHub

See Paper:

ECCV 2016

Parameters:
  • n_classes – number of classes \(C\)

  • n_dim – dimensionality of center space \(D\)

  • magnitude – scale \(\lambda\) used for center initialization

  • radius – radius \(r\) of spheres, lower bound for distance from center that is penalized

  • fixed – false if centers should be learnable

property centers: ClassCenters
Returns:

the \(\mu\) for all classes

forward(distmat: Tensor, target: Tensor) → Tensor

Calculates the loss. Ignores OOD inputs.

Parameters:
  • distmat – matrix of distances of each point to each center with shape \(B \times C\).

  • target – ground truth labels with shape (batch_size).

Returns:

the loss values

Cross-Entropy Loss

class pytorch_ood.loss.CrossEntropyLoss(reduction: str | None = 'mean')

Standard Cross-entropy, but ignores OOD inputs.
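The masking behaviour can be sketched in plain Python as follows. This is an illustration with mean reduction, not the library's code; the function names are hypothetical, and returning 0.0 for a batch that contains only OOD samples is an assumption of this sketch.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def cross_entropy_ignore_ood(batch_logits, targets):
    """Mean cross-entropy over samples with target >= 0; OOD samples are skipped."""
    losses = [-log_softmax(logits)[t]
              for logits, t in zip(batch_logits, targets) if t >= 0]
    return sum(losses) / len(losses) if losses else 0.0


batch_logits = [[0.0, 0.0], [3.0, -3.0]]
loss_mixed = cross_entropy_ignore_ood(batch_logits, [0, -1])     # second sample is OOD
loss_all_ood = cross_entropy_ignore_ood(batch_logits, [-1, -1])  # nothing left to score
```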

Parameters:

reduction – reduction method to apply. Can be one of mean, sum or none

forward(logits: Tensor, targets: Tensor) → Tensor

Calculates cross-entropy.

Parameters:
  • logits – logits

  • targets – labels

Confidence Loss

class pytorch_ood.loss.ConfidenceLoss(alpha: float = 1.0, eps: float = 1e-24)

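Loss proposed in Learning Confidence for Out-of-Distribution Detection in Neural Networks. The model learns to predict a confidence \(c\) in addition to the class membership.

The loss minimizes the negative log-likelihood of the class membership prediction, where the predicted probabilities \(p_i\) are interpolated with the target \(y_i\) according to the confidence, and penalizes low confidence: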

\[ \begin{align}\begin{aligned}\mathcal{L}_{NLL} + \alpha \mathcal{L}_c = - \sum_{i=1}^{M} \log(p'_{i}) y_i - \alpha \log(c)\\\text{where} \quad p_i' = c \cdot p_i + (1-c) y_i\end{aligned}\end{align} \]
See Paper:

ArXiv

Note

  • We implemented clipping for numerical stability.

  • This implementation uses mean reduction for batches.

  • The authors additionally used ODIN preprocessing

Parameters:
  • alpha – weight \(\alpha\) used to balance the terms

  • eps – Clipping value \(\epsilon\) used for numerical stability

forward(logits: Tensor, confidence: Tensor, target: Tensor) → Tensor
Parameters:
  • logits – class logits for samples

  • confidence – predicted confidence for samples

  • target – labels for samples (not one-hot encoded)
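For a single sample, the objective can be sketched in plain Python as shown here. This is an illustration of the formula only; the function names are hypothetical, and the exact places where clipping is applied are an assumption of this sketch rather than a description of the library.

```python
import math


def softmax(logits):
    """Numerically stable softmax of a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_loss(logits, confidence, target, alpha=1.0, eps=1e-24):
    """NLL on the interpolated prediction p' plus alpha * (-log c)."""
    p = softmax(logits)
    c = min(max(confidence, eps), 1.0)  # assumed clipping of the confidence
    y = [1.0 if i == target else 0.0 for i in range(len(p))]
    p_prime = [c * pi + (1.0 - c) * yi for pi, yi in zip(p, y)]
    nll = -math.log(max(p_prime[target], eps))
    return nll - alpha * math.log(c)


certain = confidence_loss([0.0, 0.0], confidence=1.0, target=0)
hinted = confidence_loss([0.0, 0.0], confidence=0.5, target=0)
```

With a confidence of 1 the loss is the ordinary negative log-likelihood. Lower confidence moves the prediction towards the target, which lowers the likelihood term, but is paid for through the \(-\alpha \log(c)\) penalty.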

LogitNorm Loss

class pytorch_ood.loss.LogitNorm(t=1.0, reduction='mean')

LogitNorm from the paper Mitigating Neural Network Overconfidence with Logit Normalization.

Given a model \(f: \mathcal{X} \rightarrow \mathbb{R}^K\) that maps inputs to \(K\) logits, this method normalizes the logits before computing the negative log-likelihood as:

\[\mathcal{L}(x, y) = -\log \Big( \frac{ \exp( \frac{f(x)_y}{ \tau \lVert f(x) \rVert} )}{\sum_{i=1}^K \exp( \frac{ f(x)_i}{ \tau \lVert f(x) \rVert} ) } \Big)\]

where \(\tau\) is a temperature value.
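A plain-Python sketch of the idea for one sample, assuming the logits are divided by \(\tau\) times the Euclidean norm of the logit vector. The function name logit_norm_loss and the small constant added to the norm are assumptions of this sketch.

```python
import math


def logit_norm_loss(logits, target, tau=1.0):
    """Cross-entropy on logits divided by tau times their Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in logits)) + 1e-7  # avoid division by zero
    scaled = [v / (tau * norm) for v in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(v - m) for v in scaled))
    return lse - scaled[target]


small = logit_norm_loss([3.0, 4.0], target=1)
large = logit_norm_loss([300.0, 400.0], target=1)  # same direction, 100x magnitude
```

Because only the direction of the logit vector matters after normalization, both calls give (up to the stabilizing constant) the same loss, so the model cannot reduce the loss simply by inflating the logit magnitude.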

Will ignore OOD inputs.

See Paper:

ICML

Parameters:
  • t – temperature \(\tau\).

  • reduction – reduction method, one of mean, sum or none

forward(logits: Tensor, target: Tensor) → Tensor
Parameters:
  • logits – logits as predicted by the model

  • target – labels

Supervised

Supervised losses make use of example out-of-distribution samples (that is, samples from “known unknown” classes). Thus, these losses can handle samples with target values \(< 0\).

Outlier Exposure Loss

class pytorch_ood.loss.OutlierExposureLoss(alpha: float = 0.5, reduction: str | None = 'mean')

Loss from the paper Deep Anomaly Detection With Outlier Exposure. While the formulation in the original paper is very general, this module implements the exact loss that was used in the corresponding experiments.

The loss is defined as

\[\mathcal{L}(x, y) = \begin{cases} -\log \sigma_y(f(x)) & \text{if } y \geq 0 \\ -\alpha \left( \frac{1}{C} \sum_{c=1}^C f(x)_c - \log \sum_{c=1}^C e^{f(x)_c} \right) & \text{otherwise} \end{cases}\]

where \(C\) is the number of classes, \(\alpha\) is a hyperparameter, and \(\sigma_y\) denotes the \(y^{th}\) softmax output. The OOD term is the cross-entropy between the softmax output and the uniform distribution over the classes.
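A per-sample plain-Python sketch, under the assumption that the OOD term is the cross-entropy between the softmax output and the uniform distribution, that is, \(\alpha\) times the log-sum-exp of the logits minus their mean. The function name is hypothetical and this is not the library's implementation.

```python
import math


def outlier_exposure_loss(logits, target, alpha=0.5):
    """Cross-entropy for ID samples; alpha * cross-entropy to uniform for OOD."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    if target >= 0:
        return lse - logits[target]                   # -log softmax_y
    return alpha * (lse - sum(logits) / len(logits))  # -alpha * mean log-softmax


id_loss = outlier_exposure_loss([2.0, 0.0], target=0)
ood_flat = outlier_exposure_loss([1.0, 1.0], target=-1)    # uniform prediction
ood_peaked = outlier_exposure_loss([6.0, 0.0], target=-1)  # confident prediction
```

For OOD inputs the term is smallest when the prediction is uniform and grows as the prediction becomes more peaked.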

See Paper:

ArXiv

See Implementation:

GitHub

Parameters:
  • alpha – weighting coefficient \(\alpha\)

  • reduction – reduction method, one of mean, sum or none

forward(logits: Tensor, target: Tensor) → Tensor
Parameters:
  • logits – class logits for predictions

  • target – labels for predictions

Returns:

loss

Entropic Open-Set Loss

class pytorch_ood.loss.EntropicOpenSetLoss(reduction: str | None = 'mean')

From the paper Reducing Network Agnostophobia. The loss aims to maximize the entropy for OOD inputs.

A variant for segmentation was proposed in Entropy Maximization and Meta Classification for Out-Of-Distribution Detection in Semantic Segmentation.

The loss is calculated as

\[\mathcal{L}(x, y) = \begin{cases} -\log \sigma_y(f(x)) & \text{if } y \geq 0 \\ -\frac{1}{C} \sum_{c=1}^C \log \sigma_c(f(x)) & \text{otherwise} \end{cases}\]

where \(\sigma\) is the softmax function and \(C\) is the number of classes.

See Paper:

NeurIPS

See Paper:

ArXiv

Parameters:

reduction – reduction method, one of mean, sum or none

forward(logits: Tensor, target: Tensor) → Tensor
Parameters:
  • logits – class logits

  • target – target labels

Returns:

the loss

Objectosphere Loss

class pytorch_ood.loss.ObjectosphereLoss(alpha: float = 1.0, xi: float = 1.0, reduction: str | None = 'mean')

From the paper Reducing Network Agnostophobia.

\[\mathcal{L}(x, y) = \mathcal{L}_E(x,y) + \alpha \begin{cases} \max \lbrace 0, \xi - \lVert F(x) \rVert_2 \rbrace^2 & \text{if } y \geq 0 \\ \lVert F(x) \rVert_2^2 & \text{otherwise} \end{cases}\]

where \(F(x)\) are deep features in some layer of the model, and \(\mathcal{L}_E\) is the Entropic Open-Set Loss.
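The feature-magnitude term (the part weighted by \(\alpha\)) can be sketched in plain Python for a single sample. The function name objectosphere_penalty is hypothetical, and the entropic term \(\mathcal{L}_E\) is left out of this illustration.

```python
import math


def objectosphere_penalty(features, target, xi=1.0):
    """Push ID feature norms above xi and OOD feature norms towards zero."""
    norm = math.sqrt(sum(v * v for v in features))
    if target >= 0:
        return max(0.0, xi - norm) ** 2
    return norm ** 2


id_small = objectosphere_penalty([0.3, 0.4], target=0, xi=1.0)  # norm 0.5 < xi
id_large = objectosphere_penalty([3.0, 4.0], target=0, xi=1.0)  # norm 5 >= xi
ood = objectosphere_penalty([3.0, 4.0], target=-1)              # squared norm
```

ID samples are only penalized while their feature norm is below \(\xi\), whereas any non-zero feature norm is penalized for OOD samples.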

See Paper:

NeurIPS

Parameters:
  • alpha – weight coefficient

  • xi – minimum feature magnitude \(\xi\)

forward(logits: Tensor, features: Tensor, target: Tensor) → Tensor
Parameters:
  • logits – class logits \(f(x)\)

  • features – deep features \(F(x)\)

  • target – target labels \(y\)

Returns:

the loss

static score(logits: Tensor) → Tensor

Outlier score used by the objectosphere loss.

Parameters:

logits – instance logits

Returns:

outlier scores

Energy-Bounded Learning Loss

class pytorch_ood.loss.EnergyRegularizedLoss(alpha: float = 1.0, margin_in: float = -1.0, margin_out: float = -1.0, reduction: str = 'mean')

Augments the cross-entropy by a regularization term that aims to increase the energy gap between ID and OOD samples. This term is defined as

\[\mathcal{L}(x, y) = \alpha \begin{cases} \max(0, E(x) - m_{in})^2 & \text{if } y \geq 0 \\ \max(0, m_{out} - E(x))^2 & \text{otherwise} \end{cases}\]

where \(E(x) = - \log(\sum_i e^{f_i(x)} )\) is the energy of \(x\).
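The energy and the regularization term can be sketched in plain Python for one sample. This illustration covers only the regularizer, not the cross-entropy it is added to; the function names are hypothetical.

```python
import math


def energy(logits):
    """E(x) = -log(sum_i exp(f_i(x))), computed stably."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(v - m) for v in logits)))


def energy_regularizer(logits, target, alpha=1.0, margin_in=-1.0, margin_out=-1.0):
    """Squared hinge: ID energy above margin_in or OOD energy below margin_out."""
    e = energy(logits)
    if target >= 0:
        return alpha * max(0.0, e - margin_in) ** 2
    return alpha * max(0.0, margin_out - e) ** 2


logits = [0.0, 0.0]  # E(x) = -log(2), roughly -0.693
id_term = energy_regularizer(logits, target=0, margin_in=-1.0)
ood_term = energy_regularizer(logits, target=-1, margin_out=-1.0)
```

In this toy case the energy lies above both margins, so the sample is penalized if it is labeled ID and incurs no penalty if it is labeled OOD.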

See Paper:

NeurIPS

See Implementation:

GitHub

Parameters:
  • alpha – weighting parameter

  • margin_in – margin energy \(m_{in}\) for ID data

  • margin_out – margin energy \(m_{out}\) for OOD data

  • reduction – can be one of none, mean, sum

forward(logits: Tensor, targets: Tensor) → Tensor

Calculates weighted sum of cross-entropy and the energy regularization term.

Parameters:
  • logits – logits

  • targets – labels

VOS Energy-Based Loss

class pytorch_ood.loss.VOSRegLoss(logistic_regression: Linear, weights_energy: Linear, alpha: float = 0.1, device: str = 'cpu', reduction: str = 'mean')

Implements the loss function from VOS: Learning what you don’t know by virtual outlier synthesis, without synthesizing virtual outliers. The loss adds a regularization term to the cross-entropy that aims to increase the (weighted) energy gap between ID and OOD samples.

The regularization term is defined as:

\[\mathcal{L} = \mathbb{E}_{v \sim V} \left[ -\log \frac{1}{1 + e^{-\phi(E(v))}} \right] + \mathbb{E}_{x \sim D} \left[ -\log \frac{e^{-\phi(E(x))}}{1 + e^{-\phi(E(x))}} \right]\]

where \(\phi\) is a possibly non-linear function, \(E\) is the weighted energy and \(V\) and \(D\) are the distributions of the (possibly virtual) outliers and the ID data respectively.
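The regularization term can be sketched in plain Python when \(\phi\) is a scalar function and the energies are already given. Both are simplifying assumptions of this illustration, and the names softplus, vos_regularizer and phi are hypothetical. The sketch uses the identities \(-\log \frac{1}{1+e^{-t}} = \log(1 + e^{-t})\) and \(-\log \frac{e^{-t}}{1+e^{-t}} = \log(1 + e^{t})\).

```python
import math


def softplus(t):
    """log(1 + exp(t)), computed without overflow."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def vos_regularizer(phi, energies_outlier, energies_id):
    """Mean outlier term log(1 + e^{-phi(E(v))}) plus mean ID term log(1 + e^{phi(E(x))})."""
    out_term = sum(softplus(-phi(e)) for e in energies_outlier) / len(energies_outlier)
    id_term = sum(softplus(phi(e)) for e in energies_id) / len(energies_id)
    return out_term + id_term


def phi(e):
    # Hypothetical scalar phi: a fixed linear map of the energy.
    return 2.0 * e


undecided = vos_regularizer(lambda e: 0.0, [1.0], [-1.0])
separated = vos_regularizer(phi, [5.0, 6.0], [-5.0, -4.0])
```

The term is small when \(\phi\) maps outlier energies to large values and ID energies to small values, and equals \(2 \log 2\) when \(\phi\) cannot tell them apart.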

See Paper:

ArXiv

See Implementation:

GitHub

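To initialize \(\phi\) and the weights for the weighted energy:

import torch
from pytorch_ood.loss import VOSRegLoss
num_classes = 10  # example value; set to the number of ID classes
phi = torch.nn.Linear(1, 2)  # logistic regression applied to the energy
weights = torch.nn.Linear(num_classes, 1)  # weights for the weighted energy
torch.nn.init.uniform_(weights.weight)
criterion = VOSRegLoss(phi, weights)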

Note

This implementation does not generate synthetic outliers. For this feature, see pytorch_ood.loss.vos.VirtualOutlierSynthesizingRegLoss.

Parameters:
  • logistic_regression – function \(\phi\), for example a linear layer

  • weights_energy – neural network layer with weights for the energy

  • alpha – weighting parameter \(\alpha\).

  • reduction – reduction method to apply, one of mean, sum or none

  • device – For example cpu or cuda:0

forward(logits: Tensor, y: Tensor) → Tensor
Parameters:
  • logits – logits

  • y – labels

Virtual Outlier Synthesizing Loss

class pytorch_ood.loss.VirtualOutlierSynthesizingRegLoss(logistic_regression: Linear, weights_energy: Linear, device: str, num_classes: int, num_input_last_layer: int, fc: Linear, alpha: float = 0.1, reduction: str = 'mean', sample_number: int = 1000, select: int = 1, sample_from: int = 10000)

Implements the loss function of VOS: Learning what you don’t know by virtual outlier synthesis with additional sampling of virtual outliers. These outliers are synthesized by fitting a Gaussian to the latent features and sampling from low-likelihood regions. This alleviates the need for real outliers during training.

For more information see VOS Energy-Based Loss.

See Paper:

ArXiv

See Implementation:

GitHub

Parameters:
  • logistic_regression – function \(\phi\), for example a linear layer

  • weights_energy – neural network layer, with weights for the energy

  • device – For example cpu or cuda:0

  • num_classes – number of classes

  • num_input_last_layer – number of inputs in the last layer of the network

  • fc – fully connected last layer of the network

  • alpha – weighting parameter

  • reduction – reduction method to apply, one of mean, sum or none

  • sample_number – number of samples that are used for virtual outlier synthesis

  • select – number of lowest-likelihood samples that are used for virtual outlier synthesis

  • sample_from – number of samples that are used for sampling the probability distribution

forward(logits: Tensor, features: Tensor, y: Tensor)
Parameters:
  • logits – logits

  • features – features

  • y – labels

MCHAD Loss

class pytorch_ood.loss.MCHADLoss(n_classes: int, n_dim: int, radius: float = 0, margin: float = 0, weight_center: float = 1.0, weight_nll: float = 1.0, weight_oe: float = 1.0)

Implements the MCHAD loss from the paper Multi-Class Hypersphere Anomaly-Detection.

The loss places a center \(\mu_y\) for each class \(y\) in the output space of the model and has three components:

\[ \begin{aligned} \mathcal{L}_{\Lambda}(x,y) &= \max \lbrace 0, \lVert f(x) - \mu_y \rVert^2_2 - r^2 \rbrace \\ \mathcal{L}_{\Delta}(x,y) &= \log \Big( 1 + \sum_{i \neq y} e^{\lVert f(x) - \mu_y \rVert^2_2 - \lVert f(x) - \mu_i \rVert^2_2} \Big) \\ \mathcal{L}_{\Theta}(x) &= \sum_i \max \lbrace 0, (r + m)^2 - \lVert f(x) - \mu_i \rVert^2_2 \rbrace \end{aligned} \]

Intuitively, the first term forces the samples to cluster tightly in a sphere of radius \(r\) around the corresponding class centers. The second term ensures that the (learnable) class centers remain separable and do not collapse. The third term makes sure that OOD samples have at least a distance \(m\) to the surface of each hypersphere.

The loss can be used in a supervised, as well as in an unsupervised manner.
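The three components can be sketched in plain Python for a single sample, writing \(d_i = \lVert f(x) - \mu_i \rVert_2^2\) for the squared distance to the center of class \(i\). This is an illustration under that reading of the terms, without the weights \(\lambda\); the function names are hypothetical.

```python
import math


def sq_dists(z, centers):
    """Squared Euclidean distance from embedding z to every class center."""
    return [sum((zi - ci) ** 2 for zi, ci in zip(z, c)) for c in centers]


def mchad_terms(z, y, centers, radius=0.0, margin=0.0):
    """Return (center term, likelihood term, outlier exposure term)."""
    d = sq_dists(z, centers)
    if y >= 0:  # ID sample: pull towards its own center, away from the others
        center = max(0.0, d[y] - radius ** 2)
        nll = math.log(1.0 + sum(math.exp(d[y] - d[i])
                                 for i in range(len(d)) if i != y))
        return center, nll, 0.0
    # OOD sample: keep it outside every sphere of radius (radius + margin)
    oe = sum(max(0.0, (radius + margin) ** 2 - di) for di in d)
    return 0.0, 0.0, oe


centers = [[0.0, 0.0], [4.0, 0.0]]
id_terms = mchad_terms([0.0, 0.0], 0, centers, radius=1.0, margin=1.0)
ood_terms = mchad_terms([0.0, 1.0], -1, centers, radius=1.0, margin=1.0)
```

When no samples with negative labels are present, only the first two terms contribute, which corresponds to using the loss in an unsupervised manner.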

See Implementation:

GitLab

See Paper:

ICPR

Parameters:
  • n_classes – number of classes \(C\)

  • n_dim – dimensionality of the output space \(D\)

  • radius – radius of the hyperspheres

  • margin – margin around hyperspheres

  • weight_center – weight \(\lambda_{\Lambda}\) for the center loss term

  • weight_nll – weight \(\lambda_{\Delta}\) for the maximum likelihood term

  • weight_oe – weight \(\lambda_{\Theta}\) for the outlier exposure term

property centers: ClassCenters

Class centers \(\mu_y\)

distance(z: Tensor) → Tensor

Calculates the distance of each embedding to each center.

Parameters:

z – embeddings of shape \(B \times D\).

Returns:

distance matrix of shape \(B \times C\).

forward(distmat: Tensor, y: Tensor) → Tensor
Parameters:
  • distmat – distance matrix shape \(B \times C\).

  • y – labels

Returns:

loss values

Background Class Loss

class pytorch_ood.loss.BackgroundClassLoss(n_classes: int, reduction: str = 'mean')

The idea of the background class is that OOD samples are mapped to a separate class during training. This implementation uses the normal cross-entropy, but handles remapping of the background class labels to positive target labels. Thus, when the target labels are \(\lbrace 0, 1, 2, ..., N - 1 \rbrace\), all entries with target label \(<0\) are remapped to \(N\).

The network's output layer has to include \(N+1\) outputs, so the logits have the shape \(B \times (N + 1)\).
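The label remapping can be sketched in plain Python. The function name remap_background is hypothetical; after remapping, the targets can be passed to an ordinary cross-entropy over \(N + 1\) outputs.

```python
def remap_background(targets, n_classes):
    """Map every OOD label (< 0) to the extra background class index n_classes."""
    return [t if t >= 0 else n_classes for t in targets]


targets = [0, 2, -1, 1, -5]  # negative labels mark OOD samples
remapped = remap_background(targets, n_classes=3)
```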

Parameters:
  • n_classes – number of classes \(N\) (not counting background class)

  • reduction – can be one of none, mean, sum

forward(logits: Tensor, targets: Tensor) → Tensor
Parameters:
  • logits – class logits

  • targets – target labels

Returns:

Cross-Entropy for remapped samples

Energy Margin Loss (Scone)

class pytorch_ood.loss.EnergyMarginLoss(full_train_loss: floating, eta=1.0, false_alarm_cutoff=0.05, in_constraint_weight=1.0, ce_tol=2.0, ce_constraint_weight=1.0, out_constraint_weight=1.0, lr_lam=1.0, penalty_mult=1.5, constraint_tol=0.0)

Loss from the paper Feed Two Birds with One Scone. It introduces a margin to further improve the performance of energy-based OOD detection, specifically for handling covariate-shifted data.

See Paper:

ArXiv

See Implementation:

GitHub

See Derivation:

ArXiv

Constructor of EnergyMarginLoss

Parameters:
  • full_train_loss – average classification loss of pre-trained model

  • eta – margin between ID and OOD; Covariate-shifted data should reside in-between

  • false_alarm_cutoff – false alarm cutoff

  • in_constraint_weight – penalty parameter for in-distribution constraint

  • lam – lagrangian multiplier for in-distribution constraint

  • lam2 – lagrangian multiplier for multi-class model constraint

  • ce_tol – error threshold for the multi-class model

  • ce_constraint_weight – penalty parameter for multi-class model constraint

  • out_constraint_weight

  • lr_lam – learning rate of lagrangian multipliers

  • penalty_mult – penalty multiplier

  • constraint_tol – constraint tolerance

forward(logits: Tensor, targets: Tensor, logistic_regression: Callable[[Tensor], Tensor]) → Tensor

Calculates the weighted sum of the cross-entropy and the energy regularization term, also known as the classical augmented Lagrangian function.

Parameters:
  • logits – logits

  • targets – labels

  • logistic_regression – logistic regression layer

update_hyperparameters(model: Callable[[Tensor], Tensor], train_loader_in: DataLoader, logistic_regression: Callable[[Tensor], Tensor]) → None

Updates the hyperparameters of the augmented Lagrangian function.

Parameters:
  • model – pytorch model

  • train_loader_in – loader of in-distribution data

  • logistic_regression – logistic regression layer