Training Objectives
Objective (or loss) functions for OOD detection usually aim to improve the discriminability of ID and OOD points in the output or latent space of the model.
All objective functions are implemented as torch.nn.Modules.
Some of them have a set of trainable parameters and must be moved to the appropriate device.
Unsupervised
Unsupervised losses use only in-distribution data, that is, only examples from “known known” classes.
Therefore, all of these loss functions expect that the target labels are strictly \(\geq 0\).
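The label convention can be illustrated with a short sketch. It assumes only that ID targets are \(\geq 0\) and that negative targets mark OOD samples; the helper `known_mask` is a hypothetical name defined here for illustration and is not part of the library.

```python
import torch


def known_mask(targets: torch.Tensor) -> torch.Tensor:
    """Boolean mask that is True for in-distribution targets (label >= 0)."""
    return targets >= 0


targets = torch.tensor([0, 2, -1, 1, -1])
mask = known_mask(targets)

# An unsupervised loss should only ever see the ID part of a batch.
id_targets = targets[mask]
print(mask.tolist())        # [True, True, False, True, False]
print(id_targets.tolist())  # [0, 2, 1]
```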
Deep SVDD Loss
- class pytorch_ood.loss.DeepSVDDLoss(n_dim: int, reduction: str | None = 'mean', radius: float = 0.0, center: Tensor | None = None)[source]
Deep Support Vector Data Description (SVDD) from the paper Deep One-Class Classification. It places a center \(\mu\) in the output space of the model and pulls ID samples towards it, into a sphere of radius \(r\) around the center, in order to learn the common factors of intra-class variance.
The loss is defined as follows:
\[\mathcal{L}(x) = \max \lbrace 0, \lVert f(x) - \mu \rVert_2^2 - r^2 \rbrace\]
The distance of a point to the center can be used as outlier score.
This is an implementation of the One-Class Deep SVDD objective, which implies that the radius is not considered trainable and should usually be set to zero.
In the original paper, the center is initialized with the mean of \(f(x)\) over the dataset before training.
- See Paper:
Note
This module should be moved to the correct device before using forward().
- Parameters:
n_dim – dimensionality \(n\) of the output space
reduction – reduction method to apply, one of mean, sum or none
radius – radius \(r\)
center – position of the center \(\mu \in \mathbb{R}^n\) where \(n\) is the dimensionality of the output space
- property center: ClassCenters
The center \(\mu\)
- forward(x: Tensor, y: Tensor | None = None) Tensor[source]
- Parameters:
x – features
y – target labels (either ID or OOD). If not given, will assume all samples are IN.
- Returns:
\(\lVert x - \mu \rVert^2 - r^2\)
- radius
radius \(r\) of the hypersphere
- static svdd_loss(x: Tensor, center: ClassCenters, radius: Tensor = 0.0, y: Tensor | None = None) Tensor[source]
Calculates the loss. Treats all ID samples equally, and ignores all OOD samples. If no labels are given, assumes all samples are IN.
- Parameters:
x – features
center – center of sphere
radius – radius of sphere
y – Optional labels.
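The One-Class Deep SVDD objective described above can be sketched in plain PyTorch. This is a minimal illustration of the formula, not the library's implementation: `svdd_loss_sketch` is a hypothetical helper, random features stand in for the model outputs \(f(x)\), and the center is set to the mean of these features as in the original paper.

```python
import torch


def svdd_loss_sketch(z: torch.Tensor, center: torch.Tensor, radius: float = 0.0) -> torch.Tensor:
    """Mean over the batch of max(0, ||z - center||^2 - radius^2)."""
    sq_dist = (z - center).pow(2).sum(dim=1)
    return torch.clamp(sq_dist - radius ** 2, min=0).mean()


torch.manual_seed(0)
features = torch.randn(8, 4)    # stand-in for model outputs f(x)
center = features.mean(dim=0)   # center initialized as the mean of f(x)

loss = svdd_loss_sketch(features, center)

# The distance to the center serves as outlier score: larger means more anomalous.
scores = (features - center).norm(dim=1)
```

With a radius of zero, every sample contributes its squared distance to the center, so minimizing the loss contracts the features around \(\mu\).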
Class Anchor Clustering Loss
- class pytorch_ood.loss.CACLoss(n_classes: int, magnitude: float = 1.0, alpha: float = 1.0)[source]
Class Anchor Clustering Loss from the paper Class Anchor Clustering: a Distance-based Loss for Training Open Set Classifiers.
They place a class conditional center (called anchor) in the output space of the model and pull representations of points of a class \(y\) towards the corresponding center \(\mu_y\) during training. The centers are initialized as unit vectors scaled by a magnitude and not trainable.
They also propose an outlier score based on the distance, which is implemented in the CACLoss.score() method.
Example code is provided here
Centers are initialized as unit vectors, scaled by the magnitude.
- Parameters:
n_classes – number of classes \(C\)
magnitude – magnitude of class anchors
alpha – \(\alpha\) weight for anchor term
- property centers: ClassCenters
The class centers \(\mu_y\).
- distance(x: Tensor) Tensor[source]
- Parameters:
x – input points
- Returns:
matrix with squared distances from each point to each center with shape \(B \times C\).
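A distance matrix of this shape can be computed as in the following sketch. It assumes that the anchors are the standard basis vectors scaled by the magnitude, so that the output space has \(C\) dimensions; this is an illustration with `torch.cdist`, not the library's code.

```python
import torch

n_classes, magnitude = 3, 2.0
# Assumption: anchors are the standard basis vectors scaled by the magnitude.
anchors = magnitude * torch.eye(n_classes)

# Batch of B=2 points in the C-dimensional output space.
x = torch.tensor([[2.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])

# Squared Euclidean distance from each point to each anchor, shape (B, C).
sq_dists = torch.cdist(x, anchors).pow(2)
nearest = sq_dists.argmin(dim=1)  # index of the closest anchor per point
```

The first point lies exactly on the anchor of class 0, and the second is closest to the anchor of class 1.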
II Loss
- class pytorch_ood.loss.IILoss(n_classes: int, n_embedding: int, alpha: float = 1.0)[source]
II Loss function from Learning a neural network based representation for open set recognition.
Warning
We added running centers for online class center estimation. This is only an approximation, and results might differ if the centers are calculated as described in the paper. However, it enables better estimation of the performance during training, without having to calculate the centers over the entire dataset. Empirically, we found that these centers work well.
- Parameters:
n_classes – number of classes
n_embedding – embedding dimensionality
alpha – weight for both loss terms
- property centers: RunningCenters
- Returns:
current class center estimates
- distance(x: Tensor) Tensor[source]
- Parameters:
x – embeddings
- Returns:
distances matrix with distances to class centers in output space
Center Loss
- class pytorch_ood.loss.CenterLoss(n_classes: int, n_dim: int, magnitude: float = 1.0, radius: float = 0.0, fixed: bool = False)[source]
Generalized version of the Center Loss from the Paper A Discriminative Feature Learning Approach for Deep Face Recognition. For each class, this loss places a center \(\mu_y\) in the output space and draws representations of samples to their corresponding class centers, up to a radius \(r\).
Calculates
\[\mathcal{L}(x,y) = \max \lbrace d(f(x),\mu_y) - r , 0 \rbrace\]
where \(d\) is some measure of dissimilarity, like the squared distance.
With radius \(r=0\) and the squared euclidean distance as \(d(\cdot,\cdot)\), this is equivalent to the original center loss, which is also referred to as the soft-margin loss in some publications.
- See Implementation:
- See Paper:
- Parameters:
n_classes – number of classes \(C\)
n_dim – dimensionality of center space \(D\)
magnitude – scale \(\lambda\) used for center initialization
radius – radius \(r\) of spheres, lower bound for distance from center that is penalized
fixed – false if centers should be learnable
- property centers: ClassCenters
- Returns:
the \(\mu\) for all classes
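The formula above can be traced with a small worked example. The sketch uses the squared Euclidean distance for \(d\) and fixed, hand-picked centers; `center_loss_sketch` is a hypothetical helper, not the library's implementation.

```python
import torch


def center_loss_sketch(z, y, centers, radius=0.0):
    """Mean over the batch of max(d(z, mu_y) - r, 0) with squared Euclidean d."""
    sq_dist = (z - centers[y]).pow(2).sum(dim=1)
    return torch.clamp(sq_dist - radius, min=0).mean()


centers = torch.tensor([[0.0, 0.0], [4.0, 0.0]])  # one center per class
z = torch.tensor([[1.0, 0.0], [4.0, 3.0]])        # model outputs f(x)
y = torch.tensor([0, 1])                          # class labels

# Squared distances to the own class center are 1 and 9.
print(center_loss_sketch(z, y, centers).item())              # 5.0
print(center_loss_sketch(z, y, centers, radius=2.0).item())  # 3.5
```

With \(r = 2\), the first sample lies inside the tolerated region and contributes nothing, while the second contributes \(9 - 2 = 7\).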
Cross-Entropy Loss
Confidence Loss
- class pytorch_ood.loss.ConfidenceLoss(alpha: float = 1.0, eps: float = 1e-24)[source]
Loss proposed in Learning Confidence for Out-of-Distribution Detection in Neural Networks. The model learns to predict a confidence \(c\) in addition to the class membership.
The loss minimizes the negative log-likelihood for class membership prediction, with an additional penalty on low confidence:
\[ \begin{align}\begin{aligned}\mathcal{L}_{NLL} + \alpha \mathcal{L}_c = - \sum_{i=1}^{M} \log(p'_{i}) y_i - \alpha \log(c)\\\text{where} \quad p_i' = c \cdot p_i + (1-c) y_i\end{aligned}\end{align} \]
- See Paper:
Note
We implemented clipping for numerical stability.
This implementation uses mean reduction for batches.
The authors additionally used ODIN preprocessing
- Parameters:
alpha – \(\alpha\) used to balance terms
eps – Clipping value \(\epsilon\) used for numerical stability
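The following sketch evaluates the two terms of the formula for a small batch. It assumes that the confidence \(c\) is obtained by applying a sigmoid to one extra model output, which is an assumption of this example; `confidence_loss_sketch` is a hypothetical helper and not the library's implementation.

```python
import torch
import torch.nn.functional as F


def confidence_loss_sketch(logits, conf_logit, y, alpha=1.0, eps=1e-24):
    p = F.softmax(logits, dim=1)
    c = torch.sigmoid(conf_logit).unsqueeze(1)  # confidence in (0, 1), shape (B, 1)
    y_onehot = F.one_hot(y, num_classes=logits.shape[1]).float()
    p_adj = c * p + (1 - c) * y_onehot          # interpolate prediction towards the target
    # Clipping keeps the logarithms finite.
    nll = -(torch.log(p_adj.clamp(min=eps)) * y_onehot).sum(dim=1)
    conf_penalty = -torch.log(c.squeeze(1).clamp(min=eps))
    return (nll + alpha * conf_penalty).mean()  # mean reduction over the batch


logits = torch.tensor([[2.0, 0.0, -1.0], [0.5, 1.5, 0.0]])
conf_logit = torch.tensor([3.0, -1.0])  # assumed extra output of the model
y = torch.tensor([0, 1])

loss = confidence_loss_sketch(logits, conf_logit, y)
```

A low confidence moves \(p'\) towards the target and reduces the first term, but it is penalized by \(-\alpha \log(c)\). For \(c \to 1\) the loss reduces to the ordinary cross-entropy.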
LogitNorm Loss
- class pytorch_ood.loss.LogitNorm(t=1.0, reduction='mean')[source]
LogitNorm from the paper Mitigating Neural Network Overconfidence with Logit Normalization.
Given a model \(f: \mathcal{X} \rightarrow \mathbb{R}^K\) that maps inputs to \(K\) logits, this method normalizes the logits before computing the negative log-likelihood as:
\[\mathcal{L}(x, y) = -\log \Big( \frac{ \exp( \frac{f(x)_y}{ \tau \lVert f(x) \rVert} )}{\sum_{i=1}^K \exp( \frac{ f(x)_i}{ \tau \lVert f(x) \rVert} ) } \Big)\]
where \(\tau\) is a temperature value and \(\lVert f(x) \rVert\) is the norm of the logit vector.
Will ignore OOD inputs.
- See Paper:
- Parameters:
t – temperature \(\tau\).
reduction – reduction method, one of mean, sum or none
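Logit normalization can be sketched in a few lines. The example assumes that the logits are divided by their L2 norm and adds a small constant to avoid division by zero; `logit_norm_loss_sketch` and its `eps` argument are hypothetical and serve only as illustration.

```python
import torch
import torch.nn.functional as F


def logit_norm_loss_sketch(logits, y, t=1.0, eps=1e-7):
    # Divide each logit vector by its L2 norm (eps avoids division by zero).
    norms = logits.norm(p=2, dim=1, keepdim=True) + eps
    return F.cross_entropy(logits / (t * norms), y)


logits = torch.tensor([[2.0, 0.0, -1.0], [0.5, 1.5, 0.0]])
y = torch.tensor([0, 1])

# Only the direction of the logit vector matters, so rescaling leaves the loss unchanged.
loss = logit_norm_loss_sketch(logits, y)
loss_scaled = logit_norm_loss_sketch(10 * logits, y)
```

Because the loss is invariant to the magnitude of the logits, the model cannot lower it simply by producing ever larger logits.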
Supervised
Supervised losses make use of example out-of-distribution samples (or samples from known unknown classes). Thus, these losses can handle samples with target values \(< 0\).
Outlier Exposure Loss
- class pytorch_ood.loss.OutlierExposureLoss(alpha: float = 0.5, reduction: str | None = 'mean')[source]
Loss from the paper Deep Anomaly Detection With Outlier Exposure. While the formulation in the original paper is very general, this module implements the exact loss that was used in the corresponding experiments.
The loss is defined as
\[\mathcal{L}(x, y) = \Biggl \lbrace { -\log \sigma_y(f(x)) \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \text{if } y \geq 0 \atop -\alpha (\frac{1}{C} \sum_{c=1}^C f(x)_c - \log(\sum_{c=1}^C e^{f(x)_c})) \quad \text{ otherwise } }\]
where \(C\) is the number of classes, \(\alpha\) is a hyperparameter, and \(\sigma_y\) denotes the \(y^{th}\) softmax output. For OOD inputs, the term is the cross-entropy between the softmax output and the uniform distribution, scaled by \(\alpha\).
- See Paper:
- See Implementation:
- Parameters:
alpha – weighting coefficient \(\alpha\)
reduction – reduction method, one of mean, sum or none
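A minimal sketch of such a loss is given below. It assumes that the OOD term is the cross-entropy between the softmax output and the uniform distribution, written directly in terms of the logits; `outlier_exposure_sketch` is a hypothetical helper, not the library's implementation.

```python
import torch
import torch.nn.functional as F


def outlier_exposure_sketch(logits, y, alpha=0.5):
    known = y >= 0
    loss = logits.new_zeros(logits.shape[0])
    if known.any():
        # Ordinary cross-entropy for ID samples.
        loss[known] = F.cross_entropy(logits[known], y[known], reduction="none")
    if (~known).any():
        out = logits[~known]
        # Cross-entropy to the uniform distribution, expressed via the logits.
        loss[~known] = -alpha * (out.mean(dim=1) - torch.logsumexp(out, dim=1))
    return loss.mean()


logits = torch.tensor([[2.0, 0.0, -1.0], [0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
y = torch.tensor([0, -1, -1])  # one ID sample, two OOD samples
loss = outlier_exposure_sketch(logits, y)
```

The OOD term is smallest for uniform predictions, where it equals \(\alpha \log C\), and grows as the prediction on an outlier becomes more peaked.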
Entropic Open-Set Loss
- class pytorch_ood.loss.EntropicOpenSetLoss(reduction: str | None = 'mean')[source]
From the paper Reducing Network Agnostophobia. The loss aims to maximize the entropy for OOD inputs.
A variant for segmentation was proposed in Entropy Maximization and Meta Classification for Out-Of-Distribution Detection in Semantic Segmentation.
The loss is calculated as
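\[\mathcal{L}(x, y) = \Biggl \lbrace { -\log \sigma_y(f(x)) \quad \quad \quad \quad \quad \text{if } y \geq 0 \atop -\frac{1}{C} \sum_{c=1}^C \log \sigma_c(f(x)) \quad \text{ otherwise } }\]
where \(\sigma\) is the softmax function and \(C\) is the number of classes. The OOD term is minimized by a uniform softmax output, which is the prediction with maximum entropy.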
- See Paper:
- See Paper:
- Parameters:
reduction – reduction method, one of mean, sum or none
Objectosphere Loss
- class pytorch_ood.loss.ObjectosphereLoss(alpha: float = 1.0, xi: float = 1.0, reduction: str | None = 'mean')[source]
From the paper Reducing Network Agnostophobia.
\[\mathcal{L}(x, y) = \mathcal{L}_E(x,y) + \alpha \Biggl \lbrace { \max \lbrace 0, \xi - \lVert F(x) \rVert \rbrace^2 \quad \text{if } y \geq 0 \atop \lVert F(x) \rVert_2^2 \hspace{3.7cm} \text{ otherwise } }\]
where \(F(x)\) are deep features in some layer of the model, and \(\mathcal{L}_E\) is the Entropic Open-Set Loss.
- See Paper:
- Parameters:
alpha – weight coefficient
xi – minimum feature magnitude \(\xi\)
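The feature-magnitude term of the formula can be illustrated on its own. The sketch below computes only the part weighted by \(\alpha\) and leaves out \(\mathcal{L}_E\); `objectosphere_penalty_sketch` is a hypothetical helper, not the library's implementation.

```python
import torch


def objectosphere_penalty_sketch(features, y, xi=1.0):
    """Per-sample magnitude penalty: ID features are pushed to norm >= xi, OOD features to zero."""
    norms = features.norm(p=2, dim=1)
    known = y >= 0
    return torch.where(
        known,
        torch.clamp(xi - norms, min=0).pow(2),  # penalize ID features shorter than xi
        norms.pow(2),                           # penalize any magnitude of OOD features
    )


features = torch.tensor([[3.0, 4.0], [0.3, 0.4], [0.0, 2.0]])  # norms 5, 0.5 and 2
y = torch.tensor([0, 1, -1])
penalty = objectosphere_penalty_sketch(features, y, xi=1.0)
```

The first ID sample already exceeds \(\xi\) and is not penalized, the second falls short by 0.5, and the OOD sample is penalized with its squared norm.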
Energy-Bounded Learning Loss
- class pytorch_ood.loss.EnergyRegularizedLoss(alpha: float = 1.0, margin_in: float = -1.0, margin_out: float = -1.0, reduction: str = 'mean')[source]
Augments the cross-entropy by a regularization term that aims to increase the energy gap between ID and OOD samples. This term is defined as
\[\mathcal{L}(x, y) = \alpha \Biggl \lbrace { \max(0, E(x) - m_{in})^2 \quad \quad \quad \quad \quad \quad \text{if } y \geq 0 \atop \max(0, m_{out} - E(x))^2 \quad \quad \quad \quad \quad \text{ otherwise } }\]
where \(E(x) = - \log(\sum_i e^{f_i(x)} )\) is the energy of \(x\).
- See Paper:
- See Implementation:
- Parameters:
alpha – weighting parameter
margin_in – margin energy \(m_{in}\) for ID data
margin_out – margin energy \(m_{out}\) for OOD data
reduction – can be one of none, mean, sum
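The regularization term can be sketched as follows. The example computes only the energy term, without the cross-entropy, and uses hand-picked margins for illustration; `energy_reg_sketch` is a hypothetical helper, not the library's implementation.

```python
import torch


def energy_reg_sketch(logits, y, alpha=1.0, margin_in=-1.0, margin_out=-1.0):
    """Per-sample squared hinge on the energy E(x) = -logsumexp(f(x))."""
    energy = -torch.logsumexp(logits, dim=1)
    known = y >= 0
    reg = torch.where(
        known,
        torch.clamp(energy - margin_in, min=0).pow(2),   # ID energy should stay below m_in
        torch.clamp(margin_out - energy, min=0).pow(2),  # OOD energy should stay above m_out
    )
    return alpha * reg


logits = torch.tensor([[10.0, 0.0], [0.0, 0.0]])  # energies of about -10 and -0.69

# Confident sample labeled ID and flat sample labeled OOD: no penalty.
good = energy_reg_sketch(logits, torch.tensor([0, -1]), margin_in=-5.0, margin_out=-1.0)
# Labels swapped: both samples violate their margin.
bad = energy_reg_sketch(logits, torch.tensor([-1, 0]), margin_in=-5.0, margin_out=-1.0)
```

Confident predictions have low energy, so the term pushes ID samples below \(m_{in}\) and OOD samples above \(m_{out}\).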
VOS Energy-Based Loss
- class pytorch_ood.loss.VOSRegLoss(logistic_regression: Linear, weights_energy: Linear, alpha: float = 0.1, device: str = 'cpu', reduction: str = 'mean')[source]
Implements the loss function from VOS: Learning what you don’t know by virtual outlier synthesis, but without synthesizing virtual outliers. The loss adds a regularization term to the cross-entropy that aims to increase the (weighted) energy gap between ID and OOD samples.
The regularization term is defined as:
\[\mathcal{L} = \mathbb{E}_{v \sim V} \left[ -\log \frac{1}{1 + e^{-\phi(E(v))}} \right] + \mathbb{E}_{x \sim D} \left[ -\log \frac{e^{-\phi(E(x))}}{1 + e^{-\phi(E(x))}} \right]\]
where \(\phi\) is a possibly non-linear function, \(E\) is the weighted energy and \(V\) and \(D\) are the distributions of the (possibly virtual) outliers and the ID data respectively.
For initialisation of \(\phi\) and the weights for weighted energy:
    import torch
    from pytorch_ood.loss import VOSRegLoss

    num_classes = 10  # example value
    phi = torch.nn.Linear(1, 2)
    weights = torch.nn.Linear(num_classes, 1)
    torch.nn.init.uniform_(weights.weight)
    criterion = VOSRegLoss(phi, weights)
Note
This implementation does not generate synthetic outliers. For this feature, see pytorch_ood.loss.vos.VirtualOutlierSynthesizingRegLoss.
- Parameters:
logistic_regression – \(\phi\) function. Can be for example a linear layer.
weights_energy – neural network layer with weights for the energy
alpha – weighting parameter \(\alpha\).
reduction – reduction method to apply, one of mean, sum or none
device – For example cpu or cuda:0
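The regularization term can be evaluated literally as in the following sketch. It makes two simplifying assumptions that differ from the constructor example: \(\phi\) is a linear layer with a single scalar output, and the energy is the unweighted \(E(x) = -\log \sum_i e^{f_i(x)}\). The helper `vos_reg_sketch` is hypothetical and not the library's implementation.

```python
import torch
import torch.nn.functional as F


def vos_reg_sketch(energy_id, energy_ood, phi):
    phi_id = phi(energy_id.unsqueeze(1)).squeeze(1)
    phi_ood = phi(energy_ood.unsqueeze(1)).squeeze(1)
    loss_ood = F.softplus(-phi_ood).mean()  # -log sigmoid(phi(E(v)))
    loss_id = F.softplus(phi_id).mean()     # -log(1 - sigmoid(phi(E(x))))
    return loss_ood + loss_id


# Assumption: scalar-output phi, fixed to the identity to keep the example deterministic.
phi = torch.nn.Linear(1, 1)
with torch.no_grad():
    phi.weight.fill_(1.0)
    phi.bias.zero_()

energy_id = -torch.logsumexp(torch.tensor([[10.0, 0.0], [8.0, 1.0]]), dim=1)  # low energy
energy_ood = torch.tensor([9.0, 12.0])                                        # high energy

separated = vos_reg_sketch(energy_id, energy_ood, phi)
swapped = vos_reg_sketch(energy_ood, energy_id, phi)
```

The term is close to zero when ID samples have low and outliers have high energy, and it becomes large when the roles are swapped.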
Virtual Outlier Synthesizing Loss
- class pytorch_ood.loss.VirtualOutlierSynthesizingRegLoss(logistic_regression: Linear, weights_energy: Linear, device: str, num_classes: int, num_input_last_layer: int, fc: Linear, alpha: float = 0.1, reduction: str = 'mean', sample_number: int = 1000, select: int = 1, sample_from: int = 10000)[source]
Implements the loss function of VOS: Learning what you don’t know by virtual outlier synthesis with additional sampling of virtual outliers. These outliers are synthesized by fitting a gaussian to the latent features and sampling from low-likelihood regions. This alleviates the need for real outliers during training.
For more information see VOS Energy-Based Loss.
- See Paper:
- See Implementation:
- Parameters:
logistic_regression – \(\phi\) function. Can be for example a linear layer.
weights_energy – neural network layer, with weights for the energy
device – For example cpu or cuda:0
num_classes – number of classes
num_input_last_layer – number of inputs in the last layer of the network
fc – fully connected last layer of the network
alpha – weighting parameter
reduction – reduction method to apply, one of mean, sum or none
sample_number – number of samples that are used for virtual outlier synthesis
select – number of highest density samples that are used for virtual outlier synthesis
sample_from – number of samples that are used for sampling the probability distribution
MCHAD Loss
- class pytorch_ood.loss.MCHADLoss(n_classes: int, n_dim: int, radius: float = 0, margin: float = 0, weight_center: float = 1.0, weight_nll: float = 1.0, weight_oe: float = 1.0)[source]
Implements the MCHAD loss from the Paper Multi-Class Hypersphere Anomaly-Detection.
The Loss places a center \(\mu_y\) for each class \(y\) in the output space of the model and has three components:
\[ \begin{align}\begin{aligned}\mathcal{L}_{\Lambda}(x,y) = \max \lbrace 0, \Vert \mu_y - f(x)_y \Vert^2_2 - r^2 \rbrace\\\mathcal{L}_{\Delta}(x,y) = \log(1 + \sum_{i \neq y} e^{\Vert \mu_y - f(x)_y \Vert^2_2 - \Vert \mu_y - f(x)_i \Vert^2_2} )\\\mathcal{L}_{\Theta}(x) = \sum_i \max \lbrace 0, (r + m)^2 - \Vert f(x) - \mu_y \Vert^2 \rbrace\end{aligned}\end{align} \]
Intuitively, the first term forces the samples to cluster tightly in a sphere of radius \(r\) around the corresponding class centers. The second term ensures that the (learnable) class centers remain separable and do not collapse. The third term makes sure that OOD samples have at least a distance \(m\) to the surface of each hypersphere.
The loss can be used in a supervised, as well as in an unsupervised manner.
- See Implementation:
- See Paper:
- Parameters:
n_classes – number of classes \(C\)
n_dim – dimensionality of the output space \(D\)
radius – radius of the hyperspheres
margin – margin around hyperspheres
weight_center – weight \(\lambda_{\Lambda}\) for the center loss term
weight_nll – weight \(\lambda_{\Delta}\) for the maximum likelihood term
weight_oe – weight \(\lambda_{\Theta}\) for the outlier exposure term
- property centers: ClassCenters
Class centers \(\mu_y\)
Background Class Loss
- class pytorch_ood.loss.BackgroundClassLoss(n_classes: int, reduction: str = 'mean')[source]
The idea of the background-class is that OOD samples are mapped to an individual class during training. This implementation uses the normal cross-entropy, but handles remapping of the background class labels to positive target labels. Thus, when the target labels are \(\lbrace 0, 1, 2, ..., N - 1 \rbrace\) we will remap all entries with target label \(<0\) to \(N\).
The network's output layer has to include \(N+1\) outputs, so logits are in the shape \(B \times (N + 1)\).
- Parameters:
n_classes – number of classes \(N\) (not counting background class)
reduction – can be one of none, mean, sum
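The label remapping can be sketched as follows. The example assumes a model with \(N + 1\) outputs and uses random logits as a stand-in; `background_class_loss_sketch` is a hypothetical helper, not the library's implementation.

```python
import torch
import torch.nn.functional as F


def background_class_loss_sketch(logits, y, n_classes):
    targets = y.clone()               # do not modify the caller's labels
    targets[targets < 0] = n_classes  # OOD samples are mapped to the background class N
    return F.cross_entropy(logits, targets)


torch.manual_seed(0)
n_classes = 3
logits = torch.randn(4, n_classes + 1)  # B x (N + 1) logits
y = torch.tensor([0, 2, -1, -1])        # two ID and two OOD samples

loss = background_class_loss_sketch(logits, y, n_classes)
```

After remapping, the targets are `[0, 2, 3, 3]`, so the last output of the model acts as the background class.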
Energy Margin Loss (Scone)
- class pytorch_ood.loss.EnergyMarginLoss(full_train_loss: floating, eta=1.0, false_alarm_cutoff=0.05, in_constraint_weight=1.0, ce_tol=2.0, ce_constraint_weight=1.0, out_constraint_weight=1.0, lr_lam=1.0, penalty_mult=1.5, constraint_tol=0.0)[source]
Loss from the paper Feed Two Birds with One Scone. Introduces a margin to further improve the performance of energy-based OOD detection, specifically for handling covariate-shifted data.
Constructor of EnergyMarginLoss
- Parameters:
full_train_loss – average classification loss of pre-trained model
eta – margin between ID and OOD; Covariate-shifted data should reside in-between
false_alarm_cutoff – false alarm cutoff
in_constraint_weight – penalty parameter for in-distribution constraint
lam – lagrangian multiplier for in-distribution constraint
lam2 – lagrangian multiplier for multi-class model constraint
ce_tol – error threshold for the multi-class model
ce_constraint_weight – penalty parameter for multi-class model constraint
out_constraint_weight
lr_lam – learning rate of lagrangian multipliers
penalty_mult – penalty multiplier
constraint_tol – constraint tolerance
- forward(logits: Tensor, targets: Tensor, logistic_regression: Callable[[Tensor], Tensor]) Tensor[source]
Calculates the weighted sum of the cross-entropy and the energy regularization term, a.k.a. the classical augmented Lagrangian function
- Parameters:
logits – logits
targets – labels
logistic_regression – logistic regression layer
- update_hyperparameters(model: Callable[[Tensor], Tensor], train_loader_in: DataLoader, logistic_regression: Callable[[Tensor], Tensor]) None[source]
Update hyperparameters of the Augmented Lagrangian function
- Parameters:
model – pytorch model
train_loader_in – loader of in-distribution data
logistic_regression – logistic regression layer