Benchmarks

Benchmark objects aim to provide a higher-level interface for recreating the out-of-distribution (OOD) detection benchmarks used in the literature.

API

Each benchmark implements a common interface.

Note

This interface is currently a draft and is likely to change.

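# Schematic: Benchmark stands for a concrete benchmark class,
# Detector1 and Detector2 for two concrete detector classes.
benchmark = Benchmark(root)

detector1 = Detector1(model)
detector1.fit(benchmark.train_set())
detector2 = Detector2(model)
detector2.fit(benchmark.train_set())

results1 = benchmark.evaluate(detector1)
results2 = benchmark.evaluate(detector2)

class pytorch_ood.benchmark.Benchmark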

Base class for Benchmarks

abstract evaluate(detector: Detector, *args, **kwargs) → List[Dict]

Evaluates the given detector on all datasets and returns a list with the results

abstract test_sets(known=True, unknown=True) → List[Dataset]

List of the different test datasets. If both known and unknown are true, each dataset contains both in-distribution (ID) and OOD data.

Parameters:
  • known – include ID

  • unknown – include OOD

abstract train_set() → Dataset

Training dataset
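To make the shape of this interface concrete, here is a minimal, library-free sketch. Everything in it is a stand-in chosen for illustration: ToyBenchmark, TwoOutlierBenchmark, distance_detector, the "Dataset" and "Accuracy" keys, the convention that negative labels mark OOD samples, and the fixed 0.5 threshold are assumptions, not part of pytorch_ood. The real benchmarks work on torch datasets and report proper OOD metrics rather than a thresholded accuracy.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Tuple

# A sample is (value, label). In this sketch, labels < 0 mark OOD samples.
Sample = Tuple[float, int]


class ToyBenchmark(ABC):
    """Stand-in mirroring the three abstract methods documented above."""

    @abstractmethod
    def train_set(self) -> List[Sample]: ...

    @abstractmethod
    def test_sets(self, known: bool = True, unknown: bool = True) -> List[List[Sample]]: ...

    @abstractmethod
    def evaluate(self, detector: Callable[[float], float], *args, **kwargs) -> List[Dict]: ...


class TwoOutlierBenchmark(ToyBenchmark):
    ood_names = ["far", "near"]

    def __init__(self) -> None:
        self._train = [(0.1, 0), (0.2, 1)]
        self._id_test = [(0.15, 0), (0.25, 1)]
        self._ood = {
            "far": [(9.0, -1), (8.0, -1)],
            "near": [(1.0, -1), (0.22, -1)],
        }

    def train_set(self) -> List[Sample]:
        return list(self._train)

    def test_sets(self, known: bool = True, unknown: bool = True) -> List[List[Sample]]:
        # One test set per OOD source; each optionally mixes in the ID test data.
        sets = []
        for name in self.ood_names:
            data: List[Sample] = []
            if known:
                data += self._id_test
            if unknown:
                data += self._ood[name]
            sets.append(data)
        return sets

    def evaluate(self, detector, *args, **kwargs) -> List[Dict]:
        results = []
        for name, data in zip(self.ood_names, self.test_sets()):
            # A sample counts as correct when the thresholded outlier score
            # agrees with its label (score > 0.5 means "predicted OOD").
            correct = sum((detector(x) > 0.5) == (y < 0) for x, y in data)
            results.append({"Dataset": name, "Accuracy": correct / len(data)})
        return results


def distance_detector(x: float) -> float:
    """Hypothetical detector: a larger score means more likely OOD."""
    return abs(x - 0.2)


benchmark = TwoOutlierBenchmark()
results = benchmark.evaluate(distance_detector)
for row in results:
    print(row)
```

The point of the design is that evaluate returns one dictionary per outlier dataset, so different detectors can be compared on identical data by calling evaluate once per detector.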

Image

Examples can be found here

CIFAR 10

ODIN Benchmark

class pytorch_ood.benchmark.CIFAR10_ODIN(root, transform)

Replicates the OOD detection benchmark from the ODIN paper for CIFAR 10.

See Paper:

ArXiv

Outlier datasets are

  • TinyImageNetCrop

  • TinyImageNetResize

  • LSUNResize

  • LSUNCrop

  • Uniform

  • Gaussian

Parameters:
  • root – where to store datasets

  • transform – transform to apply to images

evaluate(detector: Detector, loader_kwargs: Dict | None = None, device: str = 'cpu') → List[Dict]

Evaluates the given detector on all datasets and returns a list with the results

Parameters:
  • detector – the detector to evaluate

  • loader_kwargs – keyword arguments to give to the data loader

  • device – the device to move batches to
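A call to evaluate with these parameters might look like the following sketch. Only CIFAR10_ODIN(root, transform) and the evaluate signature come from this page. The WideResNet model, its "cifar10-pt" weights, WideResNet.transform_for, and the MaxSoftmax detector are assumptions about other parts of pytorch_ood and should be checked against the installed version. The function name run_cifar10_odin and the batch size of 64 are hypothetical.

```python
from typing import Dict, List, Optional


def run_cifar10_odin(
    root: str = "data",
    device: str = "cpu",
    loader_kwargs: Optional[Dict] = None,
) -> List[Dict]:
    """Evaluate a softmax-based detector on the CIFAR 10 ODIN benchmark."""
    # Imports live inside the function so this sketch can be loaded
    # without pytorch_ood being installed.
    from pytorch_ood.benchmark import CIFAR10_ODIN
    from pytorch_ood.detector import MaxSoftmax  # assumed detector class
    from pytorch_ood.model import WideResNet  # assumed model class

    # Assumption: pretrained CIFAR 10 weights and the matching input transform.
    model = WideResNet(num_classes=10, pretrained="cifar10-pt").eval().to(device)
    transform = WideResNet.transform_for("cifar10-pt")

    benchmark = CIFAR10_ODIN(root=root, transform=transform)
    detector = MaxSoftmax(model)

    # loader_kwargs is forwarded to the data loader; device is where
    # batches are moved before scoring.
    return benchmark.evaluate(
        detector,
        loader_kwargs=loader_kwargs or {"batch_size": 64},
        device=device,
    )
```

Calling run_cifar10_odin() downloads the datasets into root on first use, so it is best run where disk space and network access are available.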

ood_names: List[str]

OOD Dataset names

test_sets(known=True, unknown=True) → List[Dataset]

List of the different test datasets. If known and unknown are true, each dataset contains ID and OOD data.

Parameters:
  • known – include ID

  • unknown – include OOD

train_set() → Dataset

Training dataset

OpenOOD Benchmark

class pytorch_ood.benchmark.CIFAR10_OpenOOD(root, transform)

Aims to replicate the benchmark proposed in OpenOOD: Benchmarking Generalized Out-of-Distribution Detection.

See Paper:

OpenOOD

Outlier datasets are

  • CIFAR100

  • TinyImageNet

  • MNIST

  • FashionMNIST

  • Textures

  • Places365

Warning

This does not yet reproduce the benchmark exactly, because images that overlap with CIFAR10 are not excluded from the outlier datasets.

Parameters:
  • root – where to store datasets

  • transform – transform to apply to images

evaluate(detector: Detector, loader_kwargs: Dict | None = None, device: str = 'cpu') → List[Dict]

Evaluates the given detector on all datasets and returns a list with the results

Parameters:
  • detector – the detector to evaluate

  • loader_kwargs – keyword arguments to give to the data loader

  • device – the device to move batches to
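Because evaluate returns one dictionary per outlier dataset, the results are easy to summarize with plain Python. The sketch below assumes each dictionary carries a dataset name under "Dataset" and a numeric metric under "AUROC". Both key names are assumptions, and the numbers are placeholders rather than measured results.

```python
from statistics import mean
from typing import Dict, List


def mean_metric(results: List[Dict], metric: str) -> float:
    """Average one metric over all outlier datasets."""
    return mean(row[metric] for row in results)


def hardest_dataset(results: List[Dict], metric: str) -> Dict:
    """Return the row with the lowest value of a higher-is-better metric."""
    return min(results, key=lambda row: row[metric])


# Placeholder rows in the assumed List[Dict] shape, not real measurements.
example_results = [
    {"Dataset": "CIFAR100", "AUROC": 0.5},
    {"Dataset": "MNIST", "AUROC": 1.0},
]

print(mean_metric(example_results, "AUROC"))
print(hardest_dataset(example_results, "AUROC")["Dataset"])
```

If the actual keys differ, only the string arguments need to change.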

ood_names: List[str]

OOD Dataset names

test_sets(known=True, unknown=True) → List[Dataset]

List of the different test datasets. If known and unknown are true, each dataset contains ID and OOD data.

Parameters:
  • known – include ID

  • unknown – include OOD

train_set() → Dataset

Training dataset

CIFAR 100

ODIN Benchmark

class pytorch_ood.benchmark.CIFAR100_ODIN(root, transform)

Replicates the OOD detection benchmark from the ODIN paper for CIFAR 100.

See Paper:

ArXiv

Outlier datasets are

  • TinyImageNetCrop

  • TinyImageNetResize

  • LSUNResize

  • LSUNCrop

  • Uniform

  • Gaussian

Parameters:
  • root – where to store datasets

  • transform – transform to apply to images

evaluate(detector: Detector, loader_kwargs: Dict | None = None, device: str = 'cpu') → List[Dict]

Evaluates the given detector on all datasets and returns a list with the results

Parameters:
  • detector – the detector to evaluate

  • loader_kwargs – keyword arguments to give to the data loader

  • device – the device to move batches to

ood_names: List[str]

OOD Dataset names

test_sets(known=True, unknown=True) → List[Dataset]

List of the different test datasets. If known and unknown are true, each dataset contains ID and OOD data.

Parameters:
  • known – include ID

  • unknown – include OOD

train_set() → Dataset

Training dataset

OpenOOD Benchmark

class pytorch_ood.benchmark.CIFAR100_OpenOOD(root, transform)

Aims to replicate the benchmark proposed in OpenOOD: Benchmarking Generalized Out-of-Distribution Detection.

See Paper:

OpenOOD

Outlier datasets are

  • CIFAR10

  • TinyImageNet

  • MNIST

  • FashionMNIST

  • Textures

  • Places365

Warning

This does not yet reproduce the benchmark exactly, because images that overlap with CIFAR100 are not excluded from the outlier datasets.

Parameters:
  • root – where to store datasets

  • transform – transform to apply to images

evaluate(detector: Detector, loader_kwargs: Dict | None = None, device: str = 'cpu') → List[Dict]

Evaluates the given detector on all datasets and returns a list with the results

Parameters:
  • detector – the detector to evaluate

  • loader_kwargs – keyword arguments to give to the data loader

  • device – the device to move batches to

ood_names: List[str]

OOD Dataset names

test_sets(known=True, unknown=True) → List[Dataset]

List of the different test datasets. If known and unknown are true, each dataset contains ID and OOD data.

Parameters:
  • known – include ID

  • unknown – include OOD

train_set() → Dataset

Training dataset

ImageNet

OpenOOD Benchmark

class pytorch_ood.benchmark.ImageNet_OpenOOD(root, image_net_root, transform)

Aims to replicate the ImageNet benchmark proposed in OpenOOD: Benchmarking Generalized Out-of-Distribution Detection.

See Paper:

OpenOOD

Outlier datasets are

  • ImageNet-O

  • OpenImage-O

  • Textures

  • MNIST

  • SVHN

Warning

This does not yet reproduce the benchmark exactly, because images that overlap with ImageNet are not excluded from the outlier datasets and the Species dataset is missing.

Parameters:
  • root – where to store datasets

  • image_net_root – root for the ImageNet dataset

  • transform – transform to apply to images

evaluate(detector: Detector, loader_kwargs: Dict | None = None, device: str = 'cpu') → List[Dict]

Evaluates the given detector on all datasets and returns a list with the results

Parameters:
  • detector – the detector to evaluate

  • loader_kwargs – keyword arguments to give to the data loader

  • device – the device to move batches to

ood_names: List[str]

OOD Dataset names

test_sets(known=True, unknown=True) → List[Dataset]

List of the different test datasets. If known and unknown are true, each dataset contains ID and OOD data.

Parameters:
  • known – include ID

  • unknown – include OOD

train_set() → Dataset

Training dataset