Benchmarks
Benchmark objects provide a higher-level interface for recreating the out-of-distribution (OOD) detection benchmarks used in the literature.
API
Each benchmark implements a common interface.
Note
This interface is currently a draft and is likely to change.
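# Benchmark and Detector are placeholders for a concrete benchmark and detector class.
benchmark = Benchmark(root)
# Fit each detector on the benchmark's training set before evaluating it.
detector1 = Detector(model)
detector1.fit(benchmark.train_set())
detector2 = Detector(model)
detector2.fit(benchmark.train_set())
# The same benchmark object can evaluate any number of detectors.
results1 = benchmark.evaluate(detector1)
results2 = benchmark.evaluate(detector2)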
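Because `evaluate` returns a list of dictionaries, the results of two detectors can be compared directly. The following is a minimal sketch of such a comparison. The key names `"Dataset"` and `"AUROC"` and the numbers are assumptions made for illustration; inspect the dictionaries returned by your benchmark for the actual keys.

```python
from typing import Dict, List


def compare_results(
    results_a: List[Dict],
    results_b: List[Dict],
    metric: str = "AUROC",  # assumed metric key
    key: str = "Dataset",  # assumed dataset-name key
) -> Dict[str, float]:
    """Return metric(b) - metric(a) for every dataset present in both lists."""
    # Index the first list by dataset name so that list order does not matter.
    by_name = {row[key]: row[metric] for row in results_a}
    return {
        row[key]: row[metric] - by_name[row[key]]
        for row in results_b
        if row[key] in by_name
    }


# Made-up numbers standing in for the output of benchmark.evaluate(...).
results1 = [{"Dataset": "LSUNCrop", "AUROC": 0.5}, {"Dataset": "Gaussian", "AUROC": 0.75}]
results2 = [{"Dataset": "LSUNCrop", "AUROC": 0.75}, {"Dataset": "Gaussian", "AUROC": 0.5}]

delta = compare_results(results1, results2)
print(delta)
```

A positive value means the second detector scored higher on that dataset.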
- class pytorch_ood.benchmark.Benchmark
Base class for benchmarks.
- abstract evaluate(detector: Detector, *args, **kwargs) → List[Dict]
Evaluates the given detector on all datasets and returns a list of result dictionaries.
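To illustrate the contract of the abstract `evaluate` method, here is a self-contained sketch that mimics it with the standard library only. `ToyBenchmark` and `ThresholdBenchmark` are hypothetical classes, not part of pytorch_ood, and the result keys `"Dataset"` and `"AUROC"` are assumptions. The sketch also assumes that a detector returns higher scores for outliers.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, List, Mapping


class ToyBenchmark(ABC):
    """Hypothetical stand-in for the common interface: one abstract method."""

    @abstractmethod
    def evaluate(self, detector: Callable[[float], float], *args, **kwargs) -> List[Dict]:
        ...


class ThresholdBenchmark(ToyBenchmark):
    """Toy benchmark over lists of numbers instead of image datasets."""

    def __init__(self, inliers: Iterable[float], outlier_sets: Mapping[str, Iterable[float]]):
        self.inliers = list(inliers)
        self.outlier_sets = {name: list(values) for name, values in outlier_sets.items()}
        self.ood_names = list(self.outlier_sets)

    def evaluate(self, detector, *args, **kwargs) -> List[Dict]:
        inlier_scores = [detector(x) for x in self.inliers]
        results = []
        for name in self.ood_names:
            outlier_scores = [detector(x) for x in self.outlier_sets[name]]
            # AUROC as the fraction of (inlier, outlier) pairs in which the
            # outlier receives the higher score; ties count as half.
            pairs = [(i, o) for i in inlier_scores for o in outlier_scores]
            correct = sum(1.0 if o > i else 0.5 if o == i else 0.0 for i, o in pairs)
            results.append({"Dataset": name, "AUROC": correct / len(pairs)})
        return results


benchmark = ThresholdBenchmark(
    inliers=[0.0, 0.1, 0.2],
    outlier_sets={"far": [5.0, 6.0], "near": [0.1, 0.3]},
)
# The built-in abs serves as a trivial detector: larger magnitude, more outlying.
results = benchmark.evaluate(abs)
print(results)
```

The point of the design is that every benchmark returns one dictionary per outlier dataset, so calling code does not need to know which datasets a particular benchmark contains.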
Image
Examples can be found here
CIFAR 10
ODIN Benchmark
- class pytorch_ood.benchmark.CIFAR10_ODIN(root, transform)
Replicates the OOD detection benchmark from the ODIN paper for CIFAR 10.
- See Paper:
Outlier datasets are
TinyImageNetCrop
TinyImageNetResize
LSUNResize
LSUNCrop
Uniform
Gaussian
- Parameters:
root – where to store datasets
transform – transform to apply to images
- evaluate(detector: Detector, loader_kwargs: Dict | None = None, device: str = 'cpu') → List[Dict]
Evaluates the given detector on all datasets and returns a list of result dictionaries.
- Parameters:
detector – the detector to evaluate
loader_kwargs – keyword arguments to give to the data loader
device – the device to move batches to
- ood_names: List[str]
OOD Dataset names
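The `loader_kwargs` and `device` parameters of `evaluate` can be wrapped in a small helper. This sketch assumes that the data loader is a `torch.utils.data.DataLoader`, for which `batch_size` and `num_workers` are standard arguments. `run_benchmark` and `RecordingBenchmark` are hypothetical names; the stub only records how it was called so that the example runs without downloading any dataset.

```python
from typing import Any, Dict, List


def run_benchmark(
    benchmark: Any,
    detector: Any,
    batch_size: int = 64,
    num_workers: int = 0,
    device: str = "cpu",
) -> List[Dict]:
    """Call benchmark.evaluate with explicit data loader settings."""
    # Assumption: these keys are forwarded to torch.utils.data.DataLoader.
    loader_kwargs = {"batch_size": batch_size, "num_workers": num_workers}
    return benchmark.evaluate(detector, loader_kwargs=loader_kwargs, device=device)


class RecordingBenchmark:
    """Hypothetical stub with the same evaluate signature as the benchmarks."""

    def evaluate(self, detector, loader_kwargs=None, device="cpu"):
        # Store the arguments instead of running a real evaluation.
        self.call = {"loader_kwargs": loader_kwargs, "device": device}
        return []


stub = RecordingBenchmark()
results = run_benchmark(stub, detector=None, batch_size=256, device="cuda:0")
print(stub.call)
```

With a real benchmark such as `CIFAR10_ODIN`, the same helper would be called with the benchmark instance and a fitted detector in place of the stub.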
OpenOOD Benchmark
- class pytorch_ood.benchmark.CIFAR10_OpenOOD(root, transform)
Replicates the CIFAR-10 benchmark proposed in OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection.
- See Paper:
Near-OOD datasets:
CIFAR-100
TinyImageNet
Far-OOD datasets:
MNIST
SVHN
Textures
Places365
- Parameters:
root – where to store datasets
transform – transform to apply to images
- evaluate(detector: Detector, loader_kwargs: Dict | None = None, device: str = 'cpu') → List[Dict]
Evaluates the given detector on all datasets and returns a list of result dictionaries.
- Parameters:
detector – the detector to evaluate
loader_kwargs – keyword arguments to give to the data loader
device – the device to move batches to
- ood_names: List[str]
OOD Dataset names
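Results on this benchmark are often summarized separately for the near-OOD and far-OOD groups listed above. The sketch below averages one metric per group. The dataset names are copied from the lists above and may not match the strings in `ood_names` exactly; the keys `"Dataset"` and `"AUROC"` and all numbers are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List

# Grouping taken from the dataset lists above; the spelling is an assumption.
NEAR_OOD = ("CIFAR-100", "TinyImageNet")
FAR_OOD = ("MNIST", "SVHN", "Textures", "Places365")


def group_means(results: List[Dict], metric: str = "AUROC", key: str = "Dataset") -> Dict[str, float]:
    """Average one metric over the near-OOD and the far-OOD datasets."""
    by_name = {row[key]: row[metric] for row in results}
    return {
        "near": mean(by_name[name] for name in NEAR_OOD),
        "far": mean(by_name[name] for name in FAR_OOD),
    }


# Made-up numbers standing in for the output of benchmark.evaluate(...).
results = [
    {"Dataset": "CIFAR-100", "AUROC": 0.5},
    {"Dataset": "TinyImageNet", "AUROC": 0.75},
    {"Dataset": "MNIST", "AUROC": 1.0},
    {"Dataset": "SVHN", "AUROC": 0.5},
    {"Dataset": "Textures", "AUROC": 0.75},
    {"Dataset": "Places365", "AUROC": 0.75},
]

summary = group_means(results)
print(summary)
```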
CIFAR 100
ODIN Benchmark
- class pytorch_ood.benchmark.CIFAR100_ODIN(root, transform)
Replicates the OOD detection benchmark from the ODIN paper for CIFAR 100.
- See Paper:
Outlier datasets are
TinyImageNetCrop
TinyImageNetResize
LSUNResize
LSUNCrop
Uniform
Gaussian
- Parameters:
root – where to store datasets
transform – transform to apply to images
- evaluate(detector: Detector, loader_kwargs: Dict | None = None, device: str = 'cpu') → List[Dict]
Evaluates the given detector on all datasets and returns a list of result dictionaries.
- Parameters:
detector – the detector to evaluate
loader_kwargs – keyword arguments to give to the data loader
device – the device to move batches to
- ood_names: List[str]
OOD Dataset names
OpenOOD Benchmark
- class pytorch_ood.benchmark.CIFAR100_OpenOOD(root, transform)
Aims to replicate the benchmark proposed in OpenOOD: Benchmarking Generalized Out-of-Distribution Detection.
- See Paper:
Outlier datasets are
CIFAR10
TinyImageNet
MNIST
FashionMNIST
Textures
Places365
Warning
This class does not currently reproduce the benchmark exactly, because it does not exclude outlier images that overlap with CIFAR-100.
- Parameters:
root – where to store datasets
transform – transform to apply to images
- evaluate(detector: Detector, loader_kwargs: Dict | None = None, device: str = 'cpu') → List[Dict]
Evaluates the given detector on all datasets and returns a list of result dictionaries.
- Parameters:
detector – the detector to evaluate
loader_kwargs – keyword arguments to give to the data loader
device – the device to move batches to
- ood_names: List[str]
OOD Dataset names
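The warning above concerns outlier images that overlap with CIFAR-100. If a list of overlapping indices were available, one way to exclude them would be a thin wrapper around the dataset. This is only a sketch: `ExcludeIndices` is a hypothetical helper, the class above does not supply such a list, and the indices used here are invented. For PyTorch datasets, `torch.utils.data.Subset` offers a similar selection by the indices to keep.

```python
from typing import Iterable, Sequence


class ExcludeIndices:
    """Read-only view of a dataset that skips the given indices."""

    def __init__(self, dataset: Sequence, excluded: Iterable[int]):
        skip = set(excluded)
        self.dataset = dataset
        # Precompute the indices that remain visible.
        self.kept = [i for i in range(len(dataset)) if i not in skip]

    def __len__(self) -> int:
        return len(self.kept)

    def __getitem__(self, index: int):
        return self.dataset[self.kept[index]]


# A plain list stands in for an outlier dataset; indices 1 and 3 are invented.
images = ["img0", "img1", "img2", "img3"]
clean = ExcludeIndices(images, excluded=[1, 3])
remaining = [clean[i] for i in range(len(clean))]
print(remaining)
```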
ImageNet
OpenOOD Benchmark
- class pytorch_ood.benchmark.ImageNet_OpenOOD(root, image_net_root, transform)
Replicates the ImageNet benchmark proposed in OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection.
- See Paper:
Near-OOD datasets:
SSB-Hard
NINCO
Far-OOD datasets:
iNaturalist
Textures
OpenImage-O
- Parameters:
root – where to store datasets
image_net_root – root for the ImageNet dataset
transform – transform to apply to images
- evaluate(detector: Detector, loader_kwargs: Dict | None = None, device: str = 'cpu') → List[Dict]
Evaluates the given detector on all datasets and returns a list of result dictionaries.
- Parameters:
detector – the detector to evaluate
loader_kwargs – keyword arguments to give to the data loader
device – the device to move batches to
- ood_names: List[str]
OOD Dataset names