Image Classification

This section outlines a quick way to obtain baseline results for comparison.

To run these examples, install pandas and scikit-learn as additional dependencies:

pip install pandas scikit-learn

Benchmark

Ready-to-use benchmarks provide a simple interface to (approximately) replicate the experiments of other publications. They are convenient, but less flexible than running the experiments manually.

Manual

These examples run the benchmarks manually. This requires more boilerplate, but offers more flexibility than the benchmark interface.

We provide an example that replicates a commonly used benchmark with 12 out-of-distribution (OOD) detectors, each tested against 9 OOD datasets. We then average the performance of each detector across all datasets and sort the results by their Area Under the Receiver Operating Characteristic curve (AUROC) score in ascending order.
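The aggregation step described above (average each detector over all OOD datasets, then sort by AUROC in ascending order) can be sketched with pandas and scikit-learn. This is a minimal illustration and not the code of the gallery examples: the detector names, dataset names, and scores are synthetic placeholders, and the convention that higher scores mean "more likely OOD" is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical detectors, each mapped to how well its synthetic scores
# separate in-distribution (ID) from OOD samples. Placeholders only.
detectors = {"detector_a": 0.5, "detector_b": 1.5, "detector_c": 3.0}
datasets = ["ood_set_1", "ood_set_2", "ood_set_3"]  # hypothetical names

records = []
for det_name, separation in detectors.items():
    for ds_name in datasets:
        # Synthetic outlier scores; assumed convention: higher = more OOD.
        id_scores = rng.normal(0.0, 1.0, size=500)
        ood_scores = rng.normal(separation, 1.0, size=500)
        labels = np.concatenate([np.zeros(500), np.ones(500)])  # 1 = OOD
        scores = np.concatenate([id_scores, ood_scores])
        records.append(
            {
                "Detector": det_name,
                "Dataset": ds_name,
                "AUROC": roc_auc_score(labels, scores),
            }
        )

results = pd.DataFrame(records)

# Mean over datasets per detector, sorted by AUROC in ascending order,
# so the best-performing detector appears last.
mean_scores = (
    results.groupby("Detector")[["AUROC"]]
    .mean()
    .sort_values("AUROC", ascending=True)
)
print(mean_scores)
```

In a real run, the synthetic scores would be replaced by the outputs of each detector on the ID test set and on each OOD dataset; the grouping and sorting stay the same.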

CIFAR 10

CIFAR 100
