Image Classification
This section outlines a quick way to obtain baseline results for comparison.
To run these examples, install pandas and scikit-learn as additional dependencies:
pip install pandas scikit-learn
Benchmark
Ready-to-use benchmarks provide a simple interface to (approximately) replicate the experiments of other publications. They are convenient, but less flexible than running the evaluation manually.
Manual
These examples run benchmarks manually. This requires more boilerplate but offers more flexibility than the benchmark interface.
We provide an example that replicates a commonly used benchmark that includes 12 out-of-distribution (OOD) detectors, each tested against 9 OOD datasets. We then calculate the average performance of each detector across all datasets and sort the results by their Area Under the Receiver Operating Characteristic curve (AUROC) score in ascending order.
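The aggregation step described above (average per detector, then sort by AUROC in ascending order) can be sketched with pandas and scikit-learn. This is a minimal illustration, not the library's benchmark code. The detector and dataset names are hypothetical placeholders, the scores are synthetic random numbers rather than real results, and the convention that label 1 marks OOD samples with higher scores meaning "more OOD" is an assumption made for this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)


def synthetic_scores(shift, n=200):
    """Return (labels, scores) for n in-distribution and n OOD samples.

    Assumption for this sketch: label 1 = OOD, and a higher score means
    "more OOD-like". The scores are random draws, not real detector outputs.
    """
    labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    scores = np.concatenate(
        [rng.normal(0.0, 1.0, n), rng.normal(shift, 1.0, n)]
    )
    return labels, scores


# Hypothetical detectors; the number controls how well the synthetic
# OOD scores separate from the in-distribution scores.
detectors = {"DetectorA": 1.0, "DetectorB": 2.0, "DetectorC": 3.0}
# Hypothetical OOD datasets.
datasets = ["DatasetX", "DatasetY", "DatasetZ"]

# One row per (detector, dataset) pair, as in a manual benchmark loop.
rows = []
for detector, shift in detectors.items():
    for dataset in datasets:
        labels, scores = synthetic_scores(shift)
        rows.append(
            {
                "Detector": detector,
                "Dataset": dataset,
                "AUROC": roc_auc_score(labels, scores),
            }
        )

results = pd.DataFrame(rows)

# Average over datasets for each detector, then sort ascending by AUROC
# so that the strongest detector appears last.
mean_auroc = (
    results.groupby("Detector")["AUROC"].mean().sort_values(ascending=True)
)
print(mean_auroc)
```

In a real run, the synthetic scores would be replaced by the outputs of each detector on the test data, and further metrics could be added as extra columns before grouping.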