Image Classification
This section outlines a quick way to obtain baseline results for comparison.
To run these examples, install pandas and scikit-learn
as additional dependencies:
pip install pandas scikit-learn
Benchmark Interface
Ready-to-use benchmarks provide a simple interface to (approximately) replicate the experiments of other publications. This convenience comes at the price of flexibility.
From Scratch
This approach runs the benchmarks manually. It requires more boilerplate but offers more flexibility than the benchmark interface.
We provide an example that replicates a commonly used benchmark in which several Out-of-Distribution (OOD) detectors are each tested against several OOD datasets. We then calculate the average performance of each detector across all datasets and sort the results by their Area Under the Receiver Operating Characteristic curve (AUROC) score in ascending order.
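The aggregation step described above (average each detector over all datasets, then sort by AUROC in ascending order) can be sketched with pandas and scikit-learn alone. This is a minimal illustration, not the library's own benchmark code: the detector names, dataset names, and the `synthetic_scores` and `summarize` helpers are hypothetical, and the scores are random numbers standing in for real detector outputs.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score


def synthetic_scores(rng, shift, n=200):
    """Return (labels, scores). Label 1 marks OOD samples; higher score = more OOD.

    Hypothetical stand-in for running a real detector on a real dataset.
    """
    labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    scores = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(shift, 1.0, n)])
    return labels, scores


def summarize(rows):
    """Average AUROC per detector across datasets, sorted in ascending order."""
    df = pd.DataFrame(rows)
    return df.groupby("Detector")[["AUROC"]].mean().sort_values("AUROC")


rng = np.random.default_rng(0)

# Hypothetical detectors, modelled only by how well they separate ID from OOD.
detectors = {"DetectorA": 0.5, "DetectorB": 3.0}
# Hypothetical OOD test sets.
datasets = ["DatasetX", "DatasetY", "DatasetZ"]

# One row per (detector, dataset) pair.
rows = []
for detector_name, shift in detectors.items():
    for dataset_name in datasets:
        labels, scores = synthetic_scores(rng, shift)
        rows.append(
            {
                "Detector": detector_name,
                "Dataset": dataset_name,
                "AUROC": roc_auc_score(labels, scores),
            }
        )

summary = summarize(rows)
print(summary)
```

With real detectors, only the body of the inner loop would change: the labels and scores would come from evaluating each detector on each OOD dataset. Because the sort is ascending, the best-performing detector appears in the last row of the table.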