Claudia Maiolino • Tian Cheng Xia
Abstract
Natural example-based explainability methods aim to find representative samples drawn from the
training set of the model. The rationale behind these methods is that human intuition and reasoning
are heavily based on concepts built upon examples, making these approaches ideal candidates for
enhancing explainability. In this project, we experiment with this class of methods and apply them
to image, text, and tabular data in a post-hoc, data-agnostic classification setup. We experiment
with different models and search hyperparameters to analyze the outcomes in different embedding
spaces. The experimental results show that examples provide insights into how the model behaves.
In particular, our main findings are that: examples can reveal how the model classifies a sample by
giving an idea of the shape of the embedding space and of the decision boundaries; each class has
recurrent preferred axes in the embedding space, which helps identify patterns and biases in the
dataset; and examples provide information that is useful for analyzing misclassified samples and for
detecting simple adversarial attacks.
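As a rough illustration of the retrieval step underlying example-based explanations, the sketch below (our own minimal example, not the project's actual pipeline) finds the k nearest training samples to a query point in a shared embedding space; the embeddings, labels, and distance metric used here are placeholder assumptions.

```python
# Minimal sketch of post-hoc example-based explanation: retrieve the
# training samples closest to a query in the model's embedding space.
# The embeddings and labels below are synthetic placeholders; in practice
# they would come from the trained classifier (image, text, or tabular).
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
train_embeddings = rng.normal(size=(1000, 64))   # hypothetical training embeddings
train_labels = rng.integers(0, 3, size=1000)     # hypothetical class labels
query_embedding = rng.normal(size=(1, 64))       # embedding of the sample to explain

# Index the training embeddings and retrieve the 5 closest examples.
index = NearestNeighbors(n_neighbors=5, metric="euclidean").fit(train_embeddings)
distances, neighbor_ids = index.kneighbors(query_embedding)

# The retrieved examples and their labels act as the explanation: they
# indicate which region of the embedding space the query falls into.
for dist, idx in zip(distances[0], neighbor_ids[0]):
    print(f"train sample {idx} (label {train_labels[idx]}) at distance {dist:.3f}")
```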
Outcomes