A Comparative Study of Pre-defined and Automatically Discovered Concepts for Interpretability

Valerio Costa  •  Luca Domeniconi
abstract

The demand for interpretability in Artificial Intelligence (AI) is growing, particularly in high-stakes
domains where understanding model decisions is paramount for trust and accountability. This project
proposes a comparative study of two distinct approaches to achieving interpretability in computer
vision models: Concept Bottleneck Models (CBMs) [Koh+20], which leverage human-annotated, pre-defined
concepts, and Concept Recursive Activation FacTorization (CRAFT) [Fel+23], which automatically
discovers concepts from trained neural networks. We aim to explore the strengths and weaknesses of each
paradigm in terms of its ability to provide understandable and actionable explanations, and to
investigate the semantic alignment between human-engineered concepts and automatically learned concepts.
The project will involve implementing and evaluating both methods on a relevant dataset, culminating
in a detailed analysis of their interpretability characteristics and ethical implications.

outcomes