The Quarrel of Local Post-hoc Explainers for Moral Values Classification in Natural Language Processing

Andrea Agiollo, Luciano C. Siebert, Pradeep K. Murukannaiah, Andrea Omicini

Davide Calvaresi, Amro Najjar, Andrea Omicini, Reyhan Aydoǧan, Rachele Carli, Giovanni Ciatto, Yazan Mualla, Kary Främling (eds.)

Explainable and Transparent AI and Multi-Agent Systems, chapter 6, pages 97–115

Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence) 14127

Springer

September 2023

Although popular and effective, large language models (LLMs) are characterised by a performance vs. transparency trade-off that hinders their applicability to sensitive scenarios. To address this issue, the XAI research community has recently proposed several approaches, mostly focusing on local post-hoc explanations. However, to the best of our knowledge, a thorough comparison among the available explainability techniques is still missing, mainly due to the lack of a general metric for measuring their benefits. We compare state-of-the-art local post-hoc explanation mechanisms for models trained on moral value classification tasks, based on a measure of correlation. Relying on a novel framework for comparing global impact scores, our experiments show that most local post-hoc explainers are only loosely correlated, and highlight huge discrepancies in their results: their "quarrel" about explanations. Finally, we compare the distribution of impact scores obtained from each local post-hoc explainer with human-made dictionaries, showing, alarmingly, that there is no correlation between explanation outputs and the concepts that humans consider salient.

keywords Natural Language Processing • Moral Values Classification • eXplainable Artificial Intelligence • Local Post-hoc Explanations

reference talk

The Quarrel of Local Post-hoc Explainers for Moral Values Classification in Natural Language Processing (EXTRAAMAS 2023@AAMAS 2023, 29/05/2023) — Andrea Agiollo (Andrea Agiollo, Luciano C. Siebert, Pradeep K. Murukannaiah, Andrea Omicini)

origin event

EXTRAAMAS 2023@AAMAS 2023

journal or series

Lecture Notes in Computer Science (LNCS)

container publication

Explainable and Transparent AI and Multi-Agent Systems (edited volume, 2023) — Davide Calvaresi, Amro Najjar, Andrea Omicini, Reyhan Aydoǧan, Rachele Carli, Giovanni Ciatto, Yazan Mualla, Kary Främling

funding project

EXPECTATION — Personalized Explainable Artificial Intelligence for decentralized agents with heterogeneous knowledge (01/04/2021–31/03/2024)

works as

reference publication for talk