The Quarrel of Local Post-hoc Explainers for Moral Values Classification in Natural Language Processing

Andrea Agiollo, Luciano C. Siebert, Pradeep K. Murukannaiah, Andrea Omicini
Davide Calvaresi, Amro Najjar, Andrea Omicini, Reyhan Aydoǧan, Rachele Carli, Giovanni Ciatto, Yazan Mualla, Kary Främling (eds.)
Explainable and Transparent AI and Multi-Agent Systems, chapter 6, pages 97–115
Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence) 14127
Springer
September 2023

Although popular and effective, large language models (LLMs) are characterised by a performance vs. transparency trade-off that hinders their applicability to sensitive scenarios. To address this issue, several approaches have recently been proposed by the XAI research community, mostly focusing on local post-hoc explanations. However, to the best of our knowledge, a thorough comparison among the available explainability techniques is still missing, mainly due to the lack of a general metric for measuring their benefits. We compare state-of-the-art local post-hoc explanation mechanisms for models trained on moral value classification tasks, based on a measure of correlation. Relying on a novel framework for comparing global impact scores, our experiments show that most local post-hoc explainers are only loosely correlated, and highlight huge discrepancies in their results: their "quarrel" about explanations. Finally, we compare the distribution of impact scores obtained from each local post-hoc explainer with human-made dictionaries, alarmingly showing that there is no correlation between explanation outputs and the concepts humans consider salient.
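The comparison described in the abstract can be illustrated with a minimal sketch: given two vectors of global impact scores for the same vocabulary (one per explainer), their agreement can be measured by a correlation coefficient. The score values and explainer names below are hypothetical, purely for illustration; the paper's actual aggregation framework and correlation measure are defined in the chapter itself.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equally long score vectors."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical global impact scores over the same four tokens,
# as produced by two different local post-hoc explainers
explainer_a = [0.9, 0.1, 0.4, 0.05]
explainer_b = [0.2, 0.8, 0.3, 0.7]

print(pearson(explainer_a, explainer_b))
```

A value near 1 would mean the two explainers largely agree on which tokens matter; values near 0 or negative, as in this toy pair, correspond to the "quarrel" the paper reports.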

Keywords: Natural Language Processing • Moral Values Classification • eXplainable Artificial Intelligence • Local Post-hoc Explanations
reference talk
The Quarrel of Local Post-hoc Explainers for Moral Values Classification in Natural Language Processing (EXTRAAMAS 2023@AAMAS 2023, 29/05/2023) — Andrea Agiollo (Andrea Agiollo, Luciano C. Siebert, Pradeep K. Murukannaiah, Andrea Omicini)
origin event
EXTRAAMAS 2023@AAMAS 2023
journal or series
Lecture Notes in Computer Science (LNCS)
container publication
Explainable and Transparent AI and Multi-Agent Systems (edited volume, 2023) — Davide Calvaresi, Amro Najjar, Andrea Omicini, Reyhan Aydoǧan, Rachele Carli, Giovanni Ciatto, Yazan Mualla, Kary Främling
funding project
EXPECTATION — Personalized Explainable Artificial Intelligence for decentralized agents with heterogeneous knowledge (01/04/2021–31/03/2024)