The Quarrel of Local Post-hoc Explainers for Moral Values Classification in Natural Language Processing


Although popular and effective, large language models (LLMs) are characterised by a performance vs. transparency trade-off that hinders their applicability to sensitive scenarios. This is the main reason behind the many local post-hoc explanation approaches recently proposed by the XAI community. However, to the best of our knowledge, a thorough comparison among the available explainability techniques is currently missing, mainly due to the lack of a general metric to measure their benefits. We compare state-of-the-art local post-hoc explanation mechanisms for models trained on moral value classification tasks, relying on a measure of correlation. Using a novel framework for comparing global impact scores, our experiments show that most local post-hoc explainers are only loosely correlated, and highlight large discrepancies in their results, i.e., their "quarrel" about explanations. Finally, we compare the impact score distributions obtained from each local post-hoc explainer with human-made dictionaries, and point out that there is no correlation between explanation outputs and the concepts humans consider salient.
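As an illustration of the kind of comparison described in the abstract, the following Python sketch correlates the global impact scores produced by two local post-hoc explainers over a shared vocabulary. The token list, the score values, and the choice of Spearman rank correlation are illustrative assumptions, not the paper's actual data or framework.

import numpy as np
from scipy.stats import spearmanr

# Hypothetical global impact scores per token, aggregated from local explanations
tokens = ["care", "harm", "fairness", "loyalty", "purity"]
scores_explainer_a = np.array([0.82, 0.61, 0.40, 0.15, 0.07])  # e.g. a LIME-like explainer
scores_explainer_b = np.array([0.30, 0.75, 0.22, 0.55, 0.10])  # e.g. a SHAP-like explainer

# A low rank correlation suggests the explainers "quarrel" about which tokens matter
rho, p_value = spearmanr(scores_explainer_a, scores_explainer_b)
print(f"Rank correlation between explainers: {rho:.2f} (p = {p_value:.2f})")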

Host event
EXTRAAMAS 2023 @ AAMAS 2023
Reference publication
The Quarrel of Local Post-hoc Explainers for Moral Values Classification in Natural Language Processing (paper in proceedings, 2023) — Andrea Agiollo, Luciano C. Siebert, Pradeep K. Murukannaiah, Andrea Omicini
Funding project
EXPECTATION — Personalized Explainable Artificial Intelligence for decentralized agents with heterogeneous knowledge (01/04/2021–31/03/2024)
Acts as
reference presentation for
The Quarrel of Local Post-hoc Explainers for Moral Values Classification in Natural Language Processing (paper in proceedings, 2023) — Andrea Agiollo, Luciano C. Siebert, Pradeep K. Murukannaiah, Andrea Omicini
