Chang Sun
• Xiaofeng Zhang
Abstract
In explainable artificial intelligence (XAI), counterfactual explanations
(CEs) serve as an efficient post-hoc method for providing actionable insights:
by identifying the minimal changes to input features that would alter
a model's prediction, they illuminate its decision boundaries and
enhance transparency. Large language models (LLMs) have emerged as a
powerful tool for CE generation owing to their strong reasoning capabilities
and common-sense knowledge. The standard CE generation process requires an
Oracle to evaluate the CEs produced by LLMs; however, the Oracle itself can
be inaccurate and therefore introduces uncertainty. In this work, we thus aim
to design a set of metrics for evaluating CEs generated by different LLMs,
accounting for the fact that the Oracle is not reliable all the time.
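As a brief, hedged formalization of the notion described above (the notation $f$, $x$, $z$, and $d$ is assumed here for illustration and is not taken from this paper), a counterfactual explanation $x'$ for an input $x$ under a classifier $f$ can be sketched as the closest point that flips the prediction:
\[
x' \in \arg\min_{z}\, d(x, z) \quad \text{subject to} \quad f(z) \neq f(x),
\]
where $d$ is a distance function measuring the size of the feature changes.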
Outcomes