Comparing Different Models’ Reliance on Prohibited Features in the Adult Census Income Dataset

Daniele Baiocco
abstract

The first goal of this project is to determine whether a model's inherent feature importance serves as a proxy for its interpretability when assessed through model-agnostic explanation methods. The second is to investigate whether low importance scores for prohibited features, obtained through model-agnostic explanation methods, correspond to a low Disparate Treatment Index; such a correspondence would suggest that models assigning low importance to these features are less likely to discriminate. Lastly, the project applies a Lagrangian approach to a neural network to minimize the Disparate Treatment Index, comparing feature importances before and after mitigation. This comparison highlights which features decreased in importance, revealing those that were originally correlated with the prohibited features.
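To make the mitigation step concrete, below is a minimal sketch of a Lagrangian-style training loop for a binary classifier, assuming PyTorch. The exact Disparate Treatment Index used in the project is not defined here, so the penalty below uses the absolute gap in mean predicted probability between the two groups of a prohibited feature as a stand-in constraint; the synthetic data, variable names, and hyperparameters are all illustrative rather than taken from the project.

```python
# Sketch: training a classifier with a Lagrangian fairness penalty.
# Assumption: the gap in mean prediction across groups of the prohibited
# feature `a` stands in for the Disparate Treatment Index.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in for the Adult data: 6 features, column 0 is the
# prohibited attribute (e.g. sex), deliberately correlated with the label.
n = 2000
a = torch.randint(0, 2, (n, 1)).float()
x_rest = torch.randn(n, 5)
X = torch.cat([a, x_rest], dim=1)
y = ((x_rest[:, 0] + 0.8 * a.squeeze() + 0.3 * torch.randn(n)) > 0).float()

model = nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()
lam, eps = 0.0, 0.01  # Lagrange multiplier and constraint tolerance

for epoch in range(200):
    opt.zero_grad()
    logits = model(X).squeeze(1)
    p = torch.sigmoid(logits)
    # Fairness term: absolute gap in mean prediction between groups.
    gap = (p[a.squeeze() == 1].mean() - p[a.squeeze() == 0].mean()).abs()
    loss = bce(logits, y) + lam * gap
    loss.backward()
    opt.step()
    # Dual ascent on the multiplier (the "Lagrangian" step): lam grows
    # while the constraint gap > eps is violated, and never goes negative.
    lam = max(0.0, lam + 0.5 * (gap.item() - eps))

print(f"final group gap: {gap.item():.4f}")
```

After training with and without the penalty, one would recompute the model-agnostic feature importances on both models; features whose importance drops under mitigation are candidates for being correlated with the prohibited attribute.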

keywords
Interpretability, Feature Importance, Mitigation Technique
references

[1] Sina Aghaei, Mohammad Javad Azizi, and Phebe Vayanos. Learning optimal
and fair decision trees for non-discriminative decision-making. In
Proceedings of the AAAI Conference on Artificial Intelligence, 2019.
[2] Barry Becker and Ronny Kohavi. Adult. UCI Machine Learning Repository,
1996. DOI: https://doi.org/10.24432/C5XW20.

outcomes