Comparing Different Models' Reliance on Prohibited Features in the Adult Census Income Dataset

Daniele Baiocco
abstract

The first goal of this project is to determine whether a model's intrinsic feature importances serve as a proxy for its interpretability when assessed through model-agnostic explanation methods. The second is to investigate whether low importance scores for prohibited features, obtained through model-agnostic explanation methods, correspond to a low Disparate Treatment Index; such a correspondence would suggest that models assigning low importance to these features are less likely to discriminate. Lastly, the project applies a Lagrangian approach to a neural network to minimize the Disparate Treatment Index, comparing feature importances before and after mitigation. This comparison highlights which features decreased in importance, revealing those that were originally correlated with the prohibited features.
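One of the model-agnostic explanation methods used for feature importance, permutation importance, can be sketched as follows. This is an illustrative sketch only, not the project's actual pipeline: the data here are synthetic stand-ins rather than the Adult Census Income dataset, and the logistic model and all variable names are assumptions made for the example.

```python
# Sketch of permutation importance: a feature's importance is the drop
# in accuracy when its column is randomly shuffled, breaking the link
# between that feature and the target. Synthetic data, not Adult.
import numpy as np

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 3))   # col 0: strong signal, col 1: weak, col 2: pure noise
y = (2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=n) > 0).astype(float)

# Fit a plain logistic model by gradient descent.
Xb = np.column_stack([X, np.ones(n)])        # add a bias column
w = np.zeros(4)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
    w -= 0.5 * Xb.T @ (p - y) / n

def accuracy(Xm):
    Xmb = np.column_stack([Xm, np.ones(len(Xm))])
    return np.mean(((Xmb @ w) > 0).astype(float) == y)

base = accuracy(X)
importance = []
for j in range(3):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])     # shuffle one feature column
    importance.append(base - accuracy(Xp))   # accuracy drop = importance
```

Because the method only needs predictions, the same loop applies unchanged to any fitted model, which is what makes it model-agnostic; applied to a protected attribute, a near-zero accuracy drop is the kind of low importance score the project compares against the Disparate Treatment Index.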

keywords
Interpretability, Feature Importance, Mitigation Technique
