Comparing Different Models’ Reliance on Prohibited Features in the Adult Census Income Dataset

Daniele Baiocco
abstract

The first goal of this project is to determine whether a model's inherent feature importance serves as a proxy for its interpretability when assessed through model-agnostic explanation methods. The second is to investigate whether low importance scores for prohibited features, obtained through model-agnostic explanation methods, correspond to a low Disparate Treatment Index; such a correspondence would suggest that models assigning low importance to these features are less likely to discriminate. Lastly, the project applies a Lagrangian approach to a neural network to minimize the Disparate Treatment Index, comparing feature importances before and after mitigation. This comparison highlights which features decreased in importance, revealing those that were originally correlated with the prohibited features.
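To make the mitigation step concrete, below is a minimal sketch of a Lagrangian-style training loop for a binary classifier, assuming PyTorch. The exact Disparate Treatment Index used in the project is not defined here, so the penalty below uses the absolute gap in mean predicted probability between the two groups of a prohibited feature as a stand-in constraint; the synthetic data, variable names, and hyperparameters are all illustrative rather than taken from the project.

```python
# Sketch: training a classifier with a Lagrangian fairness penalty.
# Assumption: the gap in mean prediction across groups of the prohibited
# feature `a` stands in for the Disparate Treatment Index.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in for the Adult data: 6 features, column 0 is the
# prohibited attribute (e.g. sex), deliberately correlated with the label.
n = 2000
a = torch.randint(0, 2, (n, 1)).float()
x_rest = torch.randn(n, 5)
X = torch.cat([a, x_rest], dim=1)
y = ((x_rest[:, 0] + 0.8 * a.squeeze() + 0.3 * torch.randn(n)) > 0).float()

model = nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()
lam, eps = 0.0, 0.01  # Lagrange multiplier and constraint tolerance

for epoch in range(200):
    opt.zero_grad()
    logits = model(X).squeeze(1)
    p = torch.sigmoid(logits)
    # Fairness term: absolute gap in mean prediction between groups.
    gap = (p[a.squeeze() == 1].mean() - p[a.squeeze() == 0].mean()).abs()
    loss = bce(logits, y) + lam * gap
    loss.backward()
    opt.step()
    # Dual ascent on the multiplier (the "Lagrangian" step): lam grows
    # while the constraint gap > eps is violated, and never goes negative.
    lam = max(0.0, lam + 0.5 * (gap.item() - eps))

print(f"final group gap: {gap.item():.4f}")
```

After training with and without the penalty, one would recompute the model-agnostic feature importances on both models; features whose importance drops under mitigation are candidates for being correlated with the prohibited attribute.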

keywords
Interpretability, Feature Importance, Mitigation Technique
references

[1] Sina Aghaei, Mohammad Javad Azizi, and Phebe Vayanos. Learning optimal
and fair decision trees for non-discriminative decision-making. In
Proceedings of the AAAI Conference on Artificial Intelligence, 2019.
[2] Barry Becker and Ronny Kohavi. Adult. UCI Machine Learning Repository,
1996. DOI: https://doi.org/10.24432/C5XW20.

outcomes