Fairness in diabetes prediction dataset

Arianna Albertazzi  •  Lucia Gasperini
Abstract

Fairness in machine learning (ML) [8][5] is crucial, especially as AI systems increasingly influence sectors such as healthcare and legal decision-making. AI must be applied with caution, because unfairness and algorithmic bias can have severe consequences. This work explores different methods for assessing the fairness of machine learning models for the risk of rehospitalization of diabetic patients. We use a Kaggle dataset [3] covering 10 years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks. It includes over 50 features representing patient and hospital outcomes. The attributes include patient number, race, gender, age, admission type, time in hospital, medical specialty of the admitting physician, number of lab tests performed, HbA1c test result [2], diagnosis, number of medications, diabetic medications, and the number of outpatient, inpatient, and emergency visits in the year before the hospitalization.
