Within the broad spectrum of artificial intelligence systems, neural networks stand out for their strong predictive capabilities, yet they offer a low degree of interpretability. Their use in decision-support processes has prompted policymakers worldwide to introduce regulations requiring transparency and adherence to ethical principles in the design of these systems, and has driven researchers to develop a variety of approaches for understanding how such models work.
This work focuses on building a platform that leverages abductive reasoning to identify the salient neurons of a neural network, that is, those neurons that activate significantly during the execution of specific tasks, in order to better understand the features the network detects and to reveal possible anomalies in learning. Pinpointing crucial neurons can also guide model optimization, while locating less influential ones can allow the structure of the network to be simplified.
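As a point of reference, and separately from the abductive encoding described below, the following minimal sketch illustrates how per-neuron activation statistics for a specific task could be gathered with forward hooks; the toy CNN, the layer chosen, and the dummy batch are hypothetical and stand in for the convolutional classifiers analyzed in this work.

```python
import torch
import torch.nn as nn

# Hypothetical toy CNN standing in for the image classifiers analyzed in this work.
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(4, 2),
)

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Average activation per channel (conv layers) or per unit (linear layers).
        dims = (0, 2, 3) if output.dim() == 4 else (0,)
        activations[name] = output.detach().mean(dim=dims)
    return hook

# Register a hook on the layer of interest (here: the ReLU after the convolution).
model[1].register_forward_hook(make_hook("conv1_relu"))

# Run a (dummy) batch associated with a specific task and rank neurons by mean activation.
with torch.no_grad():
    model(torch.rand(8, 1, 28, 28))

ranking = torch.argsort(activations["conv1_relu"], descending=True)
print("channels ordered by mean activation:", ranking.tolist())
```

Such activation rankings are only a descriptive view of the network; the contribution of this work is to ground the notion of salience in formal entailment queries rather than in raw activation magnitudes.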
Building upon previous research by Alexey Ignatiev and his team, this work extends the approach proposed in the paper "Abduction-Based Explanations for Machine Learning Models", a technique that uses mixed-integer linear programming (MILP) to encode the neural model as a set of decision-problem constraints. A solver acting as an oracle, capable of answering entailment queries, then provides minimal explanations for the predictions of the target model. Specifically, this study analyzes convolutional models for image classification, whose complexity exceeds that of the models considered in the original work, which were rather simple single-layer feedforward networks with a modest number of nodes.
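To make the scheme concrete, the sketch below shows one common way such abduction-based explanations can be computed on a toy ReLU network; it is not the authors' implementation. The weights, bounds, and big-M constant are hypothetical, PuLP with the CBC backend stands in for the MILP solver used in the original work, and the deletion-based pass is one standard strategy for obtaining a subset-minimal explanation. Each entailment query asks whether the network constraints, the inputs currently kept fixed, and the negated prediction are jointly infeasible.

```python
import pulp

# Toy network: 2 inputs -> 2 ReLU hidden units -> 1 linear output (hypothetical weights).
W1 = [[1.0, -1.0], [0.5, 1.0]]   # W1[j][i] multiplies input i for hidden unit j
b1 = [0.0, -0.2]
W2 = [1.0, -1.0]
b2 = 0.1
M = 10.0                         # big-M constant, assumed large enough for inputs in [0, 1]

def entails_prediction(fixed, predicted_positive=True):
    """True if fixing the inputs in `fixed` (index -> value) forces the predicted class,
    i.e. the MILP 'network constraints + negated prediction' is infeasible."""
    prob = pulp.LpProblem("entailment_check", pulp.LpMinimize)
    x = [pulp.LpVariable(f"x{i}", lowBound=0.0, upBound=1.0) for i in range(2)]
    h = [pulp.LpVariable(f"h{j}", lowBound=0.0) for j in range(2)]
    z = [pulp.LpVariable(f"z{j}", cat="Binary") for j in range(2)]
    prob += pulp.lpSum(x)  # arbitrary objective; only feasibility matters here

    # Big-M encoding of h_j = ReLU(W1_j . x + b1_j).
    for j in range(2):
        pre = pulp.lpSum(W1[j][i] * x[i] for i in range(2)) + b1[j]
        prob += h[j] >= pre
        prob += h[j] <= pre + M * (1 - z[j])
        prob += h[j] <= M * z[j]

    out = pulp.lpSum(W2[j] * h[j] for j in range(2)) + b2

    # Fix the inputs kept in the candidate explanation.
    for i, v in fixed.items():
        prob += x[i] == v

    # Negate the prediction and look for a counterexample.
    if predicted_positive:
        prob += out <= 0
    else:
        prob += out >= 1e-4

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return pulp.LpStatus[prob.status] == "Infeasible"

def abductive_explanation(instance):
    """Deletion-based search for a subset-minimal set of input features
    that entails the model's prediction on `instance`."""
    explanation = dict(instance)
    for i in list(instance):
        candidate = {k: v for k, v in explanation.items() if k != i}
        if entails_prediction(candidate):
            explanation = candidate  # feature i is not needed for the prediction
    return explanation

if __name__ == "__main__":
    instance = {0: 0.9, 1: 0.1}  # hypothetical input predicted as the positive class
    print(abductive_explanation(instance))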
The results highlight the applicability of abductive reasoning for identifying salient neurons and demonstrate how the proposed approach offers new perspectives for understanding complex models and improving their interpretability.