Formal Safety Assessment of Neural Networks under Adversarial Attacks

Mauro Dore  •  Gian Mario Marongiu  •  Riccardo Murgia
abstract

Providing formal guarantees on the behavior of machine learning models remains a central challenge for their deployment in safety-critical domains. The SMLE framework, recently proposed by Matteo Francobaldi and Michele Lombardi, introduces an architecture and training procedure that ensures compliance with designer-specified input-output properties through conservative overapproximation and counterexample-guided training [FL24].
In this project, under the supervision of Prof. Michele Lombardi, we aim to extend SMLE’s contributions by exploring the framework’s robustness when implemented without special architectural accommodations (e.g., regularization). Our methodology will involve training standard SMLE architectures and systematically searching for counterexamples by implementing exact verification methods, in contrast to the looser verification methods described in [FL24]. We will focus on binary classification tasks and quantify the occurrence of false positives (i.e., instances where a “false” counterexample is detected, one that does not correspond to a genuine property violation) across different configurations. This evaluation will help assess the effectiveness of SMLE’s verification mechanisms and identify patterns that may guide the design of safer neural models with stronger formal guarantees. Additionally, we will study the system’s performance as parameters and hyperparameters are varied.
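To make the intended comparison concrete, the sketch below contrasts a conservative interval-bound check with an exact SMT-based check on a toy ReLU classifier, and reports a false positive when the former flags a possible violation that the latter refutes. The network weights, the property, the threshold, and the use of z3 and interval bounds are illustrative assumptions of ours; this is not the SMLE architecture or the verification procedure of [FL24].

```python
# Illustrative sketch only: a toy comparison between a conservative
# interval-bound check and an exact SMT-based check on a hand-picked
# ReLU network. Weights, property, threshold, and the use of z3 are
# assumptions for illustration, not the SMLE method of [FL24].
import numpy as np
from z3 import Real, Solver, If, And, sat

# Toy binary classifier: 2 inputs -> 3 hidden ReLU units -> 1 logit.
W1 = np.array([[1.0, -2.0], [0.5, 1.5], [-1.0, 1.0]])
b1 = np.array([0.1, -0.2, 0.0])
W2 = np.array([[1.0, -1.0, 0.5]])
b2 = np.array([-0.3])

# Property: for every x in [0, 1]^2 the logit stays below THRESHOLD.
LOW, HIGH, THRESHOLD = 0.0, 1.0, 1.2

def conservative_check():
    """Interval-bound overapproximation: True means a violation is *possibly*
    present (the upper bound on the logit exceeds THRESHOLD)."""
    lo, hi = np.full(2, LOW), np.full(2, HIGH)
    W1p, W1n = np.clip(W1, 0, None), np.clip(W1, None, 0)
    h_lo, h_hi = W1p @ lo + W1n @ hi + b1, W1p @ hi + W1n @ lo + b1
    h_lo, h_hi = np.maximum(h_lo, 0), np.maximum(h_hi, 0)  # ReLU is monotone
    W2p, W2n = np.clip(W2, 0, None), np.clip(W2, None, 0)
    out_hi = W2p @ h_hi + W2n @ h_lo + b2
    return float(out_hi[0]) > THRESHOLD

def exact_check():
    """Exact check: returns a counterexample input if one exists, else None."""
    x = [Real(f"x{i}") for i in range(2)]
    s = Solver()
    s.add(And(*[And(xi >= LOW, xi <= HIGH) for xi in x]))
    h = []
    for j in range(3):
        pre = sum(float(W1[j, i]) * x[i] for i in range(2)) + float(b1[j])
        h.append(If(pre >= 0, pre, 0))  # exact ReLU encoding
    logit = sum(float(W2[0, j]) * h[j] for j in range(3)) + float(b2[0])
    s.add(logit > THRESHOLD)  # ask the solver for a property violation
    if s.check() == sat:
        m = s.model()
        return [m.eval(xi, model_completion=True) for xi in x]
    return None

if __name__ == "__main__":
    flagged, cex = conservative_check(), exact_check()
    print("conservative check flags a violation:", flagged)
    print("exact counterexample:", cex)
    if flagged and cex is None:
        print("-> false positive of the conservative check")
```

With these hand-picked weights the interval bound is loose enough to flag a violation that the exact search refutes; this is precisely the kind of false positive we intend to count across configurations, albeit on real SMLE models and properties rather than a toy network.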

outcomes