• Mattia Gualandi
• Presepi Alex
• Sciarrillo Alessandro
abstract
The growing integration of deep learning models in critical domains has brought urgent attention to their vulnerabilities. Among the most concerning threats are adversarial attacks on image-based AI systems: subtle perturbations to input images that can mislead even the most accurate models into making incorrect predictions, often without being detected.
These attacks not only challenge the technical robustness of AI models but also raise significant ethical concerns, particularly when human safety, fairness, and accountability are at stake.
In this project, we explore the nature of adversarial attacks on image data, aiming to deepen our understanding of their mechanisms and implications. We review and categorize popular attack techniques, evaluate their impact on standard vision models, and assess the effectiveness of the models we designed and trained to counter these attacks.
Furthermore, we developed and tested several versions of a module capable of recognizing when an input image has been corrupted. Finally, we extend the analysis to practical case studies in face recognition and vehicle detection, investigating how adversarial attacks can compromise security and fairness in real-world applications.
By framing the problem from both technical and ethical perspectives, we highlight the importance of designing AI systems that are not only performant but also resilient and trustworthy under adversarial conditions.
outcomes