Approximating Memorization Using Loss Surface Geometry for Dataset Pruning and Summarization


The sustainable training of modern neural network models remains an open challenge. Several existing methods address this issue by selecting a subset of relevant samples from the full training data, with the goal that a model trained on the subset matches the performance of one trained on the full data. Our work explores using memorization scores to find representative and atypical samples. We demonstrate that memorization-aware dataset summarization improves subset construction. However, computing memorization scores is notably resource-intensive. To this end, we propose a novel method that leverages the discrepancy between sharpness-aware minimization and stochastic gradient descent to capture data points' atypicality. We evaluate our metric against several efficient approximation functions for memorization scores (namely, proxies), empirically showing superior correlation and effectiveness. We explore the causes behind our approximation quality, highlighting how typical data points induce a flatter loss landscape than atypical ones. Extensive experiments confirm the effectiveness of our proxy for dataset pruning and summarization tasks, surpassing state-of-the-art approaches both in canonical setups (where atypical data points benefit performance) and in few-shot learning scenarios (where atypical data points can be detrimental).
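The intuition that atypical points sit in sharper regions of the loss landscape can be illustrated with a small sketch. The abstract does not specify how the proxy is computed, so what follows is not the paper's method: it is a toy per-sample sharpness score for a linear logistic model, assuming that "discrepancy" is read as the loss increase after one SAM-style ascent step of radius `rho`. The names `sample_loss`, `sample_grad`, `sharpness_proxy`, and the value of `rho` are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_loss(w, x, y):
    """Binary cross-entropy of a linear model on one sample, y in {0, 1}."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def sample_grad(w, x, y):
    """Gradient of sample_loss with respect to the weights w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def sharpness_proxy(w, x, y, rho=0.05):
    """Illustrative score (an assumption, not the paper's proxy):
    loss at the SAM-perturbed weights minus loss at the current weights."""
    g = sample_grad(w, x, y)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return 0.0
    # SAM-style ascent step: move rho along the normalized gradient.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    return sample_loss(w_adv, x, y) - sample_loss(w, x, y)


# Toy weights, plus one well-fit sample and the same input with a flipped label.
w = [2.0, -1.0]
typical = ([1.0, -1.0], 1)
atypical = ([1.0, -1.0], 0)

scores = {
    "typical": sharpness_proxy(w, *typical),
    "atypical": sharpness_proxy(w, *atypical),
}
# Rank samples from flattest to sharpest; the mislabeled one scores higher.
ranking = sorted(scores, key=scores.get)
```

In this toy setting a pruning or summarization routine could keep or drop samples according to `ranking`; which end of the ranking to keep depends on the setup, as the abstract notes for canonical versus few-shot scenarios.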

reference publication
Approximating Memorization Using Loss Surface Geometry for Dataset Pruning and Summarization (paper in proceedings, 2024), by Andrea Agiollo, Young In Kim, Rajiv Khanna
funding project
ENGINES: ENGineering INtElligent Systems around intelligent agent technologies (28/09/2023–27/09/2025)
works as
reference talk for
Approximating Memorization Using Loss Surface Geometry for Dataset Pruning and Summarization (paper in proceedings, 2024), by Andrea Agiollo, Young In Kim, Rajiv Khanna
