Andrea Agiollo, Young In Kim, Rajiv Khanna
The sustainable training of modern neural network models remains an open challenge. Several existing methods approach this issue by identifying a subset of relevant samples from the full training data to be used in model optimization, with the goal of matching the performance of full-data training with that of subset training. Our work explores the use of memorization scores to find representative and atypical samples. We demonstrate that memorization-aware dataset summarization improves subset construction performance. However, computing memorization scores is notably resource-intensive. To this end, we propose a novel method that leverages the discrepancy between sharpness-aware minimization and stochastic gradient descent to capture the atypicality of data points. We evaluate our metric against several efficient approximation functions for memorization scores (namely, proxies), empirically showing superior correlation and effectiveness. We investigate the causes behind our approximation quality, highlighting how typical data points induce a flatter loss landscape than atypical ones. Extensive experiments confirm the effectiveness of our proxy for dataset pruning and summarization tasks, surpassing state-of-the-art approaches both in canonical setups, where atypical data points benefit performance, and in few-shot learning scenarios, where atypical data points can be detrimental.
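The abstract only sketches the proxy at a high level, so the following is a minimal illustrative sketch, not the authors' implementation: it assumes the proxy can be read as the per-sample loss gap between a model trained with a simple one-step SAM update and one trained with plain SGD. Every function name, the synthetic data, and all hyperparameters here are assumptions made for illustration.

```python
# Illustrative sketch (assumed formulation): per-sample atypicality proxy as the
# discrepancy between a SAM-trained and an SGD-trained model. Hypothetical
# helpers and hyperparameters; not the paper's actual pipeline.
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_data(n=512, d=20, seed=0):
    # Toy binary classification data standing in for a real training set.
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(n, d, generator=g)
    y = (x[:, 0] + 0.5 * x[:, 1] > 0).long()
    return x, y


def make_model(d=20, h=64, c=2, seed=0):
    torch.manual_seed(seed)
    return nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, c))


def sgd_train(model, x, y, epochs=50, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    return model


def sam_train(model, x, y, epochs=50, lr=0.1, rho=0.05):
    # One-step SAM: perturb weights along the normalized ascent direction,
    # take the gradient at the perturbed point, then update the original weights.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        with torch.no_grad():
            grads = [p.grad.clone() for p in model.parameters()]
            norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
            eps = [rho * g / norm for g in grads]
            for p, e in zip(model.parameters(), eps):
                p.add_(e)  # move to the perturbed (worst-case) point
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()  # gradient at perturbed point
        with torch.no_grad():
            for p, e in zip(model.parameters(), eps):
                p.sub_(e)  # restore the original weights
        opt.step()  # descend using the SAM gradient
    return model


@torch.no_grad()
def atypicality_proxy(model_sgd, model_sam, x, y):
    # Per-sample discrepancy between the two training regimes; larger values are
    # assumed here to flag more atypical (memorization-prone) points.
    loss_sgd = F.cross_entropy(model_sgd(x), y, reduction="none")
    loss_sam = F.cross_entropy(model_sam(x), y, reduction="none")
    return (loss_sam - loss_sgd).abs()


if __name__ == "__main__":
    x, y = make_data()
    sgd_model = sgd_train(make_model(seed=1), x, y)
    sam_model = sam_train(make_model(seed=1), x, y)
    scores = atypicality_proxy(sgd_model, sam_model, x, y)
    # Example pruning rule (an assumption): keep the most atypical half.
    keep = scores.argsort(descending=True)[: len(x) // 2]
    print("subset size:", keep.numel())
```

Depending on the target scenario, the ranking would be used differently, e.g. retaining high-scoring (atypical) samples for canonical pruning but down-weighting or discarding them in few-shot settings, in line with the abstract's observation that atypical points can be detrimental there.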
keywords
Neural Networks, Data-efficient Learning, Memorization, Flatness
funding project
ENGINES — ENGineering INtElligent Systems around intelligent agent technologies
(28/09/2023–27/09/2025)