Evaluation of Privacy Preservation Capabilities through Synthetic Data on Multiple Datasets

Mattia Buzzoni • Riccardo Romeo

Abstract

With the emergence of large-scale datasets, privacy preservation has
become a central issue. The need for data to train recommendation
algorithms has led to the collection of information that is often
sensitive and private. One possible way to protect user privacy is to
replace real data with artificially generated synthetic data.
The main challenge of this approach lies in balancing the quality and
representativeness of the synthetic data against its ability to preserve
privacy, that is, to prevent private information from being traced back
from it. High-quality synthetic data does not always translate into good
generalization and privacy protection, so an effective trade-off has to
be identified.
This project aims to evaluate different synthetic data generation
techniques by testing their performance in terms of privacy preservation
and data quality across multiple real-world datasets. Models based on
various approaches will be analyzed in order to identify the most
effective methodologies.
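
As an illustration of what a privacy-side measurement could look like, the
sketch below computes the distance to closest record (DCR): for each
synthetic row, the distance to its nearest real row. The choice of DCR, the
Euclidean distance, the function names, the threshold, and the toy data are
all assumptions made for this example; the abstract does not state which
metrics the project uses.

```python
import math


def distance_to_closest_record(synthetic, real):
    """Return, for each synthetic row, the Euclidean distance to its nearest real row.

    Hypothetical helper for illustration; rows are equal-length numeric tuples.
    """
    return [min(math.dist(s, r) for r in real) for s in synthetic]


def share_of_near_copies(synthetic, real, threshold):
    """Fraction of synthetic rows lying within `threshold` of some real row.

    A high share suggests the generator may be reproducing real records.
    The threshold is an assumption and would need tuning per dataset.
    """
    dcr = distance_to_closest_record(synthetic, real)
    return sum(d <= threshold for d in dcr) / len(dcr)


# Toy data (invented for this example, not taken from any project dataset).
real = [(0.0, 0.0), (1.0, 1.0), (4.0, 5.0)]
synthetic = [(0.0, 0.0), (1.0, 2.0), (7.0, 9.0)]

dcr = distance_to_closest_record(synthetic, real)
near_copy_share = share_of_near_copies(synthetic, real, threshold=0.5)
print(dcr, near_copy_share)
```

A very small DCR means a synthetic row nearly duplicates a real one, which
is good for fidelity but bad for privacy; this is one concrete form of the
trade-off described above.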

Outcomes

forum on Virtuale • repo url for the project