
March 21, 2024 | 1 minute to read

New Paper - Joint Selection

I have a new paper out with my colleagues from UMass Amherst: Joint Selection: Adaptively Incorporating Public Information for Private Synthetic Data. Many data sources of interest to researchers and policymakers are updated through periodic releases, ranging from large-scale surveys such as the Current Population Survey (CPS) to governmental administrative records. Because these datasets often contain sensitive information, frequently only aggregate statistics are released or, alternatively, a synthetic dataset is constructed and released (in either case, hopefully under differential privacy). We introduce a new method, JAM-PGM, that uses public data to improve the quality of synthetic data generated under differential privacy. For periodically released datasets, public data could include prior releases. In the paper, we look at cases where the public and private data do not follow the same distribution, which is what one would expect when using such techniques in the wild.

Abstract:

Mechanisms for generating differentially private synthetic data based on marginals and graphical models have been successful in a wide range of settings. However, one limitation of these methods is their inability to incorporate public data. Initializing a data generating model by pre-training on public data has shown to improve the quality of synthetic data, but this technique is not applicable when model structure is not determined a priori. We develop the mechanism JAM-PGM, which expands the adaptive measurements framework to jointly select between measuring public data and private data. This technique allows for public data to be included in a graphical-model-based mechanism. We show that JAM-PGM is able to outperform both publicly assisted and non publicly assisted synthetic data generation mechanisms even when the public data distribution is biased.
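
As a rough illustration of what "jointly selecting" between public and private measurements could look like, here is a toy sketch. It is not the JAM-PGM algorithm: the candidate marginals, the scores, and the function name `exponential_mechanism_select` are all made up for illustration, and the choice of the exponential mechanism (a standard differentially private selection primitive) is an assumption of this sketch. The paper gives the actual score functions and privacy analysis.

```python
import math
import random


def exponential_mechanism_select(scores, epsilon, sensitivity, rng):
    """Sample one key from `scores` with probability proportional to
    exp(epsilon * score / (2 * sensitivity)).

    For the usual privacy guarantee, changing one private record must
    change any score by at most `sensitivity`.
    """
    keys = list(scores)
    max_score = max(scores.values())
    # Subtract the maximum score before exponentiating for numerical stability;
    # this does not change the sampling probabilities.
    weights = [
        math.exp(epsilon * (scores[k] - max_score) / (2.0 * sensitivity))
        for k in keys
    ]
    return rng.choices(keys, weights=weights, k=1)[0]


# Hypothetical candidates: each pairs a marginal (a tuple of attribute names)
# with the data source it would be measured from. The scores are placeholders
# standing in for "how useful would this measurement be right now".
candidate_scores = {
    (("age", "income"), "private"): 120.0,
    (("age", "income"), "public"): 95.0,
    (("education", "state"), "private"): 60.0,
    (("education", "state"), "public"): 80.0,
}

rng = random.Random(0)
marginal, source = exponential_mechanism_select(
    candidate_scores, epsilon=0.5, sensitivity=1.0, rng=rng
)
print(f"selected marginal {marginal} measured on {source} data")
```

The point of the sketch is only that public and private candidates compete in a single selection step, so the mechanism can fall back on public data where it is informative and spend privacy budget on private measurements where it is not.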

The preprint is available here. This paper will appear at AISTATS 2024.

The Seine (1902) by Henry Ossawa Tanner
Topics: DifferentialPrivacy, NewPaper
Written on March 21, 2024