New Paper - AIM: An Adaptive and Iterative Mechanism for Differentially Private Synthetic Data
I have a new paper out on arXiv with my colleagues at UMass Amherst (Ryan McKenna, Daniel Sheldon, and Gerome Miklau). The paper proposes a new method for generating differentially private synthetic data that is tailored to a given use case: it captures information only from the marginal distributions that matter for that use case. The approach builds on prior work that generates private synthetic data by representing the data distribution as a probabilistic graphical model. See this post for a summary of current techniques.
Abstract:
We propose AIM, a novel algorithm for differentially private synthetic data generation. AIM is a workload-adaptive algorithm, within the paradigm of algorithms that first select a set of queries, then privately measure those queries, and finally generate synthetic data from the noisy measurements. It uses a set of innovative features to iteratively select the most useful measurements, reflecting both their relevance to the workload and their value in approximating the input data. We also provide analytic expressions to bound per-query error with high probability, which can be used to construct confidence intervals and inform users about the accuracy of generated data. We show empirically that AIM consistently outperforms a wide variety of existing mechanisms across a variety of experimental settings.
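To make the select-then-measure part of that paradigm concrete, here is a minimal toy sketch in Python. It is not the AIM algorithm from the paper. The function names, the scoring rule (L1 distance from a uniform guess), the sensitivity value, and the parameters `eps_select` and `sigma` are all assumptions made for illustration. The sketch also skips the final step of fitting a graphical model and sampling synthetic data, and it makes no formal privacy claim.

```python
import numpy as np

def marginal(data, attrs, domain):
    """Count vector of the marginal over the columns listed in `attrs`."""
    shape = [domain[a] for a in attrs]
    flat = np.ravel_multi_index(tuple(data[:, a] for a in attrs), shape)
    return np.bincount(flat, minlength=int(np.prod(shape))).astype(float)

def exponential_mechanism(scores, eps, sensitivity, rng):
    """Sample an index with probability proportional to exp(eps * score / (2 * sensitivity))."""
    logits = eps * np.asarray(scores, dtype=float) / (2.0 * sensitivity)
    probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(scores), p=probs))

def select_and_measure(data, domain, candidates, rounds, eps_select, sigma, rng):
    """Repeatedly pick a poorly approximated marginal, then measure it with Gaussian noise."""
    n = data.shape[0]  # assumption: the number of records is treated as public
    measurements = {}
    for _ in range(rounds):
        remaining = [c for c in candidates if c not in measurements]
        if not remaining:
            break
        scores = []
        for c in remaining:
            true = marginal(data, c, domain)
            guess = np.full(true.size, n / true.size)  # stand-in estimate: uniform
            scores.append(np.abs(true - guess).sum())
        # Assumption: sensitivity 2.0, i.e. replacing one record moves two counts by 1 each.
        pick = remaining[exponential_mechanism(scores, eps_select, 2.0, rng)]
        true = marginal(data, pick, domain)
        measurements[pick] = true + rng.normal(0.0, sigma, size=true.shape)
    return measurements

# Toy usage: three binary attributes, all pairwise marginals as candidates.
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(200, 3))
domain = [2, 2, 2]
candidates = [(0, 1), (0, 2), (1, 2)]
measurements = select_and_measure(data, domain, candidates, rounds=2,
                                  eps_select=1.0, sigma=5.0, rng=rng)
```

In a full pipeline of this kind, the noisy measurements would then be passed to a model-fitting step that produces the synthetic records. The uniform guess in the scoring loop is where an iterative method would plug in its current model estimate instead.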
The preprint is available here.