Anonymiced Shareable Data: Using mice to Create and Analyze Multiply Imputed Synthetic Datasets

Thom Volker†, Gerko Vink†

Tuesday, November 23, 2021


Image credit: Jayden Walters

Abstract

Synthetic datasets allow research data to be disseminated while protecting the privacy and confidentiality of respondents. Generating and analyzing synthetic datasets is straightforward, yet applied researchers seldom adopt a synthetic data analysis pipeline. We outline a simple procedure for generating and analyzing synthetic datasets with the multiple imputation software mice (Version 3.13.15) in R. We demonstrate through simulations that analyses of the synthetic data yield unbiased and valid inferences, and that the procedure produces synthetic records that cannot be distinguished from the true data records. The ease of synthesizing data with mice, together with the validity of the inferences obtained through this procedure, opens up a wealth of possibilities for data dissemination and further research on initially private data.
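The pipeline described above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the article's own code: the simulated data, the variable names `y`, `x1`, `x2`, and the regression model are hypothetical. It assumes that every observed cell is overimputed via `make.where(dat, keyword = "all")`, that CART (`method = "cart"`) is the synthesis model, and that the installed mice version's `pool()` supports `rule = "reiter2003"` for combining estimates from synthetic (rather than missing) data.

```r
library(mice)

# Hypothetical complete data standing in for a private dataset
set.seed(123)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

# Mark every cell for imputation, so observed values are replaced by synthetic draws
where_all <- make.where(dat, keyword = "all")

# Generate m = 5 synthetic datasets with CART as the synthesis model (assumed choice)
syn <- mice(dat, m = 5, method = "cart", where = where_all, print = FALSE)

# Fit the analysis model on each synthetic dataset
fit <- with(syn, lm(y ~ x1 + x2))

# Pool with combining rules for fully synthetic data (assumes rule = "reiter2003" is available)
pooled <- pool(fit, rule = "reiter2003")
summary(pooled)
```

Releasing `complete(syn, "all")` rather than `dat` would let outside researchers fit their own models and pool them with the same rule.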

Type

Journal article

Publication

Psych, 4(3)

All accompanying files and R-code can be found on the project’s GitHub page.


Thom Volker

Statistician • Data Scientist • Sociologist

PhD Candidate in Methods and Statistics
