Statistical disclosure control for medical output

Utrecht University & Statistics Netherlands

Data privacy matters

About me

Thom Benjamin Volker

  • Utrecht University & Statistics Netherlands
  • PhD candidate in Methodology and Statistics


Research interests: methods to enhance data privacy, synthetic data and multiple imputation of missing data.

Okay, sure, data privacy matters

But are the risks large enough to justify our data protection efforts?

Disclosure limitation gone wrong

  • Sweeney (1997): Linked anonymized medical discharge data with voter registration records
    • Anonymized \(\neq\) unidentifiable
  • Narayanan & Shmatikov (2007): How to break anonymity of the Netflix Prize dataset
    • Linking ratings and timestamps to IMDb

Anonymized data can be identifying!

Even aggregated data can be revealing

  • Published tables (e.g., census counts by age, sex, ethnicity) may seem harmless

  • Reconstruction attacks can use these tables to infer plausible individual-level data (Dick et al., 2022).

    • Potentially recovering individual records exactly (see the sketch below)

But: it is not clear how severe this disclosure is.
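
A minimal illustration of the idea (not the method of Dick et al., 2022; the numbers and the Python snippet below are invented for this sketch): enumerate every person-level dataset that is consistent with a few published aggregates.

from itertools import combinations_with_replacement, product

# Toy reconstruction: published aggregates for a block of 3 people report
# 2 males, 1 female, and a mean age of exactly 30 (all values illustrative).
# Enumerate every person-level dataset that is consistent with these tables.
ages = range(20, 41)

consistent = [
    records
    for records in combinations_with_replacement(product("MF", ages), 3)
    if sum(sex == "M" for sex, _ in records) == 2
    and sum(age for _, age in records) == 3 * 30
]

print(len(consistent), "candidate datasets; the fewer, the more revealing")
print(consistent[0])

With tight enough margins, only a handful of candidate datasets remain, sometimes exactly one.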

Two types of disclosure

  • Identity disclosure
    • Based on published statistics, we can identify individuals in the data
  • Attribute disclosure
    • Without necessarily identifying individuals, we can learn something about them from the published data that we could not have learned otherwise

Disclosure limitation strategies

Suppression

Original:
         A   B   C
Male     4  19   3
Female  12   0   1

Suppressed:
         A   B   C
Male     4  19   3
Female  12  NA  NA

\(n_F = 13 \implies n_{F,B} + n_{F,C} = 1\)

Suppose disease \(B\) is prostate cancer \(\implies n_{F,B} = 0 \implies n_{F,C} = 1\)

“Optimal” cell suppression can be performed with \(\tau\)-ARGUS
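
A minimal sketch of threshold-based suppression, assuming a frequency threshold of 3 and a rule that also hides zero cells (both choices are illustrative; protecting hidden cells against recovery from the margins is exactly the secondary-suppression problem that \(\tau\)-ARGUS optimizes):

import numpy as np

def suppress_small_cells(table, threshold=3):
    """Hide every cell with a count below `threshold` (primary suppression).

    In practice, secondary suppression is also needed so that hidden cells
    cannot be recovered from row and column totals.
    """
    table = np.asarray(table, dtype=float)
    protected = table.copy()
    protected[table < threshold] = np.nan   # NaN marks a suppressed cell
    return protected

# Counts from the slide: rows Male/Female, columns diseases A/B/C
counts = [[4, 19, 3],
          [12, 0, 1]]
print(suppress_small_cells(counts))        # [[ 4. 19.  3.] [12. nan nan]]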

Rounding

Original:
         A   B   C
Male     4  19   3
Female  12   0   1

Rounded to base 5:
         A   B   C
Male     5  20   5
Female  10   0   0

  • Adjust all cells in the table to a multiple of a specified base (here, base 5).

  • Coarsens information

  • Typically not recommended: other adjustment methods may provide better disclosure protection
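
A minimal sketch of deterministic rounding to base 5 (the base matches the example above; random rounding is a common variant):

def round_to_base(table, base=5):
    """Round every cell to the nearest multiple of `base`.

    Note: Python's round() uses banker's rounding for exact ties
    (e.g., 2.5 rounds to 2); random rounding is a common alternative.
    """
    return [[base * round(cell / base) for cell in row] for row in table]

counts = [[4, 19, 3],
          [12, 0, 1]]
print(round_to_base(counts))  # [[5, 20, 5], [10, 0, 0]]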

Noise addition - cell key method

Original:
         A   B   C
Male     4  19   3
Female  12   0   1

Perturbed:
         A   B   C
Male     3  19   2
Female  12   1   2

  • Random perturbation is added to every cell in the table

  • Additivity is broken: cells no longer sum to the published margins

  • Also implemented in \(\tau\)-ARGUS
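
A simplified sketch of the cell key idea (the uniform noise rule, the seed, and the helper below are illustrative; production implementations draw the perturbation from a calibrated probability table): every record carries a persistent random key, and the noise for a cell is derived deterministically from the keys of the records it contains, so the same cell is perturbed the same way in every table it appears in.

import numpy as np

rng = np.random.default_rng(1)

# Every record in the microdata gets a persistent random 'record key'
# (39 records in total, as in the slide's table)
record_keys = rng.random(39)

def perturb_cell(record_ids, max_noise=2):
    """Simplified cell key method: the 'cell key' is the fractional part of
    the sum of the record keys of the records in the cell, and the noise is
    derived deterministically from it. Real implementations (e.g., in
    tau-ARGUS) use a calibrated probability table instead of this uniform rule.
    """
    count = len(record_ids)
    cell_key = record_keys[record_ids].sum() % 1              # in [0, 1)
    noise = int(cell_key * (2 * max_noise + 1)) - max_noise   # in {-2, ..., +2}
    return max(count + noise, 0)                              # keep counts >= 0

# e.g., the 'Male, disease A' cell contains records 0-3
print(perturb_cell(np.array([0, 1, 2, 3])))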

Differential privacy

Ex ante guarantee: whatever can be learned about any individual in this data is bounded by \(\epsilon\)

  • Neighboring datasets: \(X\) and \(X'\) differ in a single row.

  • Sensitivity: \(\Delta_f = \max_{X, X'} |f(X) - f(X')|\)

Differential privacy: for all neighboring \(X, X'\) and all possible outputs \(a\) of the randomized mechanism \(\tilde f\)

\(P[\tilde f(X) = a] \leq \exp\{\epsilon\} P[\tilde f(X') = a]\)

Differential privacy - an example

Counting the number of males

  • One additional observation: at most one additional male
    • Sensitivity: \(\Delta_f = 1\)
    • \(\epsilon = 1\) (user-defined)
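
One standard way to obtain this guarantee for counts is the Laplace mechanism (not named on the slide, but the canonical choice here): release the true count plus Laplace noise with scale \(\Delta_f / \epsilon\). A minimal sketch:

import numpy as np

rng = np.random.default_rng()

def dp_count(data, predicate, epsilon=1.0, sensitivity=1.0):
    """epsilon-differentially private count via the Laplace mechanism:
    the true count plus Laplace noise with scale sensitivity / epsilon."""
    true_count = sum(predicate(x) for x in data)
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Counting the number of males with sensitivity 1 and epsilon = 1
sample = ["M", "F", "M", "M", "F", "F", "M"]
print(dp_count(sample, lambda x: x == "M", epsilon=1.0))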

Privacy-utility trade-off

Thanks for your attention

Questions/remarks?

t.b.volker@uu.nl

thomvolker.github.io/sdc4vac4eu