| | A | B | C |
|---|---|---|---|
| Male | 4 | 19 | 3 |
| Female | 12 | 0 | 1 |

Thom Benjamin Volker

Research interests: methods to enhance data privacy, synthetic data and multiple imputation of missing data.
But are the risks large enough to justify our data protection efforts?
Anonymized data can be identifying!
Published tables (e.g., census counts by age, sex, ethnicity) may seem harmless.
Reconstruction attacks can use these tables to infer plausible individual-level data (Dick et al., 2022).
But: it is not clear how severe this disclosure is.
| | A | B | C |
|---|---|---|---|
| Male | 4 | 19 | 3 |
| Female | 12 | 0 | 1 |

| | A | B | C |
|---|---|---|---|
| Male | 4 | 19 | 3 |
| Female | 12 | NA | NA |

If the row total \(n_F = 13\) is also published, the suppressed cells satisfy \(n_{F,B} + n_{F,C} = 13 - 12 = 1\).
Suppose disease \(B\) is prostate cancer, so that \(n_{F,B} = 0\) \(\implies n_{F,C} = 1\): the suppressed value is recovered exactly.
“Optimal” cell suppression can be performed with \(\tau\)-ARGUS.
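The recovery argument above can be sketched in a few lines of Python. This is only an illustration and not how \(\tau\)-ARGUS works: the threshold of 3 and the helper names `suppress` and `recover_row` are assumptions chosen so that the output matches the example table, and the sketch presumes that the row total \(n_F = 13\) is published.

```python
# Hypothetical illustration of primary cell suppression and a simple
# differencing attack; the threshold and helper names are assumptions.

table = {
    "Male": {"A": 4, "B": 19, "C": 3},
    "Female": {"A": 12, "B": 0, "C": 1},
}


def suppress(table, threshold=3):
    """Replace every count below `threshold` with None (shown as NA)."""
    return {
        row: {col: (n if n >= threshold else None) for col, n in cells.items()}
        for row, cells in table.items()
    }


def recover_row(cells, row_total, known_zero=()):
    """Fill in suppressed cells of one row using its published total.

    `known_zero` lists columns the attacker knows must be zero
    (e.g. prostate cancer among females).
    """
    filled = dict(cells)
    for col in known_zero:
        if filled[col] is None:
            filled[col] = 0
    missing = [col for col, n in filled.items() if n is None]
    if len(missing) == 1:
        # Exactly one unknown cell left: it equals the rest of the total.
        known_sum = sum(n for n in filled.values() if n is not None)
        filled[missing[0]] = row_total - known_sum
    return filled


suppressed = suppress(table)
row_total_female = sum(table["Female"].values())  # 13, assumed to be published
recovered = recover_row(suppressed["Female"], row_total_female, known_zero=("B",))

print(suppressed["Female"])
print(recovered)
```

Without the side knowledge about disease \(B\), two cells stay unknown and the attacker only learns that they sum to 1.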
| | A | B | C |
|---|---|---|---|
| Male | 4 | 19 | 3 |
| Female | 12 | 0 | 1 |

| | A | B | C |
|---|---|---|---|
| Male | 5 | 20 | 5 |
| Female | 10 | 0 | 0 |

Round every cell in the table to the nearest multiple of a specified base (here, base 5).
Coarsens the information in the table.
Typically not recommended: other adjustments might provide better disclosure protection.
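As a minimal sketch, rounding to base 5 reproduces the second table from the first. The function name `round_to_base` and the tie-breaking rule (ties round up) are assumptions; the source does not say how ties are handled.

```python
# Hypothetical helper: deterministic rounding of counts to a base.

table = {
    "Male": {"A": 4, "B": 19, "C": 3},
    "Female": {"A": 12, "B": 0, "C": 1},
}


def round_to_base(n, base=5):
    """Round a non-negative integer to the nearest multiple of `base`.

    Integer arithmetic avoids floating-point surprises; ties round up.
    """
    return (n + base // 2) // base * base


rounded = {
    row: {col: round_to_base(n) for col, n in cells.items()}
    for row, cells in table.items()
}

print(rounded)
# The totals shift as well, which shows how much information is coarsened.
print(sum(table["Male"].values()), sum(rounded["Male"].values()))
```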
| | A | B | C |
|---|---|---|---|
| Male | 4 | 19 | 3 |
| Female | 12 | 0 | 1 |

| | A | B | C |
|---|---|---|---|
| Male | 3 | 19 | 2 |
| Female | 12 | 1 | 2 |

A small random perturbation is added to every cell in the table.
Additivity is broken: the perturbed cells no longer need to sum to the original totals (here, the Male row sums to 24 instead of 26).
Also implemented in \(\tau\)-ARGUS.
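The loss of additivity can be checked directly on the two tables. The sketch below also includes a hypothetical `perturb` function; its noise distribution (uniform on \(\{-1, 0, 1\}\), truncated at zero) is an assumption for illustration and is not claimed to be the method used in \(\tau\)-ARGUS.

```python
import random

original = {
    "Male": {"A": 4, "B": 19, "C": 3},
    "Female": {"A": 12, "B": 0, "C": 1},
}

# Perturbed table as shown in the example.
perturbed_example = {
    "Male": {"A": 3, "B": 19, "C": 2},
    "Female": {"A": 12, "B": 1, "C": 2},
}


def perturb(table, max_shift=1, seed=0):
    """Add integer noise in [-max_shift, max_shift] to every cell.

    Assumed noise model; counts are truncated at zero.
    """
    rng = random.Random(seed)
    return {
        row: {col: max(0, n + rng.randint(-max_shift, max_shift))
              for col, n in cells.items()}
        for row, cells in table.items()
    }


def row_totals(table):
    """Sum the cells of each row."""
    return {row: sum(cells.values()) for row, cells in table.items()}


print(row_totals(original))           # {'Male': 26, 'Female': 13}
print(row_totals(perturbed_example))  # {'Male': 24, 'Female': 15}
print(perturb(original))
```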
Ex ante guarantee: what can be learned about any individual because their record is in the data is bounded in terms of \(\epsilon\).
Neighboring datasets: \(X\) and \(X'\) differ in a single row.
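Sensitivity: \(\Delta_f = \max_{X, X'} |f(X) - f(X')|\), with the maximum taken over all neighboring \(X\) and \(X'\).
Differential privacy: a randomized version \(\tilde f\) of \(f\) is \(\epsilon\)-differentially private if, for all neighboring \(X, X'\) and all possible outputs \(a\),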
\(P[\tilde f(X) = a] \leq \exp\{\epsilon\} P[\tilde f(X') = a]\)
Counting the number of males
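For a counting query, changing a single row changes the count by at most one, so \(\Delta_f = 1\). One standard way to satisfy the definition, which is not stated in the source and is used here as an assumption, is the Laplace mechanism: add Laplace noise with scale \(\Delta_f / \epsilon\). Because the output is continuous, the inequality is checked on densities rather than probabilities. The function names and the example data (26 males, 13 females) are illustrative.

```python
import math
import random


def count_males(rows):
    """The query f: number of rows equal to 'Male'."""
    return sum(1 for sex in rows if sex == "Male")


def laplace_density(a, center, scale):
    """Density at `a` of a Laplace distribution centered at `center`."""
    return math.exp(-abs(a - center) / scale) / (2 * scale)


def noisy_count(rows, epsilon, sensitivity=1.0, rng=None):
    """Laplace mechanism (assumed here): true count plus Laplace noise."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return count_males(rows) + noise


# Neighboring datasets: one row differs.
X = ["Male"] * 26 + ["Female"] * 13
X_prime = ["Male"] * 25 + ["Female"] * 14

epsilon = 0.5
scale = 1.0 / epsilon  # sensitivity of a count is 1

# Check the density ratio on a grid of possible outputs a.
outputs = [a / 10 for a in range(0, 501)]
max_ratio = max(
    laplace_density(a, count_males(X), scale)
    / laplace_density(a, count_males(X_prime), scale)
    for a in outputs
)

print(max_ratio <= math.exp(epsilon) * (1 + 1e-9))
print(noisy_count(X, epsilon, rng=random.Random(0)))
```

A smaller \(\epsilon\) gives a larger noise scale, so the released count is less accurate but the bound on the ratio is tighter.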

Questions/remarks?
