Knowl Based Syst. 2014 Sep:67:361-372.
doi: 10.1016/j.knosys.2014.04.019.

A Probabilistic Approach to Mitigate Composition Attacks on Privacy in Non-Coordinated Environments

A H M Sarowar Sattar et al. Knowl Based Syst. 2014 Sep.

Abstract

Organizations share data about individuals to drive business and comply with law and regulation. However, an adversary may expose confidential information by tracking an individual across disparate data publications using quasi-identifying attributes (e.g., age, geocode and sex) associated with the records. Various studies have shown that well-established privacy protection models (e.g., k-anonymity and its extensions) fail to protect an individual's privacy against this "composition attack". This type of attack can be thwarted when organizations coordinate prior to data publication, but such a practice is not always feasible. In this paper, we introduce a probabilistic model called (d, α)-linkable, which mitigates composition attacks without coordination. The model ensures that d confidential values are associated with a quasi-identifying group with a likelihood of α. We realize this model through an efficient extension to k-anonymization and use extensive experiments to show that our strategy significantly reduces the likelihood of a successful composition attack and can preserve more utility than alternative privacy models, such as differential privacy.

Keywords: Anonymization; Composition attack; Data publication; Databases; Privacy.
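
The composition attack described in the abstract can be illustrated with a minimal sketch. It assumes two independently published, anonymized releases in which each record is reduced to a generalized quasi-identifying group and a confidential value. The record layout, the function names, and the toy hospital data are all hypothetical and are not taken from the paper. The sketch shows only the attack itself, not the (d, α)-linkable mechanism or dLink.

```python
from typing import Dict, List, Set, Tuple

# Assumed layout: (generalized quasi-identifying group, confidential value).
Record = Tuple[str, str]


def sensitive_values_by_group(release: List[Record]) -> Dict[str, Set[str]]:
    """Map each quasi-identifying group to the confidential values it contains."""
    groups: Dict[str, Set[str]] = {}
    for group, value in release:
        groups.setdefault(group, set()).add(value)
    return groups


def composition_attack(
    release_a: List[Record],
    group_a: str,
    release_b: List[Record],
    group_b: str,
) -> Set[str]:
    """Intersect the candidate values of one individual across two releases.

    group_a and group_b are the groups that the adversary knows the target
    falls into, based on the target's age, geocode and sex.
    """
    candidates_a = sensitive_values_by_group(release_a).get(group_a, set())
    candidates_b = sensitive_values_by_group(release_b).get(group_b, set())
    return candidates_a & candidates_b


# Hypothetical toy releases: each group hides the target among 3 records.
hospital_a: List[Record] = [
    ("age 30-39, zip 372**, F", "flu"),
    ("age 30-39, zip 372**, F", "diabetes"),
    ("age 30-39, zip 372**, F", "asthma"),
]
hospital_b: List[Record] = [
    ("age 35-44, zip 3720*, F", "diabetes"),
    ("age 35-44, zip 3720*, F", "cancer"),
    ("age 35-44, zip 3720*, F", "hepatitis"),
]

result = composition_attack(
    hospital_a, "age 30-39, zip 372**, F",
    hospital_b, "age 35-44, zip 3720*, F",
)
print(result)  # {'diabetes'}
```

Each toy release on its own leaves three candidate values for the target, but the intersection leaves one. This is why the abstract says that protection applied to each publication in isolation can fail once publications are combined.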

Figures

Figure 1
The average accuracy of the composition attack on 10-anonymized versions of the Salary and Occupation data sets.
Figure 2
The average accuracy of the composition attack on k-anonymized versions of the 101K Salary and Occupation data sets.
Figure 3
The average classification accuracy of the 10-anonymized Salary data sets as a function of their sizes.
Figure 4
The average classification accuracy of the 101K Salary set as a function of the k-anonymization level.
Figure 5
The average query errors of the 10-anonymized versions of the Salary and Occupation data sets as a function of their sizes.
Figure 6
The average query errors of the 101K Salary and Occupation data sets as a function of the k-anonymization level.
Figure 7
Unchanged responses through differentially private mechani
Figure 8
The average query errors of the DP [22] and 10-anonymized with dLink versions of the Salary and Occupation data sets.
Figure 9
Distance between the original data set, the output of dLink, and several privacy budgets of differential privacy (ε = 0.01, 0.05, 0.1).
Figure 10
The average accuracy of the composition attack on 10-anonymized versions (with dLink) of Salary data sets of different sizes, as a function of the parameters d and α.
Figure 11
The average classification accuracy (classifier J48) of 10-anonymized versions (with dLink) of Salary data sets of different sizes, as a function of the parameters d and α.
Figure 12
The average query errors of 10-anonymized versions (with dLink) of Salary data sets of different sizes, as a function of the parameters d and α.
Figure 13
The execution time of the 10-anonymized versions of the Salary and Occupation data sets as a function of their sizes.
Figure 14
Execution time of dLink along with the execution time of 10-anonymized version of the 101K Salary data sets as a function of d.

References

    1. Aggarwal Charu C, Yu Philip S. A condensation approach to privacy preserving data mining. Proceedings of the 9th International Conference on Extending Database Technology; Heraklion, Crete, Greece. 2004. pp. 183–199.
    2. Cebul Randall D, Rebitzer James B, Taylor Lowell J, Votruba Mark. Organizational fragmentation and care quality in the U.S. health care system. Working Paper 14212, National Bureau of Economic Research. 2008 Aug.
    3. Chow Richard, Golle Philippe, Staddon Jessica. Detecting privacy leaks using corpus-based association rules. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Las Vegas, Nevada, U.S. 2008. pp. 893–901.
    4. Domingo-Ferrer Josep, Torra Vicenç. Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Mining and Knowledge Discovery. 2005;11(2):195–212.
    5. Domingo-Ferrer Josep, González-Nicolás Úrsula. Hybrid microdata using microaggregation. Information Sciences. 2010;180(15):2834–2844.