Evaluating predictors of geographic area population size cut-offs to manage re-identification risk
- PMID: 19074299
- PMCID: PMC2649314
- DOI: 10.1197/jamia.M2902
Abstract
Objective: In public health and health services research, the inclusion of geographic information in data sets is critical. Because of concerns over the re-identification of patients, data from small geographic areas are either suppressed or the geographic areas are aggregated into larger ones. Our objective is to estimate the population size cut-off at which a geographic area is sufficiently large so that no data suppression or further aggregation is necessary.
Design: The 2001 Canadian census data were used to conduct a simulation to model the relationship between geographic area population size and uniqueness for some common demographic variables. Cut-offs were computed for geographic area population size, and prediction models were developed to estimate the appropriate cut-offs.
Measurements: Re-identification risk was measured using uniqueness. Geographic area population size cut-offs were estimated using the maximum number of possible values in the data set and a traditional entropy measure.
Results: The model that predicted population cut-offs from the maximum number of possible values in the data set had R² values of approximately 0.9 and a relative error of prediction below 0.02 across all regions of Canada. The models were then applied to assess the appropriate geographic area size for the prescription records provided by retail and hospital pharmacies to commercial research and analysis firms.
Conclusions: To manage re-identification risk, the prediction models can be used by public health professionals, health researchers, and research ethics boards to decide when the geographic area population size is sufficiently large.
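As a rough illustration of the quantities named under Measurements, the sketch below computes uniqueness, a maximum-combinations count, and an entropy value for a toy set of records. It is not the authors' implementation: the function names, the sample records, the use of observed distinct values for the combination count, and the choice of Shannon entropy in bits are all assumptions made for the example, and the paper's exact definitions may differ.

```python
from collections import Counter
from math import log2, prod


def _keys(records, quasi_identifiers):
    # Reduce each record to the tuple of its quasi-identifier values.
    return [tuple(r[q] for q in quasi_identifiers) for r in records]


def uniqueness(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination occurs once."""
    keys = _keys(records, quasi_identifiers)
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


def max_combinations(records, quasi_identifiers):
    """Product of the number of distinct observed values per variable
    (an assumed reading of 'maximum number of possible values')."""
    return prod(len({r[q] for r in records}) for q in quasi_identifiers)


def entropy(records, quasi_identifiers):
    """Shannon entropy, in bits, of the observed combinations (assumed measure)."""
    keys = _keys(records, quasi_identifiers)
    n = len(keys)
    return -sum((c / n) * log2(c / n) for c in Counter(keys).values())


# Hypothetical records for one small geographic area.
records = [
    {"sex": "F", "age_group": "30-39"},
    {"sex": "F", "age_group": "30-39"},
    {"sex": "M", "age_group": "30-39"},
    {"sex": "M", "age_group": "40-49"},
]
qi = ["sex", "age_group"]

print(uniqueness(records, qi))        # share of records that are unique
print(max_combinations(records, qi))  # size of the combination space
print(entropy(records, qi))           # spread of records over combinations
```

In a simulation of the kind described under Design, values like these would be computed for areas of varying population size to find the size above which uniqueness falls below a chosen threshold.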