J Am Med Inform Assoc. 2009 Mar-Apr;16(2):256-66.
doi: 10.1197/jamia.M2902. Epub 2008 Dec 11.

Evaluating predictors of geographic area population size cut-offs to manage re-identification risk

Khaled El Emam et al. J Am Med Inform Assoc. 2009 Mar-Apr.

Abstract

Objective: In public health and health services research, the inclusion of geographic information in data sets is critical. Because of concerns over the re-identification of patients, data from small geographic areas are either suppressed or the geographic areas are aggregated into larger ones. Our objective is to estimate the population size cut-off at which a geographic area is sufficiently large so that no data suppression or further aggregation is necessary.

Design: The 2001 Canadian census data were used to conduct a simulation to model the relationship between geographic area population size and uniqueness for some common demographic variables. Cut-offs were computed for geographic area population size, and prediction models were developed to estimate the appropriate cut-offs.

Measurements: Re-identification risk was measured using uniqueness. Geographic area population size cut-offs were estimated using the maximum number of possible values in the data set and a traditional entropy measure.
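As a rough illustration of the two measures named above, the sketch below computes both for a small table of quasi-identifiers. It assumes that the "maximum number of possible values" is the product of the number of distinct values of each quasi-identifier, and that the entropy measure is Shannon entropy over the observed value combinations. The abstract does not spell out either definition, and the function names and sample records are hypothetical.

```python
import math
from collections import Counter


def max_combinations(records):
    """Product of the number of distinct values per quasi-identifier.

    Assumed reading of "maximum number of possible values"; not confirmed
    by the abstract.
    """
    columns = zip(*records)
    return math.prod(len(set(col)) for col in columns)


def shannon_entropy(records):
    """Shannon entropy (in bits) of the observed quasi-identifier combinations.

    Assumed reading of "traditional entropy measure".
    """
    counts = Counter(records)
    n = len(records)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical records: (age group, gender, ethnicity code).
records = [
    ("30-39", "F", "A"),
    ("30-39", "M", "A"),
    ("40-49", "F", "B"),
    ("40-49", "F", "B"),
]

print(max_combinations(records))  # 2 age groups * 2 genders * 2 ethnicities
print(shannon_entropy(records))
```

Both quantities grow as quasi-identifiers are added or made finer-grained, which is why either could plausibly serve as a predictor of the population size needed to keep uniqueness low.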

Results: The model that predicted population cut-offs using the maximum number of possible values in the data set had R² values around 0.9, and relative error of prediction less than 0.02 across all regions of Canada. The models were then applied to assess the appropriate geographic area size for the prescription records provided by retail and hospital pharmacies to commercial research and analysis firms.

Conclusions: To manage re-identification risk, the prediction models can be used by public health professionals, health researchers, and research ethics boards to decide when the geographic area population size is sufficiently large.

Figures

Figure 1
Illustration of how the geographic area population size (GAPS) cut-off is calculated. Uniqueness is computed as the proportion of individuals who are unique on the values of the quasi-identifiers. For example, a uniqueness of 0.02 for a geographic area of 10,000 individuals on age, ethnicity, and gender means that 200 individuals have unique values on the combination of these three variables. In the limit, as the area grows without bound, uniqueness approaches zero. The delta value is the uniqueness at the GAPS cut-off.
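The uniqueness calculation described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under the assumption that each individual is represented as a tuple of quasi-identifier values; the function name and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def uniqueness(records):
    """Proportion of records whose quasi-identifier combination occurs exactly once."""
    if not records:
        return 0.0
    counts = Counter(records)
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / len(records)


# Hypothetical area of five individuals: (age, ethnicity code, gender).
sample = [
    (34, "A", "F"),
    (34, "A", "F"),  # shares its combination with the record above
    (51, "B", "M"),
    (29, "A", "M"),
    (63, "C", "F"),
]

print(uniqueness(sample))  # 3 of the 5 records are unique

# The caption's arithmetic: a uniqueness of 0.02 in an area of 10,000 people.
unique_individuals = round(0.02 * 10_000)
print(unique_individuals)  # 200
```

Counting combinations with `Counter` keeps the calculation to a single pass over the records, regardless of how many quasi-identifiers each tuple carries.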
Figure 2
Example showing the actual relationship between geographic area size and the proportion of unique individuals in the central region for three variables: age, gender, and ethnicity.
