Identifying personal genomes by surname inference
- PMID: 23329047
- DOI: 10.1126/science.1229566
Abstract
Sharing sequencing data sets without identifiers has become a common practice in genomics. Here, we report that surnames can be recovered from personal genomes by profiling short tandem repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases. We show that a combination of a surname with other types of metadata, such as age and state, can be used to triangulate the identity of the target. A key feature of this technique is that it entirely relies on free, publicly accessible Internet resources. We quantitatively analyze the probability of identification for U.S. males. We further demonstrate the feasibility of this technique by tracing back with high probability the identities of multiple participants in public sequencing projects.
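The two steps the abstract describes, surname inference from a Y-STR profile and narrowing by age and state, can be illustrated with a minimal sketch. It is a toy illustration under stated assumptions, not the authors' pipeline. The record layout, the function names, the matching thresholds, and all data values are hypothetical. Only the marker names (DYS19, DYS390, DYS391) are real Y-STR loci. Real genealogy databases are queried through their own interfaces and typically involve many more markers and probabilistic matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenealogyRecord:
    """Hypothetical database entry: a surname plus Y-STR repeat counts."""
    surname: str
    y_str: dict  # marker name -> repeat count


def marker_mismatches(profile, record_markers):
    """Return (shared marker count, mismatch count) for two Y-STR profiles."""
    shared = profile.keys() & record_markers.keys()
    mismatches = sum(1 for m in shared if profile[m] != record_markers[m])
    return len(shared), mismatches


def infer_surnames(profile, records, min_shared=3, max_mismatches=0):
    """Collect surnames whose records match the profile closely enough.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    hits = set()
    for rec in records:
        shared, mismatches = marker_mismatches(profile, rec.y_str)
        if shared >= min_shared and mismatches <= max_mismatches:
            hits.add(rec.surname)
    return sorted(hits)


def triangulate(surnames, population, birth_year, state):
    """Keep only people matching a candidate surname, birth year, and state."""
    return [
        p for p in population
        if p["surname"] in surnames
        and p["birth_year"] == birth_year
        and p["state"] == state
    ]


# Entirely made-up toy data.
records = [
    GenealogyRecord("Exampleson", {"DYS19": 14, "DYS390": 24, "DYS391": 11}),
    GenealogyRecord("Placeholder", {"DYS19": 15, "DYS390": 23, "DYS391": 10}),
]
population = [
    {"name": "Person A", "surname": "Exampleson", "birth_year": 1950, "state": "XX"},
    {"name": "Person B", "surname": "Exampleson", "birth_year": 1975, "state": "YY"},
    {"name": "Person C", "surname": "Placeholder", "birth_year": 1950, "state": "XX"},
]

target_profile = {"DYS19": 14, "DYS390": 24, "DYS391": 11}
candidate_surnames = infer_surnames(target_profile, records)
candidates = triangulate(candidate_surnames, population, birth_year=1950, state="XX")
```

In this toy setup the surname alone leaves two candidates, and adding birth year and state narrows them to one, which is the sense in which the metadata "triangulates" the target.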
Comment in
- Data re-identification: societal safeguards. Science. 2013 Mar 1;339(6123):1032-3. doi: 10.1126/science.339.6123.1032-c. PMID: 23449577. Free PMC article. No abstract available.
- Genomic privacy in the information age. Clin Chem. 2013 Aug;59(8):1148-50. doi: 10.1373/clinchem.2013.205260. Epub 2013 Apr 19. PMID: 23603798. No abstract available.