PLoS One. 2016 Sep 7;11(9):e0160653.
doi: 10.1371/journal.pone.0160653. eCollection 2016.

Demonstration of Protein-Based Human Identification Using the Hair Shaft Proteome

Glendon J Parker et al. PLoS One.

Abstract

Human identification from biological material is largely dependent on the ability to characterize genetic polymorphisms in DNA. Unfortunately, DNA can degrade in the environment, sometimes below the level at which it can be amplified by PCR. Protein, however, is chemically more robust than DNA and can persist for longer periods. Protein also contains genetic variation in the form of single amino acid polymorphisms. These can be used to infer the status of non-synonymous single nucleotide polymorphism alleles. To demonstrate this, we used mass spectrometry-based shotgun proteomics to characterize hair shaft proteins in 66 European-American subjects. A total of 596 single nucleotide polymorphism alleles were correctly imputed in 32 loci from 22 genes of subjects' DNA and directly validated using Sanger sequencing. Estimates of the probability of resulting individual non-synonymous single nucleotide polymorphism allelic profiles in the European population, using the product rule, resulted in a maximum power of discrimination of 1 in 12,500. Imputed non-synonymous single nucleotide polymorphism profiles from European-American subjects were considerably less frequent in the African population (maximum likelihood ratio = 11,000). The converse was true for hair shafts collected from an additional 10 subjects with African ancestry, where some profiles were more frequent in the African population. Genetically variant peptides were also identified in hair shaft datasets from six archaeological skeletal remains (up to 260 years old). This study demonstrates that quantifiable measures of identity discrimination and biogeographic background can be obtained from detecting genetically variant peptides in hair shaft protein, including hair from bioarchaeological contexts.

Conflict of interest statement

Patents based on the concept and some data presented in this study have been awarded (US 8,877,455 B2, Australian Patent 2011229918, Canadian Patent CA 2794248, and European Patent EP11759843.3, GJP inventor). The patents are owned by Parker Proteomics LLC. Protein-Based Identification Technologies LLC (PBIT) has an exclusive license to develop the intellectual property and is co-owned by Utah Valley University and GJP. This ownership of PBIT and associated intellectual property does not alter our adherence to PLOS ONE policies on sharing data and materials.

Figures

Fig 1. Direct validation of imputed non-synonymous SNP alleles.
A) Genetically variant peptides (GVPs) that contained single amino acid polymorphisms (SAPs) were identified in both European-American cohorts (EA1 and EA2) and collated for each subject. Imputed nsSNP alleles (Gene Name = GN, SNP accession number = rs#, allele nucleotide = nuc) were directly compared to the genotype resulting from direct Sanger sequencing (S1 Methods). Correctly imputed nsSNP alleles (TP, true positives) are indicated by blue squares. Imputed alleles that were incorrectly predicted (FP, false positives) are indicated by red squares. Alleles that were identified using Sanger sequencing but did not yield a GVP in the matching proteomic dataset (FN, false negatives) are indicated by light green squares. Alleles absent in both the subject's DNA and the resulting proteomic dataset (TN, true negatives) are indicated by white squares [49]. Failed Sanger sequencing determinations of nsSNP allelic status are indicated by grey. B) The effectiveness of each SAP-containing peptide in imputing nsSNP alleles was also quantified. The sensitivity of each genetically variant peptide, measured as the proportion of nsSNP alleles that are correctly detected and imputed (TP/(TP + FN)), was calculated as a percentage (log10(%)). The positive predictive value (PPV) of genetically variant peptide-based SNP imputations was calculated as the percentage of correct, validated SNP imputations among all imputations (TP/(TP + FP); log10(%)) [49]. C)
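The two ratios defined in panel B can be illustrated with a minimal Python sketch. The counts below are invented for illustration and are not taken from the study; the function names are likewise hypothetical.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of sequenced alleles that were also detected as a GVP: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined when TP + FN == 0")
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of imputations confirmed by sequencing: TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("PPV is undefined when TP + FP == 0")
    return tp / (tp + fp)


# Hypothetical validation counts for a single peptide (not data from the paper).
tp, fp, fn = 18, 1, 6

sens_pct = 100 * sensitivity(tp, fn)
ppv_pct = 100 * positive_predictive_value(tp, fp)

# The caption reports both measures as log10 of the percentage.
print(f"sensitivity = {sens_pct:.1f}% (log10 = {math.log10(sens_pct):.2f})")
print(f"PPV         = {ppv_pct:.1f}% (log10 = {math.log10(ppv_pct):.2f})")
```

True negatives do not enter either ratio, which is why both measures can be computed per peptide from the TP, FP, and FN squares alone.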
Fig 2. Imputed nsSNP profile probabilities in European and African populations.
A) The probability of an overall individual nsSNP profile in the population (Pr(profile|population)) was estimated by determining the probability of detected nsSNP alleles, or allele combination, in each gene (Pr(nsSNP gene profile|population)), and then using the product rule to multiply these probabilities together (Pr(overall profile|population)). B) The probability of overall imputed nsSNP profiles occurring in the European population (Pr(profile|EUR population)) was calculated using imputed nsSNP alleles from individuals in the two European-American cohorts (EA1 and EA2) and the product rule. Values are presented as a logarithm (log10(Pr(profile|EUR population))). Confidence intervals (90% CI) were estimated using parametric bootstrapping. C) The overall imputed nsSNP profile probability in the African population was also calculated (Pr(profile|AFR population)) and plotted versus the probability of the profile occurring in the European population (Pr(profile|EUR population)). Confidence intervals (90% CI) were estimated using parametric bootstrapping. In addition to European-American subjects (red), imputed nsSNP profile probabilities were also estimated from proteomic datasets derived from an African-American (green) and Kenyan (blue) cohort. The line of equal profile probability in the European and African populations is indicated (dotted line). D) The likelihood of hair samples coming from a European relative to African genetic background was calculated as the ratio of overall imputed nsSNP profile probabilities in the European and African populations (EUR/AFR = Pr(profile|EUR population)/Pr(profile|AFR population)); European-American subjects (red), African-American subjects (green), and Kenyan subjects (blue) are indicated.
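The product rule in panel A and the likelihood ratio in panel D can be sketched in a few lines of Python. The gene labels and per-gene probabilities below are placeholders, not values from the study, and the sketch assumes the per-gene probabilities are independent, which is what the product rule requires.

```python
import math


def profile_probability(gene_probs: dict[str, float]) -> float:
    """Product rule: multiply the per-gene profile probabilities together."""
    return math.prod(gene_probs.values())


def likelihood_ratio(eur: dict[str, float], afr: dict[str, float]) -> float:
    """Pr(profile | EUR population) / Pr(profile | AFR population)."""
    if eur.keys() != afr.keys():
        raise ValueError("both populations must cover the same genes")
    return profile_probability(eur) / profile_probability(afr)


# Hypothetical Pr(nsSNP gene profile | population) values for one subject.
eur = {"GENE_A": 0.5, "GENE_B": 0.2, "GENE_C": 0.1}
afr = {"GENE_A": 0.25, "GENE_B": 0.1, "GENE_C": 0.04}

pr_eur = profile_probability(eur)
pr_afr = profile_probability(afr)

print(f"log10 Pr(profile|EUR) = {math.log10(pr_eur):.2f}")
print(f"log10 Pr(profile|AFR) = {math.log10(pr_afr):.2f}")
print(f"EUR/AFR likelihood ratio = {likelihood_ratio(eur, afr):.1f}")
```

A ratio above 1 means the profile is more probable in the European reference population than in the African one, which corresponds to a point on the European side of the dotted equal-probability line in panel C.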
Fig 3. Comparison of probability estimates based on imputed nsSNPs and mitochondrial DNA haplotype.
The mitochondrial DNA haplotype and subgroup from one of the European-American cohorts (EA2, n = 15) were classified, compared to a database of subjects from an American sample population (Utah, n = 9,372), and the logarithm of haplotype probability was calculated (log10(Pr(mtDNA haplotype|Utah population)), black bars). Genetically variant peptides containing single amino acid polymorphisms were identified in the hair shaft proteomic datasets of the same subjects, an overall profile of imputed nsSNP loci determined, and logarithm of the probability of each profile occurring in the European population was calculated as described in the Materials and Methods section (log10(Pr(imputed nsSNP profile|EUR population)), red bars). Confidence intervals (90% CI) were estimated using parametric bootstrapping. Each measure is represented using the same axis (log10(Pr(profile|population))).
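The caption does not describe how the parametric bootstrap was set up, so the following Python sketch shows only one plausible scheme and should not be read as the authors' procedure. It assumes each per-gene probability was estimated from a reference sample of `n_ref` individuals, redraws each count from a binomial distribution, and recomputes the log10 profile probability. All numbers, names, and the zero-count guard are assumptions.

```python
import math
import random


def log10_profile_probability(gene_probs: list[float]) -> float:
    """log10 of the product-rule profile probability."""
    return sum(math.log10(p) for p in gene_probs)


def bootstrap_ci(gene_probs, n_ref, n_boot=500, level=0.90, seed=0):
    """Percentile interval for the log10 profile probability.

    Assumption: each probability is a proportion observed in n_ref reference
    individuals, so a replicate count is Binomial(n_ref, p).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        resampled = []
        for p in gene_probs:
            # Binomial draw built from n_ref Bernoulli trials.
            count = sum(rng.random() < p for _ in range(n_ref))
            # Guard against log10(0); an assumed convention for this sketch.
            resampled.append(max(count, 1) / n_ref)
        draws.append(log10_profile_probability(resampled))
    draws.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return draws[lo_idx], draws[hi_idx]


# Hypothetical per-gene probabilities and reference sample size.
probs = [0.5, 0.2, 0.1]
point = log10_profile_probability(probs)
low, high = bootstrap_ci(probs, n_ref=500)
print(f"log10 Pr(profile) = {point:.2f}, 90% CI [{low:.2f}, {high:.2f}]")
```

Fixing the seed makes the interval reproducible; a larger `n_boot` gives smoother percentile estimates at the cost of run time.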
Fig 4. Hair shaft proteomic profile in modern and archaeological samples.
A) Absolute protein abundance from all datasets corresponding to a cohort of European-American subjects (EA2, subjects 1 to 19) and archaeological subjects (S1 to S6) was measured (www.thegpm.org) and collated. Proteins that appeared in proteomic datasets of 15% or more of the subjects (n = 401) were aligned as a paralogous neighbor-joining tree in order to cluster detected proteins with higher levels of homology (www.uniprot.org). The neighbor-joining tree based on protein paralogy is aligned on the vertical and subjects on the horizontal. Protein abundance is indicated by conditional formatting (maximum value = yellow, minimum value = black). B) The function of individual proteins was obtained (www.uniprot.org) and collated for both modern (EA2, 1 to 19) and archaeological (S1 to S6) hair shaft samples (categories = structural, metabolism, protein and RNA regulation, membrane proteins, and miscellaneous). The relative abundance of the different protein classes is indicated by area. The size of each circle is proportional to the relative abundance of total detected peptides in each sample class.
Fig 5. Imputed nsSNP loci in archaeological hair shaft proteomes.
A) Hair was obtained from six individuals from two separate post-medieval archaeological assemblages excavated in London and Kent (S1 to S6) and proteomic datasets obtained (S1 Methods). Peptides containing single amino acid polymorphisms (Gene Name; GN) were identified, collated, and nsSNP loci and alleles imputed (dbSNP identifier and nucleotide = rs# and nuc) in Subjects S1 to S6. The proportion of each allele in the European (EUR) and African (AFR) population is included. B) The overall imputed nsSNP profile probability (Pr(profile|population)) in the European (EUR, black bars) and African (AFR, grey bars) population was calculated as the product of imputed nsSNP, or combination of nsSNP, probabilities for each gene. C) Likelihood measurements of European compared to African genetic origin were calculated as a quotient of overall imputed nsSNP profile frequencies (Pr(profile|EUR population))/(Pr(profile|AFR population)).

References

    1. The National Research Council. Strengthening Forensic Science in the United States: A Path Forward. Washington D.C.: The National Academy Press; 2009. September 9, 2009.
    2. Butler JM. Fundamentals of Forensic DNA Typing. Academic Press; 2010.
    3. Guenther CA, Tasic B, Luo L, Bedell MA, Kingsley DM. A molecular basis for classic blond hair color in Europeans. Nature genetics. 2014;46(7):748–52. doi: 10.1038/ng.2991.
    4. Jia J, Wei YL, Qin CJ, Hu L, Wan LH, Li CX. Developing a novel panel of genome-wide ancestry informative markers for bio-geographical ancestry estimates. Forensic science international Genetics. 2014;8(1):187–94. doi: 10.1016/j.fsigen.2013.09.004.
    5. Liu F, Hendriks AE, Ralf A, Boot AM, Benyi E, Savendahl L, et al. Common DNA variants predict tall stature in Europeans. Human genetics. 2014;133(5):587–97. doi: 10.1007/s00439-013-1394-0.