Appl Environ Microbiol. 2022 Apr 12;88(7):e0005222.
doi: 10.1128/aem.00052-22. Epub 2022 Mar 14.

Determining Informative Microbial Single Nucleotide Polymorphisms for Human Identification

Allison J Sherier et al. Appl Environ Microbiol.

Abstract

The skin microbiome is a highly abundant and relatively stable source of DNA that may be utilized for human identification (HID). In this study, a set of single nucleotide polymorphisms (SNPs) with a high mean estimated Wright's fixation index (FST) (>0.1) and widespread abundance (found in ≥75% of samples compared) were selected from a diverse set of markers in the hidSkinPlex panel. The least absolute shrinkage and selection operator (LASSO) was used in a novel machine learning framework to generate a SNP panel and predict the human host from skin microbiome samples collected from the hand, manubrium, and foot. The framework was devised to emulate a new unknown person introduced to the algorithm and to match samples from that person against a population database. Unknown samples were classified with 96% accuracy (Matthews correlation coefficient [MCC], 0.954) in the test (n = 225 samples) data set. A final panel of informative SNPs was determined for HID (hidSkinPlex+) using all 51 individuals sampled at three body sites in triplicate. The hidSkinPlex+ panel comprises 365 SNPs and yielded prediction accuracy for the correct host of 95% (MCC = 0.949). The accuracy of the hidSkinPlex+ panel may be somewhat overestimated due to using 26 individuals from the training data set for the selection of the final panel. However, this accuracy still provides an indication of performance when tested on new samples.

IMPORTANCE One of the fundamental goals in forensic genetics is to identify the source of biological evidence. Methods for detecting human DNA have advanced and can be quite sensitive, but not all DNA samples are amenable to current methods. However, the human skin microbiome is a source of DNA with high copy numbers, and it has the potential for high discriminatory power. The hidSkinPlex panel has been used for HID; however, some aspects of it could be improved. Missing information is ambiguous, as it is unclear if marker drop-out is a by-product of a low-template sample or if the reasons for not observing a marker are biological. Such ambiguity may confound methods for HID, and as such, an improved marker set (hidSkinPlex+) was designed that is considerably smaller and more robust to drop-out (365 SNPs contained in 135 markers) yet still can be used to accurately predict the human host.
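
Two quantitative steps named in the abstract can be illustrated with a minimal Python sketch: the candidate filter (mean FST > 0.1, found in ≥75% of samples compared) and the Matthews correlation coefficient for a multi-host classification. This is not the authors' code. The record fields (`id`, `mean_fst`, `presence`), the function names, and the toy data are hypothetical, and the multiclass MCC shown is the standard generalization (Gorodkin's R_K), which the study may or may not have computed in this exact form.

```python
from collections import Counter
from math import sqrt


def select_candidate_snps(snps, min_fst=0.1, min_presence=0.75):
    """Keep SNPs with mean FST above min_fst that were observed in at least
    min_presence of the samples compared (thresholds from the abstract;
    the record layout is a hypothetical illustration)."""
    return [
        snp["id"]
        for snp in snps
        if snp["mean_fst"] > min_fst and snp["presence"] >= min_presence
    ]


def multiclass_mcc(true_hosts, predicted_hosts):
    """Multiclass Matthews correlation coefficient (Gorodkin's R_K)."""
    if len(true_hosts) != len(predicted_hosts):
        raise ValueError("label lists must have the same length")
    total = len(true_hosts)
    correct = sum(t == p for t, p in zip(true_hosts, predicted_hosts))
    true_counts = Counter(true_hosts)        # how often each host truly occurs
    pred_counts = Counter(predicted_hosts)   # how often each host is predicted
    labels = set(true_counts) | set(pred_counts)
    cov_tp = correct * total - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = total * total - sum(n * n for n in true_counts.values())
    cov_pp = total * total - sum(n * n for n in pred_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    # By convention, return 0.0 when the coefficient is undefined.
    return cov_tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy records, invented purely for illustration.
    toy_snps = [
        {"id": "snp1", "mean_fst": 0.25, "presence": 0.90},
        {"id": "snp2", "mean_fst": 0.05, "presence": 0.95},
        {"id": "snp3", "mean_fst": 0.30, "presence": 0.50},
    ]
    print(select_candidate_snps(toy_snps))
    print(multiclass_mcc(["A", "A", "B", "B"], ["A", "A", "B", "A"]))
```

Unlike raw accuracy, MCC accounts for how predictions are distributed across all hosts, which is presumably why the abstract reports both figures.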

Keywords: Wright’s fixation index; hidSkinPlex; human identification; machine learning; massively parallel sequencing; microbial forensics; multinomial logistic regression; skin microbiome.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

FIG 1
The average FST estimate and the sample size in the hidSkinPlex panel. The graph on the left shows the distribution of the average FST for all nucleotide positions in the hidSkinPlex. The graph on the right shows the percentage of nucleotide positions in which FST can be estimated.
FIG 2
The average FST estimate and the sample size of the reduced list of 1,344 candidate SNPs from the training data set. The graph on the left shows the distribution of the average FST estimated for the SNP candidate list. The graph on the right shows the distribution of SNPs contained in the top 75% of pairwise comparisons.
FIG 3
Classification results for the training and test data sets and the number of samples missing SNPs. The x axis indicates the number of missing SNPs for a given sample. The y axis shows the training and test data sets partitioned into the correct (white) and incorrect (gray) classification groups.
