Forensic Sci Int Genet. 2022 Mar;57:102659.
doi: 10.1016/j.fsigen.2021.102659. Epub 2021 Dec 29.

Improving the regional Y-STR haplotype resolution utilizing haplogroup-determining Y-SNPs and the application of machine learning in Y-SNP haplogroup prediction in a forensic Y-STR database: A pilot study on male Chinese Yunnan Zhaoyang Han population

Caiyong Yin et al. Forensic Sci Int Genet. 2022 Mar.

Abstract

Improving the resolution of widely used Y-chromosomal short tandem repeat (Y-STR) datasets is of great importance for forensic investigators, yet current approaches are largely limited to adding more Y-STR loci. In this research, a regional Y-DNA database was investigated to improve Y-STR haplotype resolution using a Y-SNP Pedigree Tagging System that includes 24 Y-chromosomal single nucleotide polymorphism (Y-SNP) loci. This pilot study was conducted in the Chinese Yunnan Zhaoyang Han population and enrolled 3473 unrelated males. Based on the male haplogroups assigned under different panels, matched or near-matching (NM) Y-STR haplotype pairs that belonged to different haplogroups demonstrated the critical role of haplogroups in improving regional Y-STR haplotype resolution. A classic median-joining network analysis was performed on Y-STR data alone or on combined Y-STR/Y-SNP data to reconstruct population substructures, and it showed that Y-SNPs can correct misclassifications made from Y-STRs alone. Population substructures were also reconstructed using multiple unsupervised and supervised dimensionality reduction methods, which indicated the potential of Y-STR haplotypes for predicting Y-SNP haplogroups. Haplogroup prediction models were built using nine publicly accessible machine-learning (ML) approaches. The best prediction accuracy reached 99.71% for major haplogroups and 98.54% for detailed haplogroups. Potential influences on prediction accuracy were assessed by varying the number of Y-STR loci, selecting Y-STR loci with different mutation rates, and processing the input data. ML-based predictors generally achieved better prediction accuracy than two available predictors (Nevgen and EA-YPredictor).
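The haplogroup prediction step can be pictured with a minimal sketch. It is not one of the nine ML approaches evaluated in the study: a simple 1-nearest-neighbour classifier over repeat counts is swapped in for illustration, the L1 distance (summed absolute repeat differences) is an assumption, and the haplotypes, haplogroup labels, and helper names (`distance`, `predict`, `loo_accuracy`) are hypothetical. Leave-one-out accuracy is computed on all loci and on a single locus, echoing the idea of varying the number of Y-STR loci.

```python
# Hypothetical toy data: (Y-STR haplotype, haplogroup). Values are illustrative
# only and are not taken from the study.
DATA = [
    ((13, 13, 23, 10), "O2"),
    ((13, 13, 24, 10), "O2"),
    ((14, 13, 23, 10), "O2"),
    ((16, 14, 23, 11), "O1"),
    ((16, 14, 22, 11), "O1"),
    ((15, 14, 22, 11), "O1"),
    ((17, 12, 24, 10), "C2"),
    ((17, 12, 25, 10), "C2"),
]


def distance(h1, h2, loci):
    """Assumed L1 distance: summed absolute repeat differences over selected loci."""
    return sum(abs(h1[i] - h2[i]) for i in loci)


def predict(train, haplotype, loci):
    """1-nearest-neighbour: return the haplogroup of the closest training haplotype.

    min() returns the first record among equal distances, so ties are
    resolved by training-set order.
    """
    nearest = min(train, key=lambda rec: distance(rec[0], haplotype, loci))
    return nearest[1]


def loo_accuracy(data, loci):
    """Leave-one-out accuracy: predict each record from all the other records."""
    correct = 0
    for i, (haplotype, group) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if predict(train, haplotype, loci) == group:
            correct += 1
    return correct / len(data)


print(loo_accuracy(DATA, loci=range(4)))  # all four loci -> 1.0
print(loo_accuracy(DATA, loci=[2]))       # third locus only -> 0.5
```

On these toy records the accuracy drops from 1.0 to 0.5 when only one locus is used; the numbers illustrate the trend only and say nothing about the study's reported accuracies.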
Three tree-based models were developed on the Yfiler Plus panel using unprocessed input data, and they showed strong generalization ability in classifying various Chinese Han subgroups (validation dataset). In conclusion, this study revealed the significance and application prospects of Y-SNP haplogroups for improving regional Y-STR databases. Y-SNP haplogroups can be used to discriminate NM Y-STR haplotype pairs, and developing haplogroup prediction tools for forensic Y-STR databases is important for improving the accuracy of biogeographic ancestry inference.
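The resolution argument can be made concrete with a small sketch. It assumes discrimination capacity is the number of distinct types divided by the number of samples, and, for illustration only, that an NM pair differs by one repeat unit at exactly one locus; the study's own NM criterion may differ. The locus names are standard Y-STR markers, but the repeat counts, haplogroup assignments, and helper names are hypothetical.

```python
from itertools import combinations

# Locus order for the haplotype tuples below (labels only).
LOCI = ("DYS19", "DYS389I", "DYS390", "DYS391", "DYS392")

# Hypothetical toy records: (Y-STR haplotype, Y-SNP haplogroup).
SAMPLES = [
    ((14, 13, 24, 10, 11), "O2"),
    ((14, 13, 24, 10, 11), "C2"),  # same Y-STR haplotype, different haplogroup
    ((15, 12, 23, 10, 13), "O1"),
    ((15, 12, 23, 11, 13), "O2"),  # near-match of the previous haplotype
    ((16, 13, 25, 10, 11), "O2"),
]


def discrimination_capacity(keys):
    """Distinct types divided by total samples."""
    return len(set(keys)) / len(keys)


def is_near_match(h1, h2):
    """Assumed NM rule: differ at exactly one locus by exactly one repeat unit."""
    diffs = [abs(a - b) for a, b in zip(h1, h2) if a != b]
    return diffs == [1]


def near_matching_pairs(samples, use_haplogroup):
    """Count exact-match and NM pairs; optionally require equal haplogroups."""
    exact = nm = 0
    for (h1, g1), (h2, g2) in combinations(samples, 2):
        if use_haplogroup and g1 != g2:
            continue  # discordant haplogroups separate the pair
        if h1 == h2:
            exact += 1
        elif is_near_match(h1, h2):
            nm += 1
    return exact, nm


str_only = [h for h, _ in SAMPLES]
str_snp = list(SAMPLES)  # haplotype combined with haplogroup
print(discrimination_capacity(str_only))  # 0.8
print(discrimination_capacity(str_snp))   # 1.0
print(near_matching_pairs(SAMPLES, use_haplogroup=False))  # (1, 1)
print(near_matching_pairs(SAMPLES, use_haplogroup=True))   # (0, 0)
```

Adding the haplogroup to the Y-STR haplotype splits both the exact match and the NM pair in this toy set, which is the kind of discrimination the abstract attributes to Y-SNP haplogroups.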

Keywords: Database development; Machine learning; Y-SNP haplogroup; Y-STR haplotype resolution.
