Improving the regional Y-STR haplotype resolution utilizing haplogroup-determining Y-SNPs and the application of machine learning in Y-SNP haplogroup prediction in a forensic Y-STR database: A pilot study on male Chinese Yunnan Zhaoyang Han population
- PMID: 35007855
- DOI: 10.1016/j.fsigen.2021.102659
Abstract
Improving the resolution of the widely used Y-chromosomal short tandem repeat (Y-STR) datasets is of great importance for forensic investigators, yet current approaches offer little beyond adding more Y-STR loci. In this research, a regional Y-DNA database was investigated to improve the Y-STR haplotype resolution utilizing a Y-SNP Pedigree Tagging System that includes 24 Y-chromosomal single nucleotide polymorphism (Y-SNP) loci. This pilot study was conducted in the Chinese Yunnan Zhaoyang Han population, and 3473 unrelated male individuals were enrolled. Based on the haplogroup data obtained under different panels, matched or near-matching (NM) Y-STR haplotype pairs assigned to different haplogroups indicated the critical role of haplogroups in improving the regional Y-STR haplotype resolution. A classic median-joining network analysis was performed using Y-STR or Y-STR/Y-SNP data to reconstruct population substructures, which revealed the ability of Y-SNPs to correct misclassifications from Y-STRs. Additionally, population substructures were reconstructed using multiple unsupervised or supervised dimensionality reduction methods, which indicated the potential of Y-STR haplotypes in predicting Y-SNP haplogroups. Haplogroup prediction models were built based on nine publicly accessible machine-learning (ML) approaches. The best prediction accuracy reached 99.71% for major haplogroups and 98.54% for detailed haplogroups. Potential influences on prediction accuracy were assessed by adjusting the number of Y-STR loci, selecting Y-STR loci with various mutabilities, and performing data processing. ML-based predictors generally achieved better prediction accuracy than two available predictors (Nevgen and EA-YPredictor).
Three tree models developed on the Yfiler Plus panel with unprocessed input data showed strong generalization ability in classifying various Chinese Han subgroups (validation dataset). In conclusion, this study revealed the significance and application prospects of Y-SNP haplogroups in improving regional Y-STR databases. Y-SNP haplogroups can be used to discriminate NM Y-STR haplotype pairs, and developing haplogroup prediction tools for forensic Y-STR databases is important for improving the accuracy of biogeographic ancestry inferences.
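The two ideas in the abstract, discriminating matched or NM Y-STR haplotype pairs by haplogroup and predicting a haplogroup from a Y-STR haplotype, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample records and haplogroup assignments are invented, the six-locus panel is far smaller than Yfiler Plus, "near-matching" is taken to mean at most `max_diff` differing loci (the abstract does not state the study's definition), and the nearest-neighbour predictor is a simple stand-in rather than any of the nine ML approaches or the tree models used in the study.

```python
from itertools import combinations

# Hypothetical toy records: (sample ID, Y-SNP haplogroup, Y-STR repeat counts).
# The locus names are real Y-STR markers, but every value below is invented.
LOCI = ("DYS19", "DYS389I", "DYS390", "DYS391", "DYS392", "DYS393")
SAMPLES = [
    ("S1", "O2a", (15, 12, 23, 10, 13, 12)),
    ("S2", "O2a", (15, 12, 23, 10, 13, 12)),
    ("S3", "C2",  (15, 12, 23, 10, 13, 12)),
    ("S4", "O1b", (15, 12, 24, 10, 13, 12)),
    ("S5", "N1",  (14, 13, 25, 11, 14, 13)),
]


def differing_loci(a, b):
    """Count the loci at which two haplotypes carry different repeat numbers."""
    return sum(1 for x, y in zip(a, b) if x != y)


def cross_haplogroup_pairs(samples, max_diff=1):
    """Return matched/near-matching pairs whose haplogroups differ.

    max_diff is an assumed threshold: 0 keeps only identical haplotypes,
    1 also keeps pairs that differ at a single locus.
    """
    flagged = []
    for (id_a, hg_a, str_a), (id_b, hg_b, str_b) in combinations(samples, 2):
        d = differing_loci(str_a, str_b)
        if d <= max_diff and hg_a != hg_b:
            flagged.append((id_a, id_b, d))
    return flagged


def predict_haplogroup(query, samples):
    """Stand-in predictor: haplogroup of the reference haplotype with the fewest differing loci."""
    nearest = min(samples, key=lambda s: differing_loci(query, s[2]))
    return nearest[1]


if __name__ == "__main__":
    for id_a, id_b, d in cross_haplogroup_pairs(SAMPLES):
        print(f"{id_a} / {id_b}: {d} differing loci, different haplogroups")
    print(predict_haplogroup((14, 13, 25, 11, 14, 12), SAMPLES))
```

In this toy data, S1 and S3 share an identical Y-STR haplotype yet carry different haplogroups, which is the kind of pair that Y-STR data alone cannot separate and that a Y-SNP haplogroup label can.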
Keywords: Database development; Machine learning; Y-SNP haplogroup; Y-STR haplotype resolution.
Copyright © 2022 Elsevier B.V. All rights reserved.