Using simulated microhaplotype genotyping data to evaluate the value of machine learning algorithms for inferring DNA mixture contributor numbers
- PMID: 38244524
- DOI: 10.1016/j.fsigen.2024.103008
Abstract
Inferring the number of contributors (NoC) is a crucial step in interpreting DNA mixtures, as it directly affects the accuracy of the likelihood ratio calculation and the assessment of evidence strength. However, obtaining the correct NoC for complex DNA mixtures remains challenging because of the high degree of allele sharing and dropout. This study aimed to analyze the impact of allele sharing and dropout on NoC inference in complex DNA mixtures when microhaplotypes (MH) are used. The effectiveness and value of highly polymorphic MH for NoC inference in complex DNA mixtures were evaluated by comparing the performance of three NoC inference methods: the maximum allele count (MAC) method, the maximum likelihood estimation (MLE) method, and a random forest classification (RFC) algorithm. We selected the 100 most polymorphic MH in the Southern Han Chinese (CHS) population and simulated over 40 million complex DNA mixture profiles with NoC ranging from 2 to 8. These profiles involved unrelated individuals (RM type) and related pairs of individuals, including parent-offspring pairs (PO type), full-sibling pairs (FS type), and second-degree kinship pairs (SE type). Our results showed how the number of detected alleles in DNA mixture profiles varied with marker polymorphism, the involvement of kinship, NoC, and dropout settings. Across the different types of DNA mixtures, the MAC and MLE methods performed best on the RM type, followed by the SE, FS, and PO types, whereas the RFC models performed best on the PO type, followed by the RM, SE, and FS types. The recall of all three methods decreased as NoC and dropout levels increased. Furthermore, the MLE method performed better at low NoC, whereas the RFC models excelled at high NoC and/or high dropout levels, regardless of the availability of a priori information about related pairs of individuals in the mixtures. However, the RFC models that incorporated this a priori information and were trained specifically on each type of DNA mixture profile outperformed the RFC_ALL model, which did not. Finally, we provided recommendations for model building when applying machine learning algorithms to NoC inference.
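The abstract names the MAC and MLE baselines without giving their formulas. As a minimal sketch, assuming unrelated contributors, Hardy-Weinberg equilibrium, and independent loci, the Python below shows one common formulation of each. MAC takes the ceiling of half the largest per-locus count of detected alleles. The MLE picks the NoC that maximizes the product over loci of the probability of observing exactly the detected allele set, computed by inclusion-exclusion with no dropout term. All function names, the toy allele frequencies, and the crude per-allele dropout in `simulate_mixture` are hypothetical illustrations. This is not the authors' simulation design or implementation, it ignores kinship (the PO, FS, and SE types), and it does not reproduce the RFC models.

```python
import math
import random
from itertools import combinations


def simulate_mixture(freqs, noc, dropout, rng):
    """Simulate detected alleles per locus for `noc` unrelated contributors.

    Hypothetical model: alleles are drawn under Hardy-Weinberg equilibrium
    from `freqs` (locus -> {allele: frequency}); each distinct allele is then
    dropped independently with probability `dropout` (deliberately crude).
    """
    profile = {}
    for locus, table in freqs.items():
        alleles, weights = zip(*table.items())
        drawn = set(rng.choices(alleles, weights=weights, k=2 * noc))
        profile[locus] = {a for a in drawn if rng.random() >= dropout}
    return profile


def mac_estimate(profile):
    """MAC: a contributor adds at most two alleles per locus, so NoC >= ceil(max/2)."""
    max_count = max(len(alleles) for alleles in profile.values())
    return max(1, math.ceil(max_count / 2))


def locus_prob(observed, table, noc):
    """P(exactly `observed` is detected | noc unrelated contributors, no dropout).

    Inclusion-exclusion over subsets T of the observed set S:
    sum of (-1)^(|S|-|T|) * (sum of frequencies in T)^(2 * noc).
    """
    obs = list(observed)
    if len(obs) > 2 * noc:
        return 0.0  # more distinct alleles than 2 * noc chromosomes can carry
    total = 0.0
    for r in range(len(obs) + 1):
        sign = (-1) ** (len(obs) - r)
        for subset in combinations(obs, r):
            total += sign * sum(table[a] for a in subset) ** (2 * noc)
    return total


def mle_estimate(profile, freqs, max_noc=8):
    """Return the NoC in 1..max_noc maximizing the product of locus probabilities,
    or None if no value in that range can explain the profile."""
    best_n, best_ll = None, -math.inf
    for n in range(1, max_noc + 1):
        ll = 0.0
        for locus, observed in profile.items():
            if not observed:
                continue  # nothing detected: treated as uninformative here
            p = locus_prob(observed, freqs[locus], n)
            if p <= 0.0:
                ll = -math.inf
                break
            ll += math.log(p)
        if ll > best_ll:
            best_n, best_ll = n, ll
    return best_n


if __name__ == "__main__":
    # Hypothetical equifrequent 8-allele loci; not real CHS microhaplotype data.
    alleles = "ABCDEFGH"
    freqs = {f"MH{i:02d}": {a: 1 / len(alleles) for a in alleles} for i in range(1, 11)}
    rng = random.Random(2024)
    profile = simulate_mixture(freqs, noc=3, dropout=0.05, rng=rng)
    print("MAC:", mac_estimate(profile), "MLE:", mle_estimate(profile, freqs))
```

Because the inclusion-exclusion sum alternates in sign, it can lose floating-point precision when many alleles are detected at one locus, so a serious implementation would need more careful numerics. The MLE sketch also returns `None` when every candidate NoC up to `max_noc` is ruled out, for example when a single locus shows more than 2 × `max_noc` alleles.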
Keywords: Complex DNA mixtures; Inference of the number of contributors; Machine learning; Microhaplotypes.
Copyright © 2024 Elsevier B.V. All rights reserved.
Conflict of interest statement
The authors declare no conflict of interest.