Inference of Coalescence Times and Variant Ages Using Convolutional Neural Networks
- PMID: 37738175
- PMCID: PMC10581698
- DOI: 10.1093/molbev/msad211
Abstract
Accurate inference of the time to the most recent common ancestor (TMRCA) between pairs of individuals and of the age of genomic variants is key in several population genetic analyses. We developed a likelihood-free approach, called CoalNN, which uses a convolutional neural network to predict pairwise TMRCAs and allele ages from sequencing or SNP array data. CoalNN is trained through simulation and can be adapted to varying parameters, such as demographic history, using transfer learning. Across several simulated scenarios, CoalNN matched or outperformed the accuracy of model-based approaches for pairwise TMRCA and allele age prediction. We applied CoalNN to settings for which model-based approaches are underdeveloped and performed analyses to gain insights into the set of features it uses to perform TMRCA prediction. We next used CoalNN to analyze 2,504 samples from 26 populations in the 1000 Genomes Project data set, inferring the age of ∼80 million variants. We observed substantial variation across populations and for variants predicted to be pathogenic, reflecting heterogeneous demographic histories and the action of negative selection. We used CoalNN's predicted allele ages to construct genome-wide annotations capturing the signature of past negative selection. We performed LD-score regression analysis of heritability using summary association statistics from 63 independent complex traits and diseases (average N = 314k), observing increased annotation-specific effects on heritability compared to a previous allele age annotation. These results highlight the effectiveness of using likelihood-free, simulation-trained models to infer properties of gene genealogies in large genomic data sets.
Keywords: allele age; coalescence time; heritability; machine learning; natural selection.
© The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.
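To make the idea of a simulation-trained TMRCA predictor concrete, the following is a minimal sketch and not CoalNN itself. It assumes a single non-recombining region, a constant-size diploid population, and a simple linear predictor in place of a convolutional network. The function names and the parameter values (`n_e`, `mu`, `length`) are illustrative assumptions, not taken from the paper.

```python
import random

def poisson(lam, rng):
    """Sample a Poisson count by counting unit-rate arrivals before time lam."""
    count, total = 0, rng.expovariate(1.0)
    while total < lam:
        count += 1
        total += rng.expovariate(1.0)
    return count

def simulate_pair(n_e, mu, length, rng):
    """Simulate one labeled example: (true TMRCA, number of pairwise differences)."""
    # Under the standard coalescent, two lineages in a diploid population of
    # size n_e coalesce after an exponential time with mean 2 * n_e generations.
    tmrca = rng.expovariate(1.0 / (2.0 * n_e))
    # Mutations fall on both lineages, so the expected number of differences
    # is 2 * mu * length * tmrca.
    n_diffs = poisson(2.0 * mu * length * tmrca, rng)
    return tmrca, n_diffs

def fit_slope(pairs):
    """Least-squares slope through the origin for predicting TMRCA from n_diffs."""
    num = sum(t * d for t, d in pairs)
    den = sum(d * d for _, d in pairs)
    return num / den

if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed illustrative parameters: n_e = 10,000, mu = 1.25e-8, 100 kb region.
    train = [simulate_pair(10_000, 1.25e-8, 100_000, rng) for _ in range(20_000)]
    slope = fit_slope(train)
    true_t, d = simulate_pair(10_000, 1.25e-8, 100_000, rng)
    print(f"differences={d} predicted TMRCA={slope * d:.0f} true TMRCA={true_t:.0f}")
```

The predictor here never evaluates a likelihood: it only sees simulated (feature, label) pairs, which is the sense in which such approaches are likelihood-free. Changing the simulated demography would require re-simulating and refitting, which loosely mirrors the adaptation step that the abstract describes as transfer learning.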
Similar articles
- High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability. Nat Genet. 2018 Sep;50(9):1311-1317. doi: 10.1038/s41588-018-0177-x. Epub 2018 Aug 13. PMID: 30104759. Free PMC article.
- Computationally Efficient Demographic History Inference from Allele Frequencies with Supervised Machine Learning. Mol Biol Evol. 2024 May 3;41(5):msae077. doi: 10.1093/molbev/msae077. PMID: 38636507. Free PMC article.
- Bayesian neural networks with variable selection for prediction of genotypic values. Genet Sel Evol. 2020 May 15;52(1):26. doi: 10.1186/s12711-020-00544-8. PMID: 32414320. Free PMC article.
- Robust inference of population size histories from genomic sequencing data. PLoS Comput Biol. 2022 Sep 16;18(9):e1010419. doi: 10.1371/journal.pcbi.1010419. eCollection 2022 Sep. PMID: 36112715. Free PMC article.
- Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries. Nat Genet. 2024 May;56(5):767-777. doi: 10.1038/s41588-024-01704-y. Epub 2024 Apr 30. PMID: 38689000. Free PMC article.
Cited by
- Coalescence and Translation: A Language Model for Population Genetics. bioRxiv [Preprint]. 2025 Jun 27:2025.06.24.661337. doi: 10.1101/2025.06.24.661337. PMID: 40666889. Free PMC article. Preprint.
- Allele ages provide limited information about the strength of negative selection. Genetics. 2025 Mar 17;229(3):iyae211. doi: 10.1093/genetics/iyae211. PMID: 39698825. Free PMC article.