Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics
- PMID: 37433019
- PMCID: PMC10365025
- DOI: 10.1093/molbev/msad157
Abstract
Natural selection leaves a spatial pattern along the genome, with a haplotype distribution distortion near the selected locus that fades with distance. Evaluating the spatial signal of a population-genetic summary statistic across the genome allows for patterns of natural selection to be distinguished from neutrality. Considering the genomic spatial distribution of multiple summary statistics is expected to aid in uncovering subtle signatures of selection. In recent years, numerous methods have been devised that consider genomic spatial distributions across summary statistics, utilizing both classical machine learning and deep learning architectures. However, better predictions may be attainable by improving the way in which features are extracted from these summary statistics. We apply wavelet transform, multitaper spectral analysis, and S-transform to summary statistic arrays to achieve this goal. Each analysis method converts one-dimensional summary statistic arrays to two-dimensional images of spectral analysis, allowing simultaneous temporal and spectral assessment. We feed these images into convolutional neural networks and consider combining models using ensemble stacking. Our modeling framework achieves high accuracy and power across a diverse set of evolutionary settings, including population size changes and test sets of varying sweep strength, softness, and timing. A scan of central European whole-genome sequences recapitulated well-established sweep candidates and predicted novel cancer-associated genes as sweeps with high support. Given that this modeling framework is also robust to missing genomic segments, we believe that it will represent a welcome addition to the population-genomic toolkit for learning about adaptive processes from genomic data.
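The step of turning a one-dimensional summary statistic array into a two-dimensional spectral image can be sketched with a continuous wavelet transform. This is a minimal illustration only, not the authors' implementation: the Morlet wavelet, the parameter `w0`, the choice of scales, the synthetic test signal, and the function name `morlet_scalogram` are all assumptions introduced here, and the abstract does not specify any of them.

```python
import numpy as np


def morlet_scalogram(signal, scales, w0=6.0):
    """Return |CWT| of a 1-D array as an image of shape (len(scales), len(signal)).

    Hypothetical helper: rows index wavelet scale, columns index position
    along the array of genomic windows.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the constant offset before transforming
    n = x.size
    image = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Truncate the wavelet support so the kernel is never longer than the signal.
        half = int(min(4 * s, (n - 1) // 2))
        t = np.arange(-half, half + 1) / s
        # Complex Morlet wavelet (small admissibility correction term omitted).
        wavelet = np.exp(1j * w0 * t) * np.exp(-0.5 * t**2) / np.sqrt(s)
        # Zero-padded convolution; the magnitude gives one row of the image.
        image[i] = np.abs(np.convolve(x, wavelet, mode="same"))
    return image


if __name__ == "__main__":
    # Synthetic stand-in for a summary statistic computed in 128 windows:
    # a localized oscillation centered on window 64 (assumed toy data).
    pos = np.arange(128)
    stat = np.exp(-0.5 * ((pos - 64) / 6.0) ** 2) * np.cos(2 * np.pi * (pos - 64) / 8.0)
    img = morlet_scalogram(stat, scales=[2, 4, 8, 16])
    print(img.shape)
```

The resulting array has one row per scale and one column per window, so it can be treated as a single-channel image; stacking the images from several summary statistics as channels would be one way to present them to a convolutional neural network.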
Keywords: artificial intelligence; natural selection; signal decomposition.
© The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.
Similar articles
- Tensor Decomposition-based Feature Extraction and Classification to Detect Natural Selection from Genomic Data. Mol Biol Evol. 2023 Oct 4;40(10):msad216. doi: 10.1093/molbev/msad216. PMID: 37772983. Free PMC article.
- ImaGene: a convolutional neural network to quantify natural selection from genomic data. BMC Bioinformatics. 2019 Nov 22;20(Suppl 9):337. doi: 10.1186/s12859-019-2927-x. PMID: 31757205. Free PMC article.
- Tensor decomposition based feature extraction and classification to detect natural selection from genomic data. bioRxiv [Preprint]. 2023 Mar 29:2023.03.27.527731. doi: 10.1101/2023.03.27.527731. Update in: Mol Biol Evol. 2023 Oct 4;40(10):msad216. doi: 10.1093/molbev/msad216. PMID: 37034767. Free PMC article. Updated. Preprint.
- Machine-Learning Prospects for Detecting Selection Signatures Using Population Genomics Data. J Comput Biol. 2022 Sep;29(9):943-960. doi: 10.1089/cmb.2021.0447. Epub 2022 May 30. PMID: 35639362. Review.
- Genomic resources and their influence on the detection of the signal of positive selection in genome scans. Mol Ecol. 2016 Jan;25(1):170-84. doi: 10.1111/mec.13468. Epub 2015 Dec 17. PMID: 26562485. Review.
Cited by
- Sweeps in space: leveraging geographic data to identify beneficial alleles in Anopheles gambiae. bioRxiv [Preprint]. 2025 Apr 23:2025.02.07.637123. doi: 10.1101/2025.02.07.637123. Update in: Mol Biol Evol. 2025 Jun 4;42(6):msaf141. doi: 10.1093/molbev/msaf141. PMID: 39975147. Free PMC article. Updated. Preprint.
- Tree Sequences as a General-Purpose Tool for Population Genetic Inference. Mol Biol Evol. 2024 Nov 1;41(11):msae223. doi: 10.1093/molbev/msae223. PMID: 39460991. Free PMC article.
- Signatures of soft selective sweeps predominate in the yellow fever mosquito Aedes aegypti. bioRxiv [Preprint]. 2025 Jul 10:2025.07.06.663360. doi: 10.1101/2025.07.06.663360. PMID: 40672212. Free PMC article. Preprint.
- Efficient Detection and Characterization of Targets of Natural Selection Using Transfer Learning. Mol Biol Evol. 2025 Apr 30;42(5):msaf094. doi: 10.1093/molbev/msaf094. PMID: 40341942. Free PMC article.
- Tensor Decomposition-based Feature Extraction and Classification to Detect Natural Selection from Genomic Data. Mol Biol Evol. 2023 Oct 4;40(10):msad216. doi: 10.1093/molbev/msad216. PMID: 37772983. Free PMC article.