Dispersal inference from population genetic variation using a convolutional neural network
- PMID: 37052957
- PMCID: PMC10213498
- DOI: 10.1093/genetics/iyad068
Abstract
The geographic nature of biological dispersal shapes patterns of genetic variation over landscapes, making it possible to infer properties of dispersal from genetic variation data. Here, we present an inference tool that uses geographically distributed genotype data in combination with a convolutional neural network to estimate a critical population parameter: the mean per-generation dispersal distance. Using extensive simulation, we show that our deep learning approach is competitive with or outperforms state-of-the-art methods, particularly at small sample sizes. In addition, we evaluate varying nuisance parameters during training (including population density, demographic history, habitat size, and sampling area) and show that this strategy is effective for estimating dispersal distance when other model parameters are unknown. Whereas competing methods depend on information about local population density or accurate inference of identity-by-descent tracts, our method uses only single-nucleotide-polymorphism data and the spatial scale of sampling as input. Strikingly, and unlike other methods, our method does not use the geographic coordinates of the genotyped individuals. These features make our method, which we call "disperseNN," a potentially valuable new tool for estimating dispersal distance in nonmodel systems with whole-genome data or reduced-representation data. We apply disperseNN to 12 different species with publicly available data, yielding reasonable estimates for most species. Importantly, our method estimated consistently larger dispersal distances than mark-recapture calculations in the same species, which may be due to the limited geographic sampling area covered by some mark-recapture studies. Thus, genetic tools like ours complement direct methods for improving our understanding of dispersal.
Keywords: deep learning; dispersal; machine learning; population genomics; space.
© The Author(s) 2023. Published by Oxford University Press on behalf of The Genetics Society of America.