Regarding the F-word: The effects of data filtering on inferred genotype-environment associations
- PMID: 33565725
- DOI: 10.1111/1755-0998.13351
Abstract
Genotype-environment association (GEA) methods have become part of the standard landscape genomics toolkit, yet we know little about how best to filter genotype-by-sequencing data to provide robust inferences for environmental adaptation. In many cases, default filtering thresholds for minor allele frequency and missing data are applied regardless of sample size, which has unknown impacts on the results and can negatively affect management strategies. Here, we investigate the effects of filtering on GEA results and the potential implications for assessment of adaptation to environment. We use empirical and simulated data sets derived from two widespread tree species to assess the effects of filtering on GEA outputs. Critically, we find that the level of filtering of missing data and minor allele frequency affects the identification of true positives. Even slight adjustments to these thresholds can change the rate of true positive detection. Using conservative thresholds for missing data and minor allele frequency substantially reduces the size of the data set, lessening the power to detect adaptive variants (i.e., simulated true positives) under both strong and weak selection. Regardless, strength of selection was a good predictor of GEA detection, but even some SNPs under strong selection went undetected. False positive rates varied depending on the species and GEA method, and filtering significantly impacted the predictions of adaptive capacity in downstream analyses. We make several recommendations regarding filtering for GEA methods. Ultimately, there is no filtering panacea, but some choices are better than others, depending on the study system, availability of genomic resources, and desired objectives.
Keywords: Eucalyptus; GEA; SNP analysis; climate adaptation; genome sequencing; genomic simulation; reduced representation.
© 2021 John Wiley & Sons Ltd.
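To make the two filters discussed in the abstract concrete, the following is a minimal sketch of per-SNP filtering on minor allele frequency (MAF) and missing data. It is not the authors' pipeline: the function names (`snp_stats`, `filter_snps`), the genotype encoding (0, 1, or 2 copies of the alternate allele, `None` for a missing call), the toy data, and both threshold pairs are assumptions chosen purely for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed encoding: 0/1/2 = copies of the alternate allele, None = missing call.
Genotype = Optional[int]


def snp_stats(calls: Sequence[Genotype]) -> Tuple[float, float]:
    """Return (missing_fraction, minor_allele_frequency) for one SNP."""
    called = [g for g in calls if g is not None]
    missing_fraction = 1.0 - len(called) / len(calls)
    if not called:
        # No genotype calls at all: treat the SNP as monomorphic.
        return missing_fraction, 0.0
    alt_freq = sum(called) / (2 * len(called))  # diploid: two alleles per call
    return missing_fraction, min(alt_freq, 1.0 - alt_freq)


def filter_snps(
    genotypes: Sequence[Sequence[Genotype]],
    min_maf: float,
    max_missing: float,
) -> List[int]:
    """Return indices of SNPs that pass both thresholds.

    `genotypes` holds one sequence of calls per SNP (hypothetical layout).
    """
    kept = []
    for index, calls in enumerate(genotypes):
        missing_fraction, maf = snp_stats(calls)
        if missing_fraction <= max_missing and maf >= min_maf:
            kept.append(index)
    return kept


# Toy data: 4 SNPs scored in 10 individuals (made up for this example).
GENOTYPES = [
    [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],              # common variant, complete
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],              # rare variant (MAF 0.05)
    [0, 1, None, None, None, 2, 1, 0, 1, None],  # 40% missing calls
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],              # MAF 0.15, complete
]

lenient = filter_snps(GENOTYPES, min_maf=0.01, max_missing=0.5)
strict = filter_snps(GENOTYPES, min_maf=0.10, max_missing=0.2)
print(lenient)  # [0, 1, 2, 3]
print(strict)   # [0, 3]
```

In this toy case the stricter thresholds drop half of the SNPs, including the rare variant and the one with many missing calls. That is the kind of data set reduction the abstract links to lower power for detecting adaptive variants, although the sketch says nothing about which thresholds suit a real study.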
Similar articles
- Comparing methods for detecting multilocus adaptation with multivariate genotype-environment associations. Mol Ecol. 2018 May;27(9):2215-2233. doi: 10.1111/mec.14584. Epub 2018 Apr 23. Mol Ecol. 2018. PMID: 29633402
- Genome-environment association study suggests local adaptation to climate at the regional scale in Fagus sylvatica. New Phytol. 2016 Apr;210(2):589-601. doi: 10.1111/nph.13809. Epub 2016 Jan 18. New Phytol. 2016. PMID: 26777878
- Reference Genome Choice and Filtering Thresholds Jointly Influence Phylogenomic Analyses. Syst Biol. 2024 May 27;73(1):76-101. doi: 10.1093/sysbio/syad065. Syst Biol. 2024. PMID: 37881861
- Genome-Environment Associations, an Innovative Tool for Studying Heritable Evolutionary Adaptation in Orphan Crops and Wild Relatives. Front Genet. 2022 Aug 5;13:910386. doi: 10.3389/fgene.2022.910386. eCollection 2022. Front Genet. 2022. PMID: 35991553 Free PMC article. Review.
- A practical guide to environmental association analysis in landscape genomics. Mol Ecol. 2015 Sep;24(17):4348-70. doi: 10.1111/mec.13322. Mol Ecol. 2015. PMID: 26184487 Review.
Cited by
- Commonly used Hardy-Weinberg equilibrium filtering schemes impact population structure inferences using RADseq data. Mol Ecol Resour. 2022 Oct;22(7):2599-2613. doi: 10.1111/1755-0998.13646. Epub 2022 Jun 5. Mol Ecol Resour. 2022. PMID: 35593534 Free PMC article.
- Leaf Economic and Hydraulic Traits Signal Disparate Climate Adaptation Patterns in Two Co-Occurring Woodland Eucalypts. Plants (Basel). 2022 Jul 14;11(14):1846. doi: 10.3390/plants11141846. Plants (Basel). 2022. PMID: 35890479 Free PMC article.
- Concordant Signal of Genetic Variation Across Marker Densities in the Desert Annual Chylismia brevipes Is Linked With Timing of Winter Precipitation. Evol Appl. 2024 Dec 16;17(12):e70046. doi: 10.1111/eva.70046. eCollection 2024 Dec. Evol Appl. 2024. PMID: 39691745 Free PMC article.
- Easy-to-use R functions to separate reduced-representation genomic datasets into sex-linked and autosomal loci, and conduct sex assignment. Mol Ecol Resour. 2025 Jul;25(5):e13844. doi: 10.1111/1755-0998.13844. Epub 2023 Aug 1. Mol Ecol Resour. 2025. PMID: 37526650 Free PMC article.
- Next-generation data filtering in the genomics era. Nat Rev Genet. 2024 Nov;25(11):750-767. doi: 10.1038/s41576-024-00738-6. Epub 2024 Jun 14. Nat Rev Genet. 2024. PMID: 38877133 Review.