'Highly-Informative' Genetic Markers Can Bias Conclusions: Examples and General Solutions
- PMID: 40641441
- PMCID: PMC12415817
- DOI: 10.1111/1755-0998.70011
Abstract
High-grading bias is the overestimation of power in a subset of loci caused by model overfitting. Using both empirical and simulated datasets, we show that high-grading bias can cause severe overestimation of population structure, and thus mislead investigators, whenever highly informative or high-FST markers are chosen (i.e., ascertained) and used for subsequent assessments, a common practice in population genetic studies. This problem can occur in panmictic populations with no local adaptation. Biased results from choosing high-FST markers may have severe downstream implications for management and conservation, such as erroneous conservation unit delineation, which could squander limited conservation resources to protect incorrectly defined 'populations'. Furthermore, we caution that high-grading is not limited to FST approaches; high-grading bias is a concern whenever a small subset of markers is first chosen to explain differences among groups based on their degree of difference and is subsequently reused to estimate the degree of difference among those same groups. For example, selecting high-FST loci for use in a GT-seq panel or using differentially expressed genes to plot sample membership in multivariate space can both produce spurious structure when none exists. We illustrate that using statistically based outlier tests in place of arbitrary FST cut-offs can reduce bias. Alternatively, permutation tests or cross-evaluation can be used to detect high-grading bias. We provide an R package, PCAssess, to help researchers detect and prevent high-grading bias in genetic datasets by automating permutation tests and principal component analyses (https://github.com/hemstrow/PCAssess).
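The mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the PCAssess API and is not taken from the paper. It is a minimal Python illustration under stated assumptions: a single panmictic population split into two arbitrary groups, a simple Hs/Ht-style FST estimator with no sample-size correction, and hypothetical function names (fst_components, pooled_fst, high_graded_fst). It selects the highest-FST loci, re-estimates FST from only those loci, and then repeats the same selection on label-shuffled data to build a permutation null.

```python
import numpy as np

def fst_components(geno, labels):
    """Per-locus numerator (Ht - Hs) and denominator (Ht) of a simple two-group FST.

    geno: (individuals x loci) array of 0/1/2 allele counts.
    labels: array of 0/1 group codes, one per individual.
    Assumption: uncorrected estimator, chosen for brevity.
    """
    p1 = geno[labels == 0].mean(axis=0) / 2.0
    p2 = geno[labels == 1].mean(axis=0) / 2.0
    p_bar = (p1 + p2) / 2.0
    ht = 2.0 * p_bar * (1.0 - p_bar)
    hs = p1 * (1.0 - p1) + p2 * (1.0 - p2)  # mean of 2*p1*q1 and 2*p2*q2
    return ht - hs, ht

def pooled_fst(num, den):
    """Ratio-of-sums FST across a set of loci."""
    return num.sum() / den.sum()

def high_graded_fst(geno, labels, n_top):
    """Pick the n_top highest-FST loci, then re-estimate FST from those same loci."""
    num, den = fst_components(geno, labels)
    # Monomorphic loci (den == 0) get FST 0 so they are never selected first.
    fst = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    top = np.argsort(fst)[-n_top:]
    return pooled_fst(num[top], den[top])

# Simulate one panmictic population: group labels carry no biological meaning.
rng = np.random.default_rng(1)
n_per_group, n_loci, n_top, n_perm = 20, 5000, 50, 200
freqs = rng.uniform(0.1, 0.9, size=n_loci)
geno = rng.binomial(2, freqs, size=(2 * n_per_group, n_loci))
labels = np.repeat([0, 1], n_per_group)

num, den = fst_components(geno, labels)
fst_all = pooled_fst(num, den)                  # all loci
fst_top = high_graded_fst(geno, labels, n_top)  # high-graded subset

# Permutation null: shuffle labels and repeat the ENTIRE select-then-estimate step.
null = np.array([
    high_graded_fst(geno, rng.permutation(labels), n_top)
    for _ in range(n_perm)
])
p_value = (np.sum(null >= fst_top) + 1) / (n_perm + 1)

print(f"FST, all loci:          {fst_all:.4f}")
print(f"FST, top {n_top} loci:       {fst_top:.4f}")
print(f"Permutation null mean:  {null.mean():.4f}")
print(f"Permutation p-value:    {p_value:.3f}")
```

In this sketch the key design choice is that each permutation repeats the locus selection as well as the estimation. If the high-graded estimate from the real labels is no larger than what shuffled labels typically produce, the apparent structure is consistent with high-grading alone.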
Keywords: ecological genetics; genomics/proteomics; natural selection and contemporary evolution; population genetics—theoretical.
© 2025 The Author(s). Molecular Ecology Resources published by John Wiley & Sons Ltd.
Conflict of interest statement
Benefits Sharing: This study provides methodology and addresses issues intended to help population geneticists and improve analytical approaches for the broader scientific field. All collaborators are included as co‐authors.
The authors declare no conflicts of interest.