Mol Ecol Resour. 2020 Jul;20(4):936-949.
doi: 10.1111/1755-0998.13171. Epub 2020 May 25.

Evaluation of model fit of inferred admixture proportions

Genís Garcia-Erill et al. Mol Ecol Resour. 2020 Jul.

Abstract

Model-based methods for genetic clustering of individuals, such as those implemented in structure or ADMIXTURE, allow the user to infer individual ancestries and study population structure. The underlying model makes several assumptions about the demographic history that shaped the analysed genetic data. One assumption is that all individuals derive from K homogeneous ancestral populations that are all well represented in the data; another is that no drift happened after the admixture event. The histories of many real-world populations do not conform to that model, and in that case taking the inferred admixture proportions at face value might be misleading. We propose a method to evaluate the fit of admixture models based on estimating the correlation of the residual difference between the true genotypes and the genotypes predicted by the model. When the model assumptions are not violated, the residuals from a pair of individuals are not correlated. When the admixture model fits poorly, individuals with similar demographic histories have positively correlated residuals. Using simulated and real data, we show that the method can detect a bad fit of inferred admixture proportions caused either by an insufficient number of clusters K or by demographic histories that deviate significantly from the admixture model assumptions, such as admixture from ghost populations, drift after admixture events and nondiscrete ancestral populations. We have implemented the method as open-source software that can be applied to both unphased genotypes and low-depth sequencing data.
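The core idea of the abstract, correlating per-individual residuals between observed and model-predicted genotypes, can be illustrated with a minimal sketch. This is not the authors' software. It assumes known genotypes coded 0/1/2, predicts each genotype as twice the admixture-weighted allele frequency, and uses a plain Pearson correlation across sites with no bias correction. All function names (`residual_correlation`, `simulate_two_populations`, `mean_within`) and the toy simulation settings are hypothetical and chosen only for illustration.

```python
import numpy as np

def residual_correlation(G, Q, F):
    """Pairwise correlation of genotype residuals across sites (naive sketch).

    G: (n_ind, n_sites) genotypes coded 0/1/2
    Q: (n_ind, K) admixture proportions, rows summing to 1
    F: (K, n_sites) ancestral allele frequencies
    """
    predicted = 2.0 * (Q @ F)      # expected genotype under the admixture model
    residuals = G - predicted      # observed minus predicted
    return np.corrcoef(residuals)  # rows are individuals: (n_ind, n_ind) matrix

def simulate_two_populations(n_per_pop=10, n_sites=5000, seed=0):
    """Toy data (assumption): two unadmixed populations, independent frequencies."""
    rng = np.random.default_rng(seed)
    F = rng.uniform(0.1, 0.9, size=(2, n_sites))
    Q = np.repeat(np.eye(2), n_per_pop, axis=0)  # first block pop 1, second pop 2
    G = rng.binomial(2, Q @ F).astype(float)
    return G, Q, F

def mean_within(C, idx):
    """Mean off-diagonal correlation among the individuals in idx."""
    block = C[np.ix_(idx, idx)]
    mask = ~np.eye(len(idx), dtype=bool)
    return block[mask].mean()

G, Q, F = simulate_two_populations()
pop1 = np.arange(10)

# Correct model (K = 2, true parameters): residuals should be uncorrelated.
good = residual_correlation(G, Q, F)

# Too few clusters (K = 1): one pooled frequency per site for everyone.
Q1 = np.ones((G.shape[0], 1))
F1 = G.mean(axis=0, keepdims=True) / 2.0
bad = residual_correlation(G, Q1, F1)

print("K=2 mean within-population correlation:", mean_within(good, pop1))
print("K=1 mean within-population correlation:", mean_within(bad, pop1))
```

In this toy setting the K = 2 correlations should hover around zero, while the K = 1 fit should give positive correlations within each population and negative ones between them, matching the pattern the abstract describes for an insufficient number of clusters. Because the K = 1 frequencies are estimated from the same individuals being evaluated, the naive estimate is slightly biased downward; the sketch does not attempt to correct for that.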

Keywords: admixture; ancestry; evaluation; model fit; population structure; select K.
