Genome Res. 2009 Sep;19(9):1655-64.
doi: 10.1101/gr.094052.109. Epub 2009 Jul 31.

Fast model-based estimation of ancestry in unrelated individuals

David H Alexander et al. Genome Res. 2009 Sep.

Abstract

Population stratification has long been recognized as a confounding factor in genetic association studies. Estimated ancestries, derived from multi-locus genotype data, can be used to perform a statistical correction for population stratification. One popular technique for estimation of ancestry is the model-based approach embodied by the widely applied program structure. Another approach, implemented in the program EIGENSTRAT, relies on Principal Component Analysis rather than model-based estimation and does not directly deliver admixture fractions. EIGENSTRAT has gained in popularity in part owing to its remarkable speed in comparison to structure. We present a new algorithm and a program, ADMIXTURE, for model-based estimation of ancestry in unrelated individuals. ADMIXTURE adopts the likelihood model embedded in structure. However, ADMIXTURE runs considerably faster, solving problems in minutes that take structure hours. In many of our experiments, we have found that ADMIXTURE is almost as fast as EIGENSTRAT. The runtime improvements of ADMIXTURE rely on a fast block relaxation scheme using sequential quadratic programming for block updates, coupled with a novel quasi-Newton acceleration of convergence. Our algorithm also runs faster and with greater accuracy than the implementation of an Expectation-Maximization (EM) algorithm incorporated in the program FRAPPE. Our simulations show that ADMIXTURE's maximum likelihood estimates of the underlying admixture coefficients and ancestral allele frequencies are as accurate as structure's Bayesian estimates. On real-world data sets, ADMIXTURE's estimates are directly comparable to those from structure and EIGENSTRAT. Taken together, our results show that ADMIXTURE's computational speed opens up the possibility of using a much larger set of markers in model-based ancestry estimation and that its estimates are suitable for use in correcting for population stratification in association studies.
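To make the model-based approach concrete, here is a minimal sketch of the kind of likelihood and EM update the abstract refers to. The abstract does not spell out the formulas, so this assumes the standard binomial admixture model: individual i carries g_ij copies of allele 1 at marker j, each copy drawn with probability sum_k q_ik f_kj, where Q holds admixture fractions and F holds ancestral allele frequencies. The update shown is a plain EM step of the sort the abstract attributes to FRAPPE. It is not ADMIXTURE's block relaxation scheme, and it includes no sequential quadratic programming or quasi-Newton acceleration. The names `log_likelihood`, `em_step`, `fit`, and `EPS` are illustrative and not taken from any of the programs.

```python
import math
import random

EPS = 1e-6  # keeps frequencies away from 0 and 1 so the logs stay finite


def log_likelihood(G, Q, F):
    """Binomial admixture log-likelihood, up to an additive constant.

    G[i][j]: copies of allele 1 (0, 1, or 2) for individual i at marker j.
    Q[i][k]: fraction of individual i's ancestry from population k.
    F[k][j]: frequency of allele 1 at marker j in population k.
    """
    total = 0.0
    for i, row in enumerate(G):
        for j, g in enumerate(row):
            p = sum(q * f[j] for q, f in zip(Q[i], F))
            total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total


def em_step(G, Q, F):
    """One EM update of (Q, F) under the assumed model."""
    n, m, K = len(G), len(G[0]), len(F)
    num = [[0.0] * m for _ in range(K)]
    den = [[0.0] * m for _ in range(K)]
    Q_new = [[0.0] * K for _ in range(n)]
    for i in range(n):
        for j in range(m):
            g = G[i][j]
            p1 = sum(Q[i][k] * F[k][j] for k in range(K))
            for k in range(K):
                # Expected copies of allele 1 / allele 0 inherited from population k.
                a = g * Q[i][k] * F[k][j] / p1
                b = (2 - g) * Q[i][k] * (1.0 - F[k][j]) / (1.0 - p1)
                num[k][j] += a
                den[k][j] += a + b
                Q_new[i][k] += (a + b) / (2.0 * m)
    F_new = [
        [
            F[k][j] if den[k][j] == 0.0
            else min(1.0 - EPS, max(EPS, num[k][j] / den[k][j]))
            for j in range(m)
        ]
        for k in range(K)
    ]
    return Q_new, F_new


def fit(G, K, iters=50, seed=0):
    """Run EM from a random start; returns Q, F, and the log-likelihood trace."""
    rng = random.Random(seed)
    n, m = len(G), len(G[0])
    Q = []
    for _ in range(n):
        w = [rng.uniform(0.5, 1.5) for _ in range(K)]
        s = sum(w)
        Q.append([x / s for x in w])
    F = [[rng.uniform(0.2, 0.8) for _ in range(m)] for _ in range(K)]
    history = [log_likelihood(G, Q, F)]
    for _ in range(iters):
        Q, F = em_step(G, Q, F)
        history.append(log_likelihood(G, Q, F))
    return Q, F, history
```

Calling `fit(G, K=2)` on a small genotype matrix returns admixture fractions whose rows sum to 1 and a log-likelihood trace that should not decrease from one iteration to the next. Plain EM of this kind tends to converge slowly, which is the cost the abstract says ADMIXTURE's accelerated block updates are meant to avoid.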

Figures

Figure 1.
Analyses of the HapMap3 data set. K = 3 for ADMIXTURE and structure. Plotted for each individual i are the point [formula image] for ADMIXTURE (A) and structure (B), and the point (PC1i, PC2i) for EIGENSTRAT (C). Self-reported ancestries are indicated: (×) YRI; (○) ASW; (+) CEU; (●) MEX.
Figure 2.
Analyses of the IBD data set. K = 3 for ADMIXTURE and structure. In this data set, the evidence for a third population is weak. (A) ADMIXTURE; (B) structure; (C) EIGENSTRAT.
Figure 3.
Analysis of the IBD data set using K = 2 for structure and ADMIXTURE and the first principal component from EIGENSTRAT. The plots shown are histograms of the first estimated ancestry parameter ([formula image], or PC1) for the individuals, conditioned on self-reported ancestry. Only individuals self-reporting as Ashkenazi Jewish or northwestern European are shown. (A) ADMIXTURE; (B) structure; (C) EIGENSTRAT.
