Theor Popul Biol. 2013 Nov;89:34-43.
doi: 10.1016/j.tpb.2013.08.004. Epub 2013 Aug 20.

Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete subpopulations

Katarzyna Bryc et al. Theor Popul Biol. 2013 Nov.

Abstract

We present a mathematical model, and the corresponding mathematical analysis, that justifies and quantifies the use of principal component analysis of biallelic genetic marker data for a set of individuals to detect the number of subpopulations represented in the data. We indicate that the power of the technique relies more on the number of individuals genotyped than on the number of markers.

Keywords: Eigenanalysis; Eigenvalues; Number of subpopulations; Population structure; Principal components analysis.
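The idea in the abstract, counting the eigenvalues that separate from the bulk to estimate the number of subpopulations, can be illustrated with a small simulation. The following is a minimal sketch and not the authors' procedure. The function names, the Beta drift model with parameter `fst`, the minor-allele-frequency filter, the per-marker standardization, and the 10% margin over the Marchenko-Pastur bulk edge (1 + sqrt(M/N))^2 are all assumptions made for illustration, with M individuals and N markers.

```python
import numpy as np

def simulate_genotypes(n_per_pop, n_markers, n_pops, fst=0.1, seed=0):
    """Hypothetical simulator: 0/1/2 genotypes for discrete subpopulations."""
    rng = np.random.default_rng(seed)
    p_anc = rng.uniform(0.1, 0.9, size=n_markers)  # ancestral allele frequencies
    a = p_anc * (1 - fst) / fst                    # Beta parameters (assumed drift model)
    b = (1 - p_anc) * (1 - fst) / fst
    blocks = []
    for _ in range(n_pops):
        p_sub = rng.beta(a, b)                     # allele frequencies in one subpopulation
        blocks.append(rng.binomial(2, p_sub, size=(n_per_pop, n_markers)))
    return np.vstack(blocks)                       # shape: individuals x markers

def pca_eigenvalues(genotypes, min_maf=0.05):
    """Eigenvalues of the individual-by-individual matrix X X^T / N."""
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)
    g = g[:, maf >= min_maf]                       # drop rare and monomorphic markers
    x = (g - g.mean(axis=0)) / g.std(axis=0)       # center and scale each marker
    m, n = x.shape
    eigvals = np.linalg.eigvalsh(x @ x.T / n)      # ascending order
    return eigvals[::-1], m, n                     # return in descending order

def count_large_eigenvalues(genotypes, margin=1.1):
    """Count eigenvalues above an assumed threshold: margin * bulk edge."""
    eigvals, m, n = pca_eigenvalues(genotypes)
    edge = (1.0 + np.sqrt(m / n)) ** 2             # Marchenko-Pastur upper edge
    return int(np.sum(eigvals > margin * edge))

if __name__ == "__main__":
    data = simulate_genotypes(n_per_pop=50, n_markers=2000, n_pops=3)
    print("eigenvalues above threshold:", count_large_eigenvalues(data))
```

Because this sketch centers each marker, one direction of between-group variation is absorbed by the mean, so K simulated subpopulations are expected to produce K - 1 outlying eigenvalues here. A different normalization would change that count, and the margin is a crude stand-in for a formal significance test.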

Figures

Figure 1
Histogram of the eigenvalues from PCA of HapMap CEU, CHB, and YRI unrelated individuals (parents of trios), excluding the large eigenvalues (Λ ≫ 1), which are omitted to better illustrate the shape of the non-significant eigenvalues. Here, the largest three eigenvalues that correspond to subpopulation structure are Λ1 = 102.0, Λ2 = 14.55, Λ3 = 7.37.
Figure 2
Including related individuals perturbs the expected distribution of eigenvalues resulting from PCA. Left: A counts histogram of the eigenvalues from PCA using data generated via binomial simulation, where 29% of the individuals have been repeated. Large eigenvalues (corresponding to population structure) are not shown so as to better illustrate the effect on the distribution of the non-significant eigenvalues. Right: A histogram of the eigenvalues for PCA of three populations of HapMap (CEU, YRI, and CHB) including trios – 297 parents and their related 108 offspring. Large eigenvalues are not shown. Both simulated data and empirical genotype data show that inclusion of related individuals results in a multi-modal distribution of the eigenvalues, arising from the non-random correlations of individuals.
Figure 3
Histogram of the eigenvalues from PCA of all 11 populations in HapMap unrelated individuals. Nonautosomal markers with M = 924, N = 422253. The six largest eigenvalues Λ1 = 335.9, Λ2 = 37.4, Λ3 = 16.7, Λ4 = 2.5, Λ5 = 2.1, Λ6 = 1.7 are not shown.
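The effect described in the Figure 2 caption, where repeated individuals distort the bulk of the eigenvalue distribution, can be reproduced in a rough way. The sketch below is an illustration under stated assumptions rather than the simulation used for the figure: a single population, allele frequencies drawn uniformly from [0.1, 0.9], 29% of the original individuals appended as exact copies, per-marker standardization, and the helper name `standardized_eigenvalues` are all choices made here.

```python
import numpy as np

def standardized_eigenvalues(genotypes):
    """Eigenvalues of X X^T / N (ascending) and the Marchenko-Pastur upper edge."""
    x = np.asarray(genotypes, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0)       # center and scale each marker
    m, n = x.shape
    return np.linalg.eigvalsh(x @ x.T / n), (1.0 + np.sqrt(m / n)) ** 2

rng = np.random.default_rng(1)
n_unrelated, n_markers = 200, 2000                 # assumed sizes, for illustration only
p = rng.uniform(0.1, 0.9, size=n_markers)          # one population, no structure
unrelated = rng.binomial(2, p, size=(n_unrelated, n_markers))

# Append exact copies of 29% of the original individuals (assumed reading of "repeated").
n_repeats = round(0.29 * n_unrelated)
repeat_idx = rng.choice(n_unrelated, size=n_repeats, replace=False)
with_repeats = np.vstack([unrelated, unrelated[repeat_idx]])

eig_unrel, edge_unrel = standardized_eigenvalues(unrelated)
eig_rep, edge_rep = standardized_eigenvalues(with_repeats)

# Compare how many eigenvalues exceed the bulk edge by a 10% margin (assumed cutoff).
print("above edge, unrelated:   ", int(np.sum(eig_unrel > 1.1 * edge_unrel)))
print("above edge, with repeats:", int(np.sum(eig_rep > 1.1 * edge_rep)))
print("near zero, with repeats: ", int(np.sum(eig_rep < 1e-8)))
```

Each exact copy makes two rows of the standardized matrix identical, which forces one eigenvalue to zero and moves another above the unrelated bulk, so the histogram gains extra modes even though no population structure is present. Real relatives share only part of their genotypes, so the shift in empirical data would be less extreme than in this sketch.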