Nat Genet. 2015 May;47(5):550-4.
doi: 10.1038/ng.3244. Epub 2015 Mar 30.

Testing for genetic associations in arbitrarily structured populations

Minsun Song et al. Nat Genet. 2015 May.

Abstract

We present a new statistical test of association between a trait and genetic markers, which we theoretically and practically prove to be robust to arbitrarily complex population structure. The statistical test involves a set of parameters that can be directly estimated from large-scale genotyping data, such as those measured in genome-wide association studies (GWAS). We also derive a new set of methodologies, called a 'genotype-conditional association test' (GCAT), shown to provide accurate association tests in populations with complex structures, manifested in both the genetic and non-genetic contributions to the trait. We demonstrate the proposed method on a large simulation study and on the Northern Finland Birth Cohort study. In the Finland study, we identify several new significant loci that other methods do not detect. Our proposed framework provides a substantially different approach to the problem from existing methods, such as the linear mixed-model and principal-component approaches.

Conflict of interest statement

Competing Financial Interests

The authors declare no competing financial interests.

Figures

Figure 1
Rationale for the proposed test of association. (a) A graphical model describing population structure and its effects on a trait of interest. Population structure is captured by a common latent variable z among a set of loci xi (i = 1, 2, …, m), via the allele frequencies πi(z). When one locus has a causal effect on the trait, this induces spurious associations with other loci affected by population structure. At the same time, population structure may be correlated with lifestyle and environment, as these are all possibly related to ancestry and geography. (b) Accounting for confounding due to latent population structure. Left panel: A test for association between the ith SNP xi and trait y that does not take z into account will produce a spurious association, because both xi and y are confounded with z. Right panel: A test for association between xi|πi(z) and y will be unbiased because conditioning on πi(z) breaks the relationship between z and xi.
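The rationale in panel (b) can be illustrated with a small simulation. This is a minimal sketch, not the GCAT procedure itself: it assumes a single latent variable z, a hypothetical linear allele-frequency function πi(z) = 0.1 + 0.8z, a trait that depends on z but not on the genotype, and it treats πi(z) as known (as in the Oracle setting). Conditioning is approximated by centering the genotype on its expected value 2πi(z); the function names and parameter values are illustrative assumptions.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def simulate(n=20000, seed=0):
    """Simulate one non-causal SNP and a trait that share the latent structure z.

    Returns (naive correlation of x with y,
             correlation of x - 2*pi(z) with y).
    """
    rng = random.Random(seed)
    genotypes, traits, centered = [], [], []
    for _ in range(n):
        z = rng.random()                     # latent structure (assumed uniform)
        pi = 0.1 + 0.8 * z                   # hypothetical allele frequency pi(z)
        # Genotype: number of minor alleles, Binomial(2, pi)
        x = int(rng.random() < pi) + int(rng.random() < pi)
        # Trait depends on z only; the SNP has no causal effect
        y = z + rng.gauss(0.0, 0.5)
        genotypes.append(x)
        traits.append(y)
        centered.append(x - 2.0 * pi)        # genotype centered on E[x | z]
    return pearson(genotypes, traits), pearson(centered, traits)


naive_r, adjusted_r = simulate()
print(f"naive correlation:    {naive_r:.3f}")
print(f"adjusted correlation: {adjusted_r:.3f}")
```

Under these assumptions the naive correlation is clearly non-zero even though the SNP is not causal, while the correlation after centering on 2πi(z) is close to zero. In practice πi(z) is not observed and has to be estimated from the genotype data.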
Figure 2
Performance of association testing methods. One hundred quantitative trait GWAS studies were simulated in each of the Balding-Nichols, HGDP, TGP, PSD (α = 0.1), and Spatial (a = 0.1) simulation scenarios (see Online Methods for definitions of each) to compare the Oracle, GCAT (proposed), LMM-EMMAX, LMM-GEMMA, and PCA testing methods. The variance contributions to the trait are genetic = 5%, non-genetic = 5%, and noise = 90%. The difference between the observed and expected numbers of false positives is plotted against the expected number of false positives under the null hypothesis of no association for each simulated study (grey lines), together with the average of those differences (black line) and the middle 90% (blue lines). All simulations involved m = 100,000 SNPs, so the range of the x-axis corresponds to choosing a significance threshold of up to p-value ≤ 0.0025. The difference on the y-axis is the number of “spurious associations.” PCA is shown on a separate y-axis since it usually has a much larger maximum than the other methods. The Oracle method supplies the true population structure parameters to the proposed test (see Results), which we have theoretically proven always corrects for structure (see Supplementary Note).
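The quantity plotted in Figure 2 can be computed from the p-values of null SNPs as follows. This is a minimal sketch under stated assumptions: the function name `excess_false_positives` and the toy p-value lists are hypothetical and are not taken from the paper. With m SNPs, an expected count of e false positives corresponds to a threshold of e/m, so m = 100,000 and a threshold of 0.0025 give an expected count of 250.

```python
from bisect import bisect_right


def excess_false_positives(null_pvalues, expected_counts):
    """Observed minus expected false positives at each expected count.

    null_pvalues: p-values of SNPs with no true association.
    expected_counts: expected numbers of false positives (x-axis values).
    """
    m = len(null_pvalues)
    sorted_p = sorted(null_pvalues)
    differences = []
    for expected in expected_counts:
        threshold = expected / m                     # p-value cutoff giving this expectation
        observed = bisect_right(sorted_p, threshold)  # number of p-values <= threshold
        differences.append(observed - expected)
    return differences


# Toy data (illustrative only): perfectly uniform p-values, and an
# inflated set obtained by squaring them, which mimics uncorrected structure.
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
inflated_p = [p * p for p in uniform_p]

calibrated = excess_false_positives(uniform_p, [10, 25])
inflated = excess_false_positives(inflated_p, [10, 25])
print("calibrated:", calibrated)
print("inflated:  ", inflated)
```

A well-calibrated test gives differences near zero across the x-axis, while a test that fails to correct for structure gives positive differences, which the caption calls spurious associations.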
