Genet Epidemiol. 2020 Oct;44(7):702-716.
doi: 10.1002/gepi.22332. Epub 2020 Jun 30.

Evaluation of population stratification adjustment using genome-wide or exonic variants

Yuning Chen et al. Genet Epidemiol. 2020 Oct.

Abstract

Population stratification may cause an inflated type-I error rate and spurious associations when assessing the association between genetic variants and an outcome. Many genetic association studies now use exonic variants, which capture only 1% of the genome; however, population stratification adjustments have not been evaluated in the context of exonic variants. We compare the performance of two established approaches, principal components analysis (PCA) and mixed-effects models, and assess the utility of genome-wide (GW) and exonic variants, by simulation and using a data set from the Framingham Heart Study. Our results illustrate that although the PCs and genetic relationship matrices computed from GW and exonic markers are different, the type-I error rate of association tests for common variants with additive effects appears to be properly controlled in the presence of population stratification. In addition, by considering single nucleotide variants (SNVs) that have different levels of confounding by population stratification, we also compare power across association approaches that account for population stratification, such as PC-based corrections and mixed-effects models. We find that while these two methods achieve similar power for SNVs that have a low or medium level of confounding by population stratification, the mixed-effects model can reach higher power for SNVs highly confounded by population stratification.

Keywords: GWAS; PCA; mixed-effects model; population stratification.
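
The PC-based correction described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design: it assumes two hypothetical subpopulations drawn with a Balding-Nichols-style allele-frequency model, a continuous null outcome whose mean differs by subpopulation, and an approximate z test from linear regression. The function names, sample sizes, and the values `fst=0.1` and `shift=2.0` are illustrative assumptions. It compares the false-positive rate of the unadjusted model with a model adjusted for the first principal component.

```python
import math
import random

def simulate_genotypes(n_per_pop, n_markers, fst, rng):
    """Two subpopulations, additive 0/1/2 coding; first n_per_pop samples are subpopulation 0."""
    cols = []
    while len(cols) < n_markers:
        p = rng.uniform(0.1, 0.9)                       # ancestral allele frequency
        a, b = p * (1 - fst) / fst, (1 - p) * (1 - fst) / fst
        col = []
        for _ in range(2):                              # one drifted frequency per subpopulation
            q = rng.betavariate(a, b)
            col += [(rng.random() < q) + (rng.random() < q) for _ in range(n_per_pop)]
        if len(set(col)) > 1:                           # drop monomorphic markers
            cols.append(col)
    return cols

def center(x):
    m = sum(x) / len(x)
    return [v - m for v in x]

def standardize(x):
    c = center(x)
    s = math.sqrt(sum(v * v for v in c) / len(c))
    return [v / s for v in c]

def top_pc(cols, rng, n_iter=25):
    """Leading sample-space principal component, by power iteration on Z Z^T."""
    z = [standardize(c) for c in cols]
    n = len(z[0])
    v = [rng.gauss(0, 1) for _ in range(n)]             # random start vector
    for _ in range(n_iter):
        w = [0.0] * n
        for col in z:
            proj = sum(a * b for a, b in zip(col, v))
            for i, a in enumerate(col):
                w[i] += a * proj
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return v

def residualize(x, covar):
    """Residuals of x after least-squares regression on an intercept and one covariate."""
    xc, cc = center(x), center(covar)
    beta = sum(a * b for a, b in zip(xc, cc)) / sum(b * b for b in cc)
    return [a - beta * b for a, b in zip(xc, cc)]

def z_stat(y, g, n_params):
    """Approximate z statistic for the slope of y on g; inputs are already residualized."""
    r = sum(a * b for a, b in zip(y, g)) / math.sqrt(sum(a * a for a in y) * sum(b * b for b in g))
    return r * math.sqrt((len(y) - n_params) / (1 - r * r))

def false_positive_rates(n_per_pop=100, n_markers=300, fst=0.1, shift=2.0, seed=7):
    rng = random.Random(seed)
    cols = simulate_genotypes(n_per_pop, n_markers, fst, rng)
    # Null outcome: its mean depends on subpopulation only, never on any SNV.
    y = [shift * (i >= n_per_pop) + rng.gauss(0, 1) for i in range(2 * n_per_pop)]
    pc1 = top_pc(cols, rng)
    y_unadj, y_adj = center(y), residualize(y, pc1)
    hits_unadj = hits_adj = 0
    for col in cols:
        hits_unadj += abs(z_stat(y_unadj, center(col), 2)) > 1.96      # Y ~ SNV
        hits_adj += abs(z_stat(y_adj, residualize(col, pc1), 3)) > 1.96  # Y ~ SNV + PC1
    return hits_unadj / n_markers, hits_adj / n_markers

unadjusted, adjusted = false_positive_rates()
print(f"unadjusted: {unadjusted:.3f}  PC1-adjusted: {adjusted:.3f}")
```

Because no SNV affects the outcome, every rejection is a false positive. The unadjusted rate should sit far above the nominal 0.05, while the PC1-adjusted rate should fall back toward it. A mixed-effects model would instead enter a genetic relationship matrix as the covariance of a random effect; that variant is not shown here.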

Figures

Figure 1. Population structure in 1000G Phase 3 data.
The grouping in EA + AA samples is: group 1 = FIN; group 2 = CEU + GBR; group 3 = IBS + TSI; group 4 = ASW + ACB; group 5 = LWK; group 6 = ESN + GWD + MSI + YRI. The grouping in EA only samples is: group 1 = FIN; group 2 = CEU; group 3 = GBR; group 4 = IBS; group 5 = TSI.
Figure 2. Pairwise comparison of kinship coefficients computed using genome-wide and exonic SNVs.
Left: kinship coefficients in the combined EA and AA samples. Right: kinship coefficients in EA samples only. Plots below the diagonal are scatterplots of the kinship coefficients. Plots above the diagonal are the Pearson correlations between them. GW IBS and GW BN represent the IBS and BN kinship matrices with GW markers; Exome IBS and Exome BN are the IBS and BN kinship matrices with exonic markers; Random IBS and Random BN indicate the IBS and BN kinship matrices computed using a randomly selected subset of GW markers that has the same number of markers as the exonic set.
Figure 3. Relative type-I error rate in simulation studies with binary outcome.
The ratio of the observed type-I error rate to the expected type-I error rate is presented for various P-value thresholds. A ratio > 1 shows inflation and a ratio < 1 shows deflation. The unadjusted model Y~SNV has relative type-I error rates of 16, 645, 5724 and 449000 in the EA + AA analysis and 9, 179, 1101 and 42400 in the EA only analysis at α = 0.05, 1×10⁻³, 1×10⁻⁴ and 1×10⁻⁶, respectively.
Figure 4. Relative type-I error rate in simulation studies with continuous outcome.
The ratio of the observed type-I error rate to the expected type-I error rate is presented for various P-value thresholds. A ratio > 1 shows inflation and a ratio < 1 shows deflation. The unadjusted model Y~SNV has relative type-I error rates of 13, 383, 2840 and 147000 in the EA + AA analysis and 5, 48, 190 and 3026 in the EA only analysis at α = 0.05, 1×10⁻³, 1×10⁻⁴ and 1×10⁻⁶, respectively.
Figure 5. Power results from simulations with binary outcome when α = 1×10⁻³.
Figure 6. Power results from simulations with quantitative outcome when α = 1×10⁻⁴.
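
The relative type-I error rate used in Figures 3 and 4 is the observed rejection rate under the null divided by the nominal threshold α. As a minimal sketch of that arithmetic (the function names are hypothetical and not taken from the paper), the code below computes the ratio from a list of null P-values and also inverts it, recovering the observed rates implied by the unadjusted-model ratios quoted in the Figure 3 caption.

```python
def relative_type1_error(p_values, alpha):
    """Observed rejection rate at threshold alpha divided by the nominal rate alpha."""
    observed = sum(p <= alpha for p in p_values) / len(p_values)
    return observed / alpha

def implied_observed_rate(relative_rate, alpha):
    """Invert the ratio: the observed type-I error rate behind a reported relative rate."""
    return relative_rate * alpha

alphas = [0.05, 1e-3, 1e-4, 1e-6]
unadjusted_binary_ea_aa = [16, 645, 5724, 449000]   # Figure 3 caption, Y~SNV, EA + AA

implied = [implied_observed_rate(r, a) for r, a in zip(unadjusted_binary_ea_aa, alphas)]
for a, r, obs in zip(alphas, unadjusted_binary_ea_aa, implied):
    print(f"alpha={a:g}: relative rate {r} implies an observed rate of about {obs:.3f}")
```

Read this way, a relative rate of 16 at α = 0.05 means the unadjusted model rejected a true null in about 80% of tests in the EA + AA binary-outcome simulations.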
