Methods Mol Biol. 2012;850:399-409.
doi: 10.1007/978-1-61779-555-8_21.

Allowing for population stratification in association analysis

Huaizhen Qin et al. Methods Mol Biol. 2012.

Abstract

In genetic association studies, it is necessary to correct for population structure to avoid inference bias. During the past decade, prevailing corrections have often adjusted only for differences in global ancestry between sampled individuals. Nevertheless, population structure may vary across local genomic regions because local ancestries vary with natural selection, migration, or random genetic drift. Adjusting for global ancestry alone may be inadequate when local population structure is an important confounding factor. In contrast, adjusting for local ancestry can more effectively prevent false positives due to local population structure. To locate disease genes more accurately, we recommend adjusting for local ancestries by interrogating local structure. In practice, locus-specific ancestries are usually unknown and cannot be accurately inferred when ancestral population information is not available. For such scenarios, we propose employing local principal components (PCs) to represent local ancestries and adjusting for local PCs when testing for genotype-phenotype association. With an acceptable computational burden, the proposed algorithm successfully eliminates the known spurious association between SNPs in the LCT gene and height that arises from population structure in European Americans.
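
The adjustment described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the test statistic, so the ordinary least-squares model below, the function name `genotype_t_statistic`, and the simulated data are all assumptions made for illustration; a known two-group ancestry indicator stands in for a local PC.

```python
import numpy as np

def genotype_t_statistic(phenotype, genotype, covariates=None):
    """t-statistic of the genotype coefficient in an ordinary least-squares
    fit of phenotype ~ intercept + genotype (+ covariates)."""
    y = np.asarray(phenotype, dtype=float)
    g = np.asarray(genotype, dtype=float)
    columns = [np.ones(len(y)), g]
    if covariates is not None:
        # Covariates may be a single vector or an (n, k) matrix of local PCs.
        columns.append(np.asarray(covariates, dtype=float))
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = residuals @ residuals / dof
    standard_error = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / standard_error

# Toy data (hypothetical): ancestry shifts both allele frequency and phenotype,
# while the genotype itself has no effect on the phenotype.
rng = np.random.default_rng(0)
n = 2000
ancestry = rng.integers(0, 2, size=n)
allele_freq = np.where(ancestry == 1, 0.8, 0.2)
genotype = rng.binomial(2, allele_freq)
phenotype = 2.0 * ancestry + rng.normal(size=n)

t_unadjusted = genotype_t_statistic(phenotype, genotype)
t_adjusted = genotype_t_statistic(phenotype, genotype, ancestry)
print(f"unadjusted t = {t_unadjusted:.2f}, adjusted t = {t_adjusted:.2f}")
```

In this toy setting the unadjusted statistic is inflated by confounding alone, whereas including the ancestry covariate brings it back toward the null range. With real data the covariates would be the local PCs of the window containing the tested SNP.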

Figures

Fig 1
A typical window consists of a 4-Mb core and an envelope with 8-Mb margins on each side of the core. The first ℓ PCs of the genotypic score matrix of the SNPs in the 20-Mb window are employed to adjust for local ancestries of the SNPs within the 4-Mb core.
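
The window layout in this caption can be sketched as follows. Only the 4-Mb core, the 8-Mb margins, and the use of the first ℓ PCs come from the caption; the function names, the clipping of the window at position 0, and the choice to center each SNP without rescaling are assumptions of this sketch.

```python
import numpy as np

CORE_MB = 4.0     # core width, from the caption
MARGIN_MB = 8.0   # margin on each side of the core, from the caption

def window_bounds(core_start_mb):
    """Return (core, window) intervals in Mb for a core starting at core_start_mb."""
    core = (core_start_mb, core_start_mb + CORE_MB)
    # Assumption: a window that would start before position 0 is clipped there.
    window = (max(0.0, core[0] - MARGIN_MB), core[1] + MARGIN_MB)
    return core, window

def local_pcs(genotypes, positions_mb, core_start_mb, n_pcs):
    """First n_pcs principal components of the SNPs inside the window.

    genotypes: (individuals x SNPs) matrix of genotype scores 0, 1, 2.
    positions_mb: SNP positions in Mb, one per column of genotypes.
    """
    positions_mb = np.asarray(positions_mb, dtype=float)
    _, (lo, hi) = window_bounds(core_start_mb)
    in_window = (positions_mb >= lo) & (positions_mb < hi)
    G = np.asarray(genotypes)[:, in_window].astype(float)
    G -= G.mean(axis=0)  # center each SNP (assumed preprocessing)
    U, S, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]

# Toy usage with random genotypes (hypothetical data).
rng = np.random.default_rng(1)
positions_mb = np.sort(rng.uniform(0.0, 60.0, size=600))
genotypes = rng.integers(0, 3, size=(50, 600))
pcs = local_pcs(genotypes, positions_mb, core_start_mb=20.0, n_pcs=5)
print(window_bounds(20.0), pcs.shape)
```

The resulting PCs would then serve as covariates only for SNPs whose positions fall inside the core interval, with the next core handled by its own window.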
Fig 2
The distributions of the λ2-values of the local windows in three GWAS data sets. For each window in a given genotype data set, λ2 is the largest squared coefficient of canonical correlation between the first 10 local PCs and the first 10 global PCs. By comparison, the Maywood participants show more population structure and the Nigerian samples show little, whereas the Framingham participants show a much more complex local population structure than either of the other two samples.
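
The λ2 statistic defined in this caption can be computed as in the sketch below. It uses the standard QR-plus-SVD route to canonical correlations and assumes both PC matrices have full column rank; the function name and the random test matrices are hypothetical and not taken from the chapter.

```python
import numpy as np

def largest_squared_canonical_correlation(local_pcs, global_pcs):
    """Largest squared canonical correlation between two sets of variables.

    Each argument is an (individuals x components) matrix. After centering,
    the canonical correlations equal the singular values of Qa.T @ Qb, where
    Qa and Qb are orthonormal bases of the two column spaces.
    """
    def orthonormal_basis(M):
        M = np.asarray(M, dtype=float)
        M = M - M.mean(axis=0)
        Q, _ = np.linalg.qr(M)  # reduced QR; assumes full column rank
        return Q

    Qa = orthonormal_basis(local_pcs)
    Qb = orthonormal_basis(global_pcs)
    singular_values = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Guard against rounding slightly above 1 before squaring.
    return float(min(singular_values[0], 1.0) ** 2)

# Toy usage (hypothetical data): 10 "local" and 10 "global" components.
rng = np.random.default_rng(2)
local = rng.normal(size=(5000, 10))
unrelated_global = rng.normal(size=(5000, 10))
same_span_global = local @ rng.normal(size=(10, 10))
print(largest_squared_canonical_correlation(local, unrelated_global))
print(largest_squared_canonical_correlation(local, same_span_global))
```

A value near 1 means some combination of the local PCs is almost fully captured by the global PCs, while a value near 0 means the local window carries structure that the global PCs do not describe.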
Fig 3
PCs computed with (a) and without (b) normalization of the GAW17 genotypic data set of 697 unrelated individuals. The PCs without normalization provide better discrimination, although both sets of PCs roughly classify the 697 individuals into three large groups: CEPH and Tuscan; Luhya and Yoruba; and Denver Chinese, Han Chinese, and Japanese. The PCs without normalization also appear more robust to outliers.
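
The two variants compared in this caption can be sketched as below. The caption does not define the normalization, so this sketch assumes the common convention of dividing each centered SNP by sqrt(p(1-p)), where p is its estimated allele frequency; the function name and the simulated two-population data are likewise hypothetical.

```python
import numpy as np

def genotype_pcs(genotypes, n_pcs, normalize):
    """Principal components of a genotype matrix, with or without per-SNP scaling.

    genotypes: (individuals x SNPs) matrix of genotype scores 0, 1, 2.
    normalize: if True, divide each centered SNP by sqrt(p * (1 - p))
               (an assumed convention, not stated in the caption).
    """
    G = np.asarray(genotypes, dtype=float)
    p = G.mean(axis=0) / 2.0          # estimated allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)      # drop monomorphic SNPs
    G, p = G[:, keep], p[keep]
    G = G - 2.0 * p                   # center each SNP at its mean
    if normalize:
        G = G / np.sqrt(p * (1.0 - p))
    U, S, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]

# Toy usage: two simulated populations with different allele frequencies.
rng = np.random.default_rng(3)
freq_pop1 = rng.uniform(0.1, 0.9, size=500)
freq_pop2 = np.clip(freq_pop1 + rng.normal(0.0, 0.15, size=500), 0.05, 0.95)
labels = np.repeat([0, 1], 100)
freqs = np.where(labels[:, None] == 0, freq_pop1, freq_pop2)
genotypes = rng.binomial(2, freqs)

pcs_normalized = genotype_pcs(genotypes, n_pcs=2, normalize=True)
pcs_raw = genotype_pcs(genotypes, n_pcs=2, normalize=False)
print(pcs_normalized.shape, pcs_raw.shape)
```

Plotting the first two columns of each result, colored by population label, gives the kind of side-by-side comparison shown in panels (a) and (b). This toy simulation makes no claim about which variant discriminates better on real data.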
Fig 4
Pearson correlation coefficients between the standard first global PC coordinates of distinct subsets of 1,969,739 SNPs and the individual global ancestries of the 2,000 individuals. The data set was generated by applying the GenoAnceBase0 program (13) to the CEU and YRI haplotypes of the HapMap data (Phase II) to simulate African-American genomes. The standard first global PC coordinates of the 3,029 unlinked AIMs across the genome are highly correlated with the true individual global ancestries. The first standard global PC coordinates computed from larger sets of random markers represent the true global ancestries even better, despite the more abundant LD among those markers.
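
The comparison in this caption amounts to correlating the first global PC with each individual's true ancestry proportion. The sketch below shows that calculation on a simple simulated admixture. It does not reproduce the GenoAnceBase0 simulation: the toy model ignores LD entirely, and the function name and all parameter values are assumptions.

```python
import numpy as np

def first_pc_ancestry_correlation(genotypes, ancestry):
    """Absolute Pearson correlation between the first PC and true ancestry.

    The absolute value is returned because the sign of a PC is arbitrary.
    """
    G = np.asarray(genotypes, dtype=float)
    G = G - G.mean(axis=0)  # center each SNP
    U, S, _ = np.linalg.svd(G, full_matrices=False)
    first_pc = U[:, 0] * S[0]
    r = np.corrcoef(first_pc, np.asarray(ancestry, dtype=float))[0, 1]
    return abs(r)

# Toy admixture (hypothetical): each individual draws alleles from population A
# with probability equal to its ancestry proportion, otherwise from population B.
rng = np.random.default_rng(4)
n_individuals, n_snps = 300, 1000
ancestry = rng.uniform(0.0, 1.0, size=n_individuals)
freq_a = rng.uniform(0.1, 0.9, size=n_snps)
freq_b = rng.uniform(0.1, 0.9, size=n_snps)
mixed_freq = ancestry[:, None] * freq_a + (1.0 - ancestry[:, None]) * freq_b
genotypes = rng.binomial(2, mixed_freq)

correlation = first_pc_ancestry_correlation(genotypes, ancestry)
print(f"|r| between PC1 and true ancestry: {correlation:.3f}")
```

Repeating the calculation on marker subsets of different sizes would give a curve of the kind plotted in the figure.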

References

    1. Knowler WC, Williams RC, Pettitt DJ, Steinberg AG. Gm3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture. Am J Hum Genet. 1988;43:520–526.
    2. Lander ES, Schork NJ. Genetic dissection of complex traits. Science. 1994;265:2037–2048.
    3. Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55:997–1004.
    4. Pritchard JK, Stephens M, Rosenberg NA, Donnelly P. Association mapping in structured populations. Am J Hum Genet. 2000;67:170–181.
    5. Satten GA, Flanders WD, Yang Q. Accounting for unmeasured population substructure in case–control studies of genetic association using a novel latent-class model. Am J Hum Genet. 2001;68:466–477.
