Am J Hum Genet. 2008 Feb;82(2):352-65.
doi: 10.1016/j.ajhg.2007.10.009.

A unified association analysis approach for family and unrelated samples correcting for stratification

Xiaofeng Zhu et al. Am J Hum Genet. 2008 Feb.

Abstract

There are two common designs for association mapping of complex diseases: case-control and family-based designs. A case-control sample is more powerful to detect genetic effects than a family-based sample that contains the same numbers of affected and unaffected persons, although additional markers may be required to control for spurious association. When family and unrelated samples are available, statistical analyses are often performed in the family and unrelated samples separately, conditioning on parental information for the former, thus resulting in reduced power. In this report, we propose a unified approach that can incorporate both family and case-control samples and, provided the additional markers are available, at the same time corrects for population stratification. We apply the principal components of a marker matrix to adjust for the effect of population stratification. This unified approach makes it unnecessary to perform a conditional analysis of the family data and is more powerful than the separate analyses of unrelated and family samples, or a meta-analysis performed by combining the results of the usual separate analyses. This property is demonstrated in both a variety of simulation models and empirical data. The proposed approach can be equally applied to the analysis of both qualitative and quantitative traits.
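The abstract states that principal components of a marker matrix are used to adjust for population stratification, but it does not give the test statistic. The following is a minimal sketch of the general idea only, not the authors' unified statistic: it regresses both the test genotype and the trait on the top principal components and scores the association between the residuals. All function names, the simulated allele frequencies, and the sample sizes are assumptions made for illustration.

```python
import numpy as np

def top_pcs(markers, k):
    """Return the first k principal-component scores of a samples-by-markers matrix."""
    centered = markers - markers.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

def residualize(y, covariates):
    """Remove the least-squares fit of y on an intercept plus the covariates."""
    design = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

def adjusted_score(genotype, trait, pcs):
    """Sample size times the squared correlation of the PC-adjusted genotype and trait."""
    g = residualize(genotype, pcs)
    t = residualize(trait, pcs)
    r = np.corrcoef(g, t)[0, 1]
    return len(trait) * r**2

def demo(seed=0, n_per_pop=200, n_markers=100):
    """Simulate two populations with NO causal effect of the test SNP (hypothetical data)."""
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)
    # Ancestry-informative markers: allele frequency 0.2 in one population, 0.8 in the other.
    freq = np.where(pop[:, None] == 0, 0.2, 0.8) * np.ones((pop.size, n_markers))
    markers = rng.binomial(2, freq)
    # The test SNP and the trait both differ by population, which creates spurious association.
    genotype = rng.binomial(2, np.where(pop == 0, 0.1, 0.9))
    trait = pop + rng.normal(size=pop.size)
    r = np.corrcoef(genotype, trait)[0, 1]
    unadjusted = pop.size * r**2
    adjusted = adjusted_score(genotype, trait, top_pcs(markers, k=1))
    return unadjusted, adjusted

if __name__ == "__main__":
    unadjusted, adjusted = demo()
    print(f"unadjusted score: {unadjusted:.2f}, PC-adjusted score: {adjusted:.2f}")
```

In this simulated setting the test SNP has no effect on the trait, so the unadjusted score reflects only the population difference, and adjusting for the first principal component should remove most of it.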

Figures

Figure 1
Plot of the First Two Principal Components When Samples Were Generated in Simulation 1, Where Samples Were Drawn from Two Discrete Populations. 200, 400, and 800 informative SNPs obtained from Smith et al. were generated with no LD between SNPs in two subpopulations. Left and right dots represent individuals from African and European populations, respectively. The children's principal components were calculated by projection onto the axes obtained from the independent samples. The first principal component distinguishes individuals from the two subpopulations for both independent samples and children.
Figure 2
Plot of the First Two Principal Components When Samples Were Generated in Simulation 2, Where Samples Were Drawn from an Admixed Population of Two Ancestral Populations. 200, 400, and 800 informative SNPs obtained from Smith et al. were generated with no LD between SNPs in two ancestral populations. Blue and red colors indicate that an individual has more African and European ancestral alleles, respectively. The children's principal components were calculated by projection onto the axes obtained from the independent samples. Because each individual carries a portion of SNPs from each ancestral population, we cannot observe clean clusters as in Figure 1.
Figure 3
Plot of the First Two Principal Components against the True Ancestry for the Same Data as in Figure 2. We observe that the first principal component, but not the second, is highly correlated with the true ancestry.
Figure 4
The First Three Principal Components for Data from Simulation 3. Plot of the first three principal components when samples were generated in Simulation 3, where samples were drawn from three discrete populations simulated with the data on chromosome 22 of YRI, CEU, and Japanese and Chinese (JCH) from the HapMap project. 10,000 randomly selected SNPs were generated and the LD between SNPs was preserved as in the HapMap data. The children's principal components were calculated by projection onto the axes obtained from the independent samples. Red, green, and blue represent individuals from CEU, JCH, and YRI, respectively. The first two principal components distinguish individuals from the three subpopulations for both independent samples and children. (A) Independent samples. (B) Children samples.
Figure 5
The First Three Principal Components for Data from Simulation 4. Plot of the first three principal components when samples were generated in Simulation 4, where samples were drawn from an admixed population simulated with the data on chromosome 22 of YRI, CEU, and Japanese and Chinese (JCH) from the HapMap project. The individual true ancestry is also presented. 10,000 randomly selected SNPs were generated and the LD between SNPs was preserved as in the HapMap data. The children's principal components were calculated by projection onto the axes obtained from the independent samples. Because each individual carries a portion of SNPs from each ancestral population, we cannot observe distinct clusters as in Figure 4. Color designates an individual's ancestral proportion, as seen in the right panel. (A) Three principal components of independent individuals. (B) True independent individual ancestry. (C) Three principal components of children. (D) True ancestry of children.
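
The figure captions note that the children's principal components were calculated by projection onto the axes obtained from the independent samples. A minimal sketch of that projection step follows, assuming a plain SVD-based PCA on centered genotype counts. The function names, the simulated frequencies, and the use of simulated parents as the independent samples are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def fit_axes(reference, k):
    """Learn marker means and the top-k principal axes from unrelated samples."""
    means = reference.mean(axis=0)
    _, _, vt = np.linalg.svd(reference - means, full_matrices=False)
    return means, vt[:k].T  # markers-by-k loading matrix

def project(samples, means, axes):
    """Project new samples onto axes learned elsewhere, using the reference means."""
    return (samples - means) @ axes

def demo(seed=0, n_per_pop=50, n_markers=200):
    """Simulate parents from two populations and their children (hypothetical data)."""
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)
    freq = np.where(pop[:, None] == 0, 0.2, 0.8) * np.ones((pop.size, n_markers))
    mothers = rng.binomial(2, freq)
    fathers = rng.binomial(2, freq)
    # Each parent transmits one allele per unlinked marker with probability genotype / 2.
    children = rng.binomial(1, mothers / 2) + rng.binomial(1, fathers / 2)
    # Axes are estimated from the unrelated individuals only, never from the children.
    means, axes = fit_axes(np.vstack([mothers, fathers]), k=2)
    return project(children, means, axes), pop

if __name__ == "__main__":
    child_pcs, child_pop = demo()
    for label in (0, 1):
        pc1 = child_pcs[child_pop == label, 0]
        print(f"population {label}: mean PC1 of children = {pc1.mean():.2f}")
```

Estimating the axes from unrelated individuals and then projecting the children keeps the relatedness within families from influencing the axes themselves, which is one plausible reason for the projection described in the captions.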
