A unified association analysis approach for family and unrelated samples correcting for stratification

Xiaofeng Zhu¹, Shengchao Li, Richard S Cooper, Robert C Elston

PMID: 18252216
PMCID: PMC2427300
DOI: 10.1016/j.ajhg.2007.10.009

Xiaofeng Zhu et al. Am J Hum Genet. 2008 Feb;82(2):352-65. doi: 10.1016/j.ajhg.2007.10.009.

Affiliation

¹ Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44106, USA. xzhu1@darwin.case.edu

Abstract

There are two common designs for association mapping of complex diseases: case-control and family-based designs. A case-control sample is more powerful to detect genetic effects than a family-based sample that contains the same numbers of affected and unaffected persons, although additional markers may be required to control for spurious association. When family and unrelated samples are available, statistical analyses are often performed in the family and unrelated samples separately, conditioning on parental information for the former, thus resulting in reduced power. In this report, we propose a unified approach that can incorporate both family and case-control samples and, provided the additional markers are available, at the same time corrects for population stratification. We apply the principal components of a marker matrix to adjust for the effect of population stratification. This unified approach makes it unnecessary to perform a conditional analysis of the family data and is more powerful than the separate analyses of unrelated and family samples, or a meta-analysis performed by combining the results of the usual separate analyses. This property is demonstrated in both a variety of simulation models and empirical data. The proposed approach can be equally applied to the analysis of both qualitative and quantitative traits.
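The principal-component step mentioned in the abstract, together with the projection of children onto axes derived from the independent samples (as described in the figure captions), can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`simulate_genotypes`, `make_child`, `fit_pc_axes`, `project`), the allele-frequency ranges, the sample sizes, and the per-SNP standardization are all assumptions chosen for illustration, and the two populations are simulated with independent SNPs.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_genotypes(freqs, n):
    """Draw n individuals with 0/1/2 genotypes at independent SNPs (no LD)."""
    return rng.binomial(2, freqs, size=(n, freqs.size)).astype(float)

def make_child(mother, father):
    """Each parent transmits one allele per SNP with probability genotype / 2."""
    return rng.binomial(1, mother / 2) + rng.binomial(1, father / 2)

def fit_pc_axes(unrelated, n_pcs=2):
    """Estimate principal axes from the unrelated (independent) samples only."""
    mean = unrelated.mean(axis=0)
    scale = unrelated.std(axis=0)
    scale[scale == 0] = 1.0  # guard against monomorphic SNPs
    z = (unrelated - mean) / scale
    # Rows of vt are the principal axes in SNP space.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, scale, vt[:n_pcs].T

def project(genotypes, mean, scale, axes):
    """Project any genotype matrix onto previously estimated axes."""
    return ((genotypes - mean) / scale) @ axes

# Hypothetical allele frequencies for two discrete populations (assumed values).
n_snps = 200
freq_a = rng.uniform(0.05, 0.45, n_snps)
freq_b = rng.uniform(0.55, 0.95, n_snps)

# 100 unrelated individuals per population.
unrelated = np.vstack([simulate_genotypes(freq_a, 100),
                       simulate_genotypes(freq_b, 100)])

# 50 children per population, each from two simulated parents.
children = np.vstack([
    make_child(simulate_genotypes(freq_a, 50), simulate_genotypes(freq_a, 50)),
    make_child(simulate_genotypes(freq_b, 50), simulate_genotypes(freq_b, 50)),
])

mean, scale, axes = fit_pc_axes(unrelated, n_pcs=2)
pcs_unrelated = project(unrelated, mean, scale, axes)
pcs_children = project(children, mean, scale, axes)  # children are projected, not refit
```

Estimating the axes from the unrelated samples alone keeps related individuals from influencing them; the children then receive coordinates in the same space, which could be used as covariates in a later association test.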

Figures

**Figure 1**
Plot of the First Two Principal Components When Samples Were Generated in Simulation 1, Where Samples Were Drawn from Two Discrete Populations. 200, 400, and 800 informative SNPs obtained from Smith et al. were generated with no LD between SNPs in two subpopulations. Left and right dots represent individuals from African and European populations, respectively. The children's principal components were calculated by projection onto the axes obtained from the independent samples. It can be observed that the first principal component can distinguish individuals from two subpopulations for both independent samples and children.

**Figure 2**
Plot of the First Two Principal Components When Samples Were Generated in Simulation 2, Where Samples Were Drawn from an Admixed Population of Two Ancestral Populations. 200, 400, and 800 informative SNPs obtained from Smith et al. were generated with no LD between SNPs in two ancestral populations. Blue and red colors indicate that an individual has more African and European ancestral alleles, respectively. The children's principal components were calculated by projection onto the axes obtained from the independent samples. Because each individual carries a portion of SNPs from each ancestral population, we cannot observe clean clusters as in Figure 1.

**Figure 3**
Plot of the First Two Principal Components against the True Ancestry for the Same Data as in Figure 2. We observe that the first principal component, but not the second, is highly correlated with the true ancestry.

**Figure 4**
The First Three Principal Components for Data from Simulation 3. Plot of the first three principal components when samples were generated in Simulation 3, where samples were drawn from three discrete populations simulated with the data on chromosome 22 of YRI, CEU, and Japanese and Chinese (JCH) from the HapMap project. 10,000 randomly selected SNPs were generated and the LD between SNPs was preserved as in the HapMap data. The children's principal components were calculated by projection onto the axes obtained from the independent samples. Red, green, and blue represent individuals who were from CEU, JCH, and YRI, respectively. It can be observed that the first two principal components can distinguish individuals from three subpopulations for both independent samples and children. (A) Independent samples. (B) Children samples.

**Figure 5**
The First Three Principal Components for Data from Simulation 4. Plot of the first three principal components when samples were generated in Simulation 4, where samples were drawn from an admixed population simulated with the data on chromosome 22 of YRI, CEU, and Japanese and Chinese (JCH) from the HapMap project. The individual true ancestry is also presented. 10,000 randomly selected SNPs were generated and the LD between SNPs was preserved as in the HapMap data. The children's principal components were calculated by projection onto the axes obtained from the independent samples. Because each individual carries a portion of SNPs from each ancestral population, we cannot observe distinct clusters as in Figure 4. Color designates an individual's ancestral proportion, as seen in the right panel. (A) Three principal components of independent individuals. (B) True independent individual ancestry. (C) Three principal components of children. (D) True ancestry of children.
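The relationship reported for the admixed two-population simulation (Figures 2 and 3), where the first principal component tracks true ancestry and the second does not, can be reproduced in spirit with a small sketch. Everything here is an assumption made for illustration rather than the authors' simulation design: the allele-frequency ranges, the uniform distribution of ancestry proportions, the sample size, and the simplification that each allele is drawn independently from an individual-specific mixture of the two ancestral frequencies.

```python
import numpy as np

rng = np.random.default_rng(1)

n_ind, n_snps = 300, 200

# Hypothetical ancestral allele frequencies (assumed values, no LD).
freq_a = rng.uniform(0.05, 0.45, n_snps)
freq_b = rng.uniform(0.55, 0.95, n_snps)

# True ancestry: proportion of each genome drawn from ancestral population A.
ancestry = rng.uniform(0.0, 1.0, n_ind)

# Individual-specific allele frequencies are a mixture of the two populations.
ind_freqs = np.outer(ancestry, freq_a) + np.outer(1.0 - ancestry, freq_b)
genotypes = rng.binomial(2, ind_freqs).astype(float)

# Standardize each SNP, then take principal component scores from the SVD.
std = genotypes.std(axis=0)
std[std == 0] = 1.0  # guard against monomorphic SNPs
z = (genotypes - genotypes.mean(axis=0)) / std
u, s, _ = np.linalg.svd(z, full_matrices=False)
pcs = u[:, :2] * s[:2]

# Correlation of each component with the true ancestry proportion.
r_pc1 = np.corrcoef(pcs[:, 0], ancestry)[0, 1]
r_pc2 = np.corrcoef(pcs[:, 1], ancestry)[0, 1]
print(f"|corr(PC1, ancestry)| = {abs(r_pc1):.3f}")
print(f"|corr(PC2, ancestry)| = {abs(r_pc2):.3f}")
```

The sign of a principal component is arbitrary, so the absolute value of the correlation is the quantity of interest.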

