BMC Bioinformatics. 2013 Apr 19;14:132.
doi: 10.1186/1471-2105-14-132.

Robust methods for population stratification in genome wide association studies

Li Liu et al. BMC Bioinformatics.

Abstract

Background: Genome-wide association studies can provide novel insights into diseases of interest, as well as into the responsiveness of an individual to specific treatments. In such studies, it is very important to correct for population stratification, which refers to allele frequency differences between cases and controls due to systematic ancestry differences. Population stratification can cause spurious associations if it is not properly adjusted for. Principal component analysis (PCA) has been widely relied upon to adjust for population stratification in these large-scale studies. Recently, the linear mixed model (LMM) has also been proposed to account for family structure or cryptic relatedness. However, neither approach may be optimal for correcting for sample structure in the presence of subject outliers.

Results: We propose to use robust PCA combined with k-medoids clustering to deal with population stratification. This approach can adjust for population stratification for both continuous and discrete populations with subject outliers, and it can be considered as an extension of the PCA method and the multidimensional scaling (MDS) method. Through simulation studies, we compare the performance of our proposed methods with several widely used stratification methods, including PCA and MDS. We show that subject outliers can greatly influence the analysis results from several existing methods, while our proposed robust population stratification methods perform very well for both discrete and admixed populations with subject outliers. We illustrate the new method using data from a rheumatoid arthritis study.

Conclusions: We demonstrate that subject outliers can greatly influence the analysis results in GWA studies, and we propose robust methods for dealing with population stratification that outperform existing population stratification methods in the presence of subject outliers.
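
As an illustration of the clustering step named in the Results, the following is a minimal sketch of k-medoids applied to a handful of low-dimensional coordinates standing in for principal component scores. It uses a simple alternating (assign, then update) heuristic, which is not necessarily the k-medoids variant the authors used. The function name `k_medoids`, the choice of the first k points as initial medoids, and the toy data are all assumptions made for this example.

```python
import math

def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids on a list of coordinate tuples (e.g. top PC scores).

    Returns (medoid_indices, labels). The initial medoids are the first k
    points, an arbitrary choice made only to keep the sketch deterministic.
    """
    n = len(points)
    # Pairwise Euclidean distances between all subjects.
    dist = [[math.dist(p, q) for q in points] for p in points]
    medoids = list(range(k))
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each subject joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: within each cluster, pick the member that minimises
        # the total distance to the other members.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            best = min(members, key=lambda m: sum(dist[m][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels

# Toy data (hypothetical): two tight groups plus one extreme subject.
toy_scores = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (50, 50)]
toy_medoids, toy_labels = k_medoids(toy_scores, k=2)
```

Because a medoid must be one of the observed subjects, the center of the second group in this toy example stays among the points near (5, 5), whereas a cluster mean would be pulled toward the extreme subject at (50, 50).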

Figures

Figure 1
The orthogonal distance versus the score distance for one simulated dataset under scenario S4 in simulation II. The plot is based on projection pursuit robust PCA using the GRID algorithm. The vertical line is the outlier cutoff for the score distance, the horizontal line is the outlier cutoff for the orthogonal distance, and points to the right of the vertical line or above the horizontal line were identified as outliers.
Figure 2
The orthogonal distance versus the score distance for NARAC data. The vertical line is the outlier cutoff for the score distance, the horizontal line is the outlier cutoff for the orthogonal distance, and points to the right of the vertical line or above the horizontal line were identified as outliers.
Figure 3
Results of GWA analyses based on five different methods. The y axis is in square root scale to improve readability.
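
The outlier maps in Figures 1 and 2 plot two quantities per subject. A minimal sketch of how they are commonly defined for a PCA fit follows: the score distance is a variance-scaled distance within the retained subspace, and the orthogonal distance is the norm of the residual after projection onto that subspace. The function names, the argument layout, and the toy numbers are assumptions for this example; the captions do not state the cutoff values, so the cutoffs are left as caller-supplied parameters.

```python
import math

def outlier_map_distances(x, center, loadings, eigenvalues):
    """Score distance and orthogonal distance of one observation.

    x, center: length-p sequences.
    loadings: k orthonormal length-p vectors spanning the PCA subspace.
    eigenvalues: k positive variances of the retained components.
    """
    centered = [xi - ci for xi, ci in zip(x, center)]
    # Scores: coordinates of the centered observation in the PCA subspace.
    scores = [sum(v * c for v, c in zip(vec, centered)) for vec in loadings]
    # Score distance: distance within the subspace, scaled by component variance.
    sd = math.sqrt(sum(t * t / lam for t, lam in zip(scores, eigenvalues)))
    # Orthogonal distance: norm of what the subspace fails to reconstruct.
    fitted = [sum(t * vec[j] for t, vec in zip(scores, loadings)) for j in range(len(x))]
    od = math.sqrt(sum((c - f) ** 2 for c, f in zip(centered, fitted)))
    return sd, od

def is_outlier(sd, od, sd_cutoff, od_cutoff):
    """Flag a subject that lies right of the vertical or above the horizontal cutoff."""
    return sd > sd_cutoff or od > od_cutoff

# Toy example (hypothetical numbers): 3 variables, 2 retained components.
sd, od = outlier_map_distances(
    x=(3.0, 4.0, 2.0),
    center=(0.0, 0.0, 0.0),
    loadings=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    eigenvalues=[9.0, 4.0],
)
```

In a robust analysis the center, loadings, and eigenvalues would come from a robust PCA fit rather than from classical PCA, so that the outliers being screened do not distort the subspace used to screen them.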
