Heredity (Edinb). 2016 Jul;117(1):51-61.
doi: 10.1038/hdy.2016.25. Epub 2016 May 4.

EigenGWAS: finding loci under selection through genome-wide association studies of eigenvectors in structured populations


G-B Chen et al. Heredity (Edinb). 2016 Jul.

Abstract

We develop a novel approach to identify regions of the genome underlying population genetic differentiation in any genetic data where the underlying population structure is unknown, or where the interest is assessing divergence along a gradient. By combining the statistical framework for genome-wide association studies (GWASs) with eigenvector decomposition (EigenGWAS), which is commonly used in population genetics to characterize the structure of genetic data, loci under selection can be identified without a requirement for discrete populations. We show through theory and simulation that our approach can identify regions under selection along gradients of ancestry, and in real data we confirm this by demonstrating LCT to be under selection between HapMap CEU-TSI cohorts, and we then validate this selection signal across European countries in the POPRES samples. HERC2 was also found to be differentiated between both the CEU-TSI cohort and within the POPRES sample, reflecting the likely anthropological differences in skin and hair colour between northern and southern European populations. Controlling for population stratification is of great importance in any quantitative genetic study and our approach also provides a simple, fast and accurate way of predicting principal components in independent samples. With ever increasing sample sizes across many fields, this approach is likely to be greatly utilized to gain individual-level eigenvectors avoiding the computational challenges associated with conducting singular value decomposition in large data sets. We have developed freely available software, Genetic Analysis Repository (GEAR), to facilitate the application of the methods.
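The combination described in the abstract, an eigenvector used as the phenotype in a single-marker association scan, can be sketched as follows. This is a minimal illustration in standard-library Python, not the GEAR implementation. All function names are hypothetical, power iteration stands in for a full eigendecomposition, the slope test is approximated by a 1-df chi-square (reasonable only for large samples), and the input is assumed to be a complete samples-by-markers matrix of 0/1/2 allele counts. The Bonferroni helper gives 0.05/919 133 ≈ 5.44e−08, which matches the threshold quoted in the Figure 4 caption if 919 133 markers were tested (an assumption).

```python
import math
import random
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def standardize_columns(genotypes):
    """Return per-marker columns of allele counts scaled to mean 0, variance 1."""
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        # Monomorphic markers carry no information; keep them as all zeros.
        columns.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return columns


def leading_eigenvector(columns, n_iter=200, seed=1):
    """Power iteration on Z Z^T / m (assumption: stands in for a full eigendecomposition)."""
    n, m = len(columns[0]), len(columns)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(c[i] * v[i] for i in range(n)) for c in columns]  # Z^T v
        v = [sum(columns[j][i] * w[j] for j in range(m)) / m for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def eigengwas(genotypes):
    """Regress the leading eigenvector on each marker; return (eigenvector, chi2, p-values)."""
    columns = standardize_columns(genotypes)
    e = leading_eigenvector(columns)
    n = len(e)
    e_mean = sum(e) / n
    e_c = [x - e_mean for x in e]
    e_ss = sum(x * x for x in e_c)
    chi2 = []
    for col in columns:
        x_ss = sum(x * x for x in col)
        if x_ss == 0:
            chi2.append(0.0)
            continue
        r = sum(a * b for a, b in zip(col, e_c)) / math.sqrt(x_ss * e_ss)
        r2 = min(r * r, 1.0)
        # Squared t statistic of the regression slope, treated as chi-square (1 df).
        chi2.append((n - 2) * r2 / max(1.0 - r2, 1e-12))
    # Survival function of a 1-df chi-square: P(X > c) = erfc(sqrt(c / 2)).
    pvals = [math.erfc(math.sqrt(c / 2.0)) for c in chi2]
    return e, chi2, pvals


def genomic_control(chi2):
    """Return (lambda_GC, chi2 / lambda_GC); statistics are left unchanged if lambda_GC <= 0."""
    lam = statistics.median(chi2) / CHI2_1DF_MEDIAN
    if lam <= 0:
        return lam, list(chi2)
    return lam, [c / lam for c in chi2]


def bonferroni_threshold(n_markers, alpha=0.05):
    """Genome-wide significance threshold after Bonferroni correction."""
    return alpha / n_markers
```

Calling `eigengwas(genotypes)` returns the eigenvector together with per-marker test statistics and P-values; passing the statistics through `genomic_control` before comparing the resulting P-values with `bonferroni_threshold(n_markers)` mirrors the "with λGC correction" panels described in the figure captions. Only the leading eigenvector is handled here; further eigenvectors would need deflation or a proper eigensolver.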


Figures

Figure 1
Manhattan plots of EigenGWAS for the top 10 eigenvectors for HapMap. Using Ei as the phenotype, single-marker association was conducted for 919 133 markers. The left panel shows E1 to E5 and the right panel E6 to E10. The horizontal lines indicate genome-wide significance after Bonferroni correction.
Figure 2
Linear correlation between the SNP effects estimated using EigenGWAS and BLUP for HapMap3. The x axis represents the EigenGWAS estimates of the SNP effects, and the y axis represents the BLUP estimates. The left panel shows E1 to E5 and the right panel E6 to E10. As shown at the top left of each plot, the correlation, measured as R², is nearly 1.
Figure 3
The correlation between Fst and [formula image] for EigenGWAS SNP effects for POPRES. For each eigenvector, POPRES samples were split into two groups according to whether Ei > 0 or Ei ⩽ 0, and Fst was then calculated for each locus between the two groups. The correlation, shown at the top left of each plot, was measured as R².
Figure 4
EigenGWAS for CEU (112 samples) and TSI (88 samples) from HapMap. (a) Manhattan plot for EigenGWAS on E1 without correction for λGC. Without correction, we found LCT on chromosome 2, MICA on chromosome 6 (MHC region), HIF1A on chromosome 14 and HERC2 on chromosome 15. The horizontal line marks the genome-wide significance level at α=0.05 after multiple-testing correction. (b) Manhattan plot for EigenGWAS on E1 with λGC correction; LCT was still significant, and HERC2 fell slightly below the genome-wide significance level. The genome-wide significance threshold was P-value=5.44e−08 for α=0.05.
Figure 5
EigenGWAS for POPRES samples on eigenvector 1. (a) Manhattan plot for EigenGWAS without correction for λGC. (b) After correction for λGC, we found LCT on chromosome 2, SLC44A4 on chromosome 6 and HERC2 on chromosome 15. The genome-wide significance level was P-value=7.76e−08 given α=0.05.
Figure 6
Prediction accuracy of the projected eigenvectors for POPRES samples. The 2466 POPRES samples were split into training and test sets at ratios of 5:95, 10:90, 20:80, 30:70, 40:60 and 50:50 (%). The left columns show prediction accuracy (R²) using randomly selected numbers of markers (100, 1000, 10 000, 100 000, all); the 95% confidence intervals were calculated from 30 resampling replicates for each number of markers. In contrast, the right columns show the prediction accuracy for eight P-value thresholds (1e−6, 1e−5, 1e−4, 1e−3, 1e−2, 1e−1, 0.5 and 1) applied to EigenGWAS SNPs.
Figure 7
Projected eigenvectors for the Puerto Rican cohort (PUR) and the Pakistan cohort (PJL) in the 1000 Genomes project. The training set was the HapMap3 samples, built on 919 133 SNPs. Eigenvectors 1 and 2 were generated on the 907 614 common SNPs. PUR showed an admixture of African and European gene flows, and PJL of Asian and European gene flows.
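The projection evaluated in Figures 6 and 7, predicting eigenvectors in independent samples from SNP effects estimated in a training set, might look like the sketch below. It is an illustration under stated assumptions rather than the authors' estimator: the function names are hypothetical, each effect is taken to be the least-squares slope of the training eigenvector on the standardized genotype, test genotypes are scaled with the training means and standard deviations, and accuracy is the squared Pearson correlation. The paper's exact weighting and scaling may differ.

```python
import math


def snp_effects(train_genotypes, eigenvector):
    """Per-marker (mean, sd, slope) from regressing the eigenvector on each standardized marker."""
    n = len(train_genotypes)
    e_mean = sum(eigenvector) / n
    e_c = [x - e_mean for x in eigenvector]
    model = []
    for j in range(len(train_genotypes[0])):
        col = [row[j] for row in train_genotypes]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        if sd == 0:
            model.append((mean, 0.0, 0.0))  # monomorphic marker: no effect
            continue
        z = [(x - mean) / sd for x in col]
        slope = sum(a * b for a, b in zip(z, e_c)) / sum(a * a for a in z)
        model.append((mean, sd, slope))
    return model


def project(test_genotypes, model):
    """Predicted eigenvector score per test sample: sum of slope * standardized genotype."""
    scores = []
    for row in test_genotypes:
        score = 0.0
        for x, (mean, sd, slope) in zip(row, model):
            if sd > 0:
                score += slope * (x - mean) / sd
        scores.append(score)
    return scores


def r_squared(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab * sab / (saa * sbb)
```

In use, `snp_effects` would be run on the training split with its eigenvector, `project` applied to the test split, and `r_squared` computed between the projected scores and the eigenvector obtained directly from the test samples. The scale of the projected scores is arbitrary here, which does not affect R². Restricting `model` to a random subset of markers, or to markers passing a P-value threshold, corresponds to the two comparisons described in the Figure 6 caption.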


