PLoS Genet. 2014 Nov 13;10(11):e1004787.

doi: 10.1371/journal.pgen.1004787. eCollection 2014 Nov.

GPA: a statistical approach to prioritizing GWAS results by integrating pleiotropy and annotation

Dongjun Chung¹, Can Yang², Cong Li³, Joel Gelernter⁴, Hongyu Zhao⁵

Affiliations

¹ Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America; Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, United States of America.
² Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America; Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, United States of America; Department of Mathematics, Hong Kong Baptist University, Hong Kong, China.
³ Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America.
⁴ Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, United States of America; VA CT Healthcare Center, West Haven, Connecticut, United States of America; Department of Genetics, Yale School of Medicine, West Haven, Connecticut, United States of America; Department of Neurobiology, Yale School of Medicine, New Haven, Connecticut, United States of America.
⁵ Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America; Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America; Department of Genetics, Yale School of Medicine, West Haven, Connecticut, United States of America; VA Cooperative Studies Program Coordinating Center, West Haven, Connecticut, United States of America.

PMID: 25393678
PMCID: PMC4230845
DOI: 10.1371/journal.pgen.1004787

Dongjun Chung et al. PLoS Genet. 2014.

Abstract

Results from genome-wide association studies (GWAS) have shown that complex diseases are often affected by many genetic variants with small or moderate effects. Identifying these risk variants remains challenging, and more powerful statistical methods are needed that leverage available information to improve upon traditional approaches, which focus on a single GWAS dataset without incorporating additional data. In this paper, we propose a novel statistical approach, GPA (Genetic analysis incorporating Pleiotropy and Annotation), that increases statistical power to identify risk variants by jointly analyzing multiple GWAS datasets together with annotation information. The motivation is twofold: (1) accumulating evidence suggests that different complex diseases share common genetic risk factors, i.e., pleiotropy; and (2) functionally annotated variants have consistently been shown to be enriched among GWAS hits. GPA integrates multiple GWAS datasets and functional annotations to detect association signals, and it also provides hypothesis tests for the presence of pleiotropy and for enrichment of functional annotations. Model parameters are estimated and SNPs are ranked using an EM algorithm that handles genome-wide markers efficiently. When we applied GPA to jointly analyze five psychiatric disorders with annotation information, GPA not only identified many weak signals missed by traditional single-phenotype analysis but also revealed relationships in the genetic architecture of these disorders. Using our hypothesis testing framework, we detected statistically significant pleiotropic effects among these psychiatric disorders, as well as significant enrichment of markers annotated in central nervous system genes and of eQTLs from the Genotype-Tissue Expression (GTEx) database. We also applied GPA to a bladder cancer GWAS dataset together with ENCODE DNase-seq data from 125 cell lines, and GPA detected cell lines that are biologically more relevant to bladder cancer. The R implementation of GPA is available at http://dongjunchung.github.io/GPA/.
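
The EM-based inference described above can be illustrated on a deliberately reduced version of the problem: one GWAS and one binary annotation. The Python sketch below assumes a two-group model in which null p-values are Uniform(0, 1), risk-SNP p-values follow Beta(alpha, 1), and the annotation indicator is Bernoulli with rate q1 among risk SNPs and q0 among null SNPs; a SNP's local false discovery rate is its posterior probability of being null. It also sketches an annotation enrichment test as a likelihood ratio test of q1 = q0, using the fact that under that hypothesis the likelihood factorizes into a p-value mixture term and a Bernoulli term. This is not the authors' R implementation; the function names, starting values, stopping rule and toy data are hypothetical.

```python
import math
import random


def fit_two_group(pvals, annot, n_iter=200, tol=1e-5):
    """EM for a simplified, hypothetical GPA-style model: one GWAS, one annotation.

    Assumed model (illustrative, not the published implementation):
      Z_j ~ Bernoulli(pi1)               # Z_j = 1 marks a risk SNP
      p_j | Z_j = 0 ~ Uniform(0, 1)
      p_j | Z_j = 1 ~ Beta(alpha, 1)
      A_j | Z_j = z ~ Bernoulli(q_z)     # binary annotation status
    Returns (params, local_fdr, loglik) evaluated at the final parameters.
    """
    logp = [math.log(p) for p in pvals]
    par = {"pi1": 0.1, "alpha": 0.5, "q0": 0.1, "q1": 0.2}  # assumed starting values

    def e_step():
        resp, ll = [], 0.0
        for lp, a in zip(logp, annot):
            f1 = (par["pi1"] * par["alpha"] * math.exp((par["alpha"] - 1) * lp)
                  * (par["q1"] if a else 1 - par["q1"]))
            f0 = (1 - par["pi1"]) * (par["q0"] if a else 1 - par["q0"])
            resp.append(f1 / (f0 + f1))  # posterior P(Z_j = 1 | p_j, A_j)
            ll += math.log(f0 + f1)
        return resp, ll

    old = -math.inf
    for _ in range(n_iter):
        resp, ll = e_step()
        if ll - old < tol:  # EM never decreases the log-likelihood
            break
        old = ll
        # M-step: closed-form maximizers of the expected complete-data log-likelihood
        s1 = sum(resp)
        s0 = len(resp) - s1
        par["pi1"] = s1 / len(resp)
        par["alpha"] = -s1 / sum(r * lp for r, lp in zip(resp, logp))
        par["q1"] = sum(r * a for r, a in zip(resp, annot)) / s1
        par["q0"] = sum((1 - r) * a for r, a in zip(resp, annot)) / s0
    resp, ll = e_step()
    return par, [1 - r for r in resp], ll


def enrichment_lrt(pvals, annot):
    """Likelihood ratio test of H0: q1 = q0 (no enrichment), 1 degree of freedom."""
    _, _, ll_full = fit_two_group(pvals, annot)
    # Under H0 the likelihood is a p-value mixture times a Bernoulli(q) term for A.
    # Fitting with an all-zero annotation drives q0 = q1 = 0, leaving only the
    # p-value mixture term in the log-likelihood.
    _, _, ll_p = fit_two_group(pvals, [0] * len(pvals))
    q = sum(annot) / len(annot)
    ll_a = sum(math.log(q) if a else math.log(1 - q) for a in annot)
    stat = max(0.0, 2 * (ll_full - ll_p - ll_a))
    return stat, math.erfc(math.sqrt(stat / 2))  # chi-square(1) upper tail


# Hypothetical toy data: 10% risk SNPs, enriched for the annotation.
rng = random.Random(2014)
pvals, annot, truth = [], [], []
for _ in range(3000):
    z = rng.random() < 0.1
    u = 1 - rng.random()  # u in (0, 1], avoids log(0)
    pvals.append(u ** (1 / 0.1) if z else u)  # Beta(0.1, 1) via inverse CDF
    annot.append(int(rng.random() < (0.5 if z else 0.1)))
    truth.append(int(z))

params, lfdr, loglik = fit_two_group(pvals, annot)
stat, pval = enrichment_lrt(pvals, annot)
```

Because every M-step update has a closed form, each EM iteration in this sketch is a single pass over the SNPs, which is what keeps this style of inference practical at genome-wide scale.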

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Figure 1. AUC (left), partial AUC (middle) and power (right) of GPA for SNP prioritization with sample size = 5000 and number of risk SNPs = 1000.**
The results are based on 200 simulations.
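
The AUC in Figure 1 measures how well a SNP ranking separates simulated risk SNPs from null SNPs. As a generic illustration, not necessarily the evaluation code used for the figure, the AUC can be computed from any ranking score (for example, 1 minus the local false discovery rate) via the Mann-Whitney rank-sum identity; `auc` is a hypothetical helper name.

```python
def auc(scores, labels):
    """AUC from the Mann-Whitney rank-sum identity; ties get average ranks.

    scores: higher means "more likely a risk SNP" (e.g. 1 - local fdr).
    labels: 1 for a true risk SNP, 0 for a null SNP (known in simulations).
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```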

**Figure 2. Global false discovery rates of GPA at sample size = 5000 and number of risk SNPs = 1000.**
Upper panel: Global false discovery rates of GPA with annotation. Lower panel: Global false discovery rates of GPA without annotation. From left to right: FDR of first GWAS (joint analysis), FDR of second GWAS (joint analysis), FDR of first GWAS (separate analysis), FDR of second GWAS (separate analysis) and FDR of risk variants shared by both GWAS. For all scenarios, the global false discovery rates of GPA are controlled at the nominal level.
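
One common way to turn posterior local false discovery rates into a SNP list with a controlled global FDR is the direct posterior approach: sort SNPs by local fdr and keep the largest prefix whose average local fdr stays at or below the nominal level. The sketch below assumes this rule (the caption does not spell out the exact procedure), and `select_by_global_fdr` is a hypothetical name.

```python
def select_by_global_fdr(lfdr, level=0.05):
    """Return indices of SNPs whose selected set has mean local fdr <= level.

    The mean local fdr of a selected set estimates its global FDR. Sorting in
    ascending order makes the running mean non-decreasing, so the loop can stop
    at the first prefix whose mean exceeds the level.
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    selected, total = [], 0.0
    for k, i in enumerate(order, start=1):
        total += lfdr[i]
        if total / k > level:
            break
        selected.append(i)
    return sorted(selected)


print(select_by_global_fdr([0.01, 0.5, 0.02, 0.09, 0.2]))  # [0, 2, 3]
```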

**Figure 3. Comparisons of receiver operating characteristic curves measured by AUCs (Left) and partial AUCs (Right) between GPA and the conditional FDR approach at sample size = 5000 and number of risk SNPs = 1000.**
The results are based on 200 simulations.

**Figure 4. The comparison between GPA and GSEA at number of risk SNPs = 1000.**
Here we fixed and varied to evaluate the power for sample size = 2000 (Upper Left panel), 5000 (Upper Right panel), 10000 (Lower Left panel), respectively. We used to evaluate the type I errors (Lower Right panel). The results are based on 500 simulations.

**Figure 5. The type I error rate and power of the pleiotropy test.**
Here we varied to evaluate the power for sample size = 500 (Upper Left panel), 1000 (Upper Right panel), and 2000 (Lower Left panel), respectively. We used to evaluate the type I errors of the pleiotropy test (Lower Right panel). In each setting, we also varied sample size = 1000, 2000, and 10000. Note that the type I error rate and power of the pleiotropy test remain almost the same in the presence of annotation (see Figure S9 in Text S1).
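
A minimal sketch of a pleiotropy test under an assumed four-group model: each SNP's risk status in two GWAS is (Z1, Z2), null p-values are Uniform(0, 1), and risk-SNP p-values in GWAS t follow Beta(alpha_t, 1). Pleiotropy is taken here to mean that Z1 and Z2 are dependent. Under independence the joint likelihood factorizes into the two single-GWAS mixture likelihoods, so a likelihood ratio statistic with 1 degree of freedom compares the joint fit against two separate fits. The function names, starting values, stopping rule and toy data are illustrative assumptions, not the published GPA code.

```python
import itertools
import math
import random


def fit_mixture(pmat, n_iter=200, tol=1e-5):
    """EM for a hypothetical 2^K-group mixture over K GWAS.

    pmat[j][t] is SNP j's p-value in GWAS t. Assumed model:
      (Z_j1, ..., Z_jK) ~ Categorical(pi) over {0, 1}^K
      p_jt | Z_jt = 0 ~ Uniform(0, 1);  p_jt | Z_jt = 1 ~ Beta(alpha_t, 1)
    Returns (pi, alpha, loglik).
    """
    K = len(pmat[0])
    states = list(itertools.product((0, 1), repeat=K))
    pi = {z: 1 / len(states) for z in states}
    alpha = [0.5] * K
    logp = [[math.log(p) for p in row] for row in pmat]

    def e_step():
        post, ll = [], 0.0
        for lp in logp:
            dens = {}
            for z in states:
                d = pi[z]
                for t in range(K):
                    if z[t]:
                        d *= alpha[t] * math.exp((alpha[t] - 1) * lp[t])
                dens[z] = d
            total = sum(dens.values())
            post.append({z: d / total for z, d in dens.items()})
            ll += math.log(total)
        return post, ll

    old = -math.inf
    for _ in range(n_iter):
        post, ll = e_step()
        if ll - old < tol:
            break
        old = ll
        n = len(post)
        pi = {z: sum(w[z] for w in post) / n for z in states}
        for t in range(K):
            r = [sum(w[z] for z in states if z[t]) for w in post]  # P(Z_jt = 1)
            alpha[t] = -sum(r) / sum(rj * lp[t] for rj, lp in zip(r, logp))
    _, ll = e_step()
    return pi, alpha, ll


def pleiotropy_lrt(p1, p2):
    """LRT of H0: P(Z1=1, Z2=1) = P(Z1=1) * P(Z2=1) (independence), 1 df."""
    _, _, ll_joint = fit_mixture(list(zip(p1, p2)))
    # Under independence the joint density is the product of the two
    # single-GWAS mixture densities, so the null fit is two separate fits.
    _, _, ll1 = fit_mixture([(p,) for p in p1])
    _, _, ll2 = fit_mixture([(p,) for p in p2])
    stat = max(0.0, 2 * (ll_joint - ll1 - ll2))
    return stat, math.erfc(math.sqrt(stat / 2))  # chi-square(1) upper tail


# Hypothetical toy data: 15% of SNPs are risk SNPs for both traits.
rng = random.Random(7)
p1, p2 = [], []
for _ in range(2000):
    shared = rng.random() < 0.15
    for plist in (p1, p2):
        u = 1 - rng.random()  # u in (0, 1]
        plist.append(u ** (1 / 0.1) if shared else u)  # Beta(0.1, 1) if risk

stat, pval = pleiotropy_lrt(p1, p2)
```

Fitting the two single-GWAS mixtures separately is exactly the maximum likelihood fit under independence, which avoids writing a constrained M-step for the null model.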

**Figure 6. Manhattan plots of BPD and SCZ.**
Top left panel: separate analysis without annotation. Top right panel: separate analysis with CNS annotation. Bottom left panel: joint analysis without annotation. Bottom right panel: joint analysis with CNS annotation. The red and blue lines indicate local false discovery rates of 0.05 and 0.1, respectively.

**Figure 7. Manhattan plots of local false discovery rates and (Equations (11) and (12)) for detecting BPD-SCZ-sharing SNPs.**
Left panel: joint analysis without annotation. Right panel: joint analysis with annotation. The red and blue lines indicate local false discovery rates of 0.05 and 0.1, respectively.
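
For the shared-SNP plots, a natural quantity under an assumed four-group model is the posterior probability that a SNP is not a risk SNP for both disorders, 1 - P(Z1 = 1, Z2 = 1 | p1, p2). The sketch below computes it from fitted parameters; whether it matches Equations (11) and (12) exactly cannot be confirmed from this record, and `shared_local_fdr` and the parameter values are hypothetical.

```python
def shared_local_fdr(p1, p2, pi, alpha):
    """1 - P(Z1 = 1, Z2 = 1 | p1, p2) under an assumed four-group model.

    pi: dict mapping (z1, z2) to its prior probability; alpha: [alpha_1, alpha_2]
    for the Beta(alpha_t, 1) risk-SNP p-value densities (null: Uniform(0, 1)).
    """
    def dens(p, z, a):
        return a * p ** (a - 1) if z else 1.0

    joint = {z: pi[z] * dens(p1, z[0], alpha[0]) * dens(p2, z[1], alpha[1])
             for z in pi}
    return 1 - joint[(1, 1)] / sum(joint.values())


pi_hat = {(0, 0): 0.7, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.1}  # hypothetical estimates
print(shared_local_fdr(1.0, 1.0, pi_hat, [0.2, 0.2]))    # about 0.9946: no evidence
print(shared_local_fdr(1e-8, 1e-8, pi_hat, [0.2, 0.2]))  # near 0: strong in both
```

A SNP that is highly significant in only one GWAS still receives a large shared local fdr, because the (1, 0) group explains it better than the (1, 1) group.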

**Figure 8. Enrichment of the DNase I hypersensitivity site annotation data from 125 cell lines for bladder cancer.**
Left panel: of hypothesis testing (13) vs. fold enrichment. The vertical red line corresponds to the significance level (0.05) after Bonferroni correction. The horizontal red line corresponds to ratio = 1. Right panel: the normalized variance component (2) given by LMM vs. given by GPA.
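
The left panel combines two per-cell-line summaries: a fold enrichment and a Bonferroni-corrected significance call. The sketch below assumes fold enrichment is the ratio q1/q0 of annotation rates among risk and null SNPs, and that each cell line already has an enrichment-test p-value; with 125 cell lines and an overall level of 0.05 the Bonferroni cutoff is 0.05/125 = 0.0004. `flag_cell_lines` and the inputs are hypothetical.

```python
def flag_cell_lines(results, level=0.05):
    """Fold enrichment and Bonferroni significance for each annotation track.

    results maps a cell-line name to (q0, q1, pvalue), where q1 and q0 are the
    assumed annotation rates among risk and null SNPs and pvalue comes from an
    enrichment test. Returns name -> (fold_enrichment, significant).
    """
    threshold = level / len(results)  # Bonferroni: level divided by number of tests
    return {name: (q1 / q0, p < threshold) for name, (q0, q1, p) in results.items()}


# Hypothetical inputs for three cell lines.
flags = flag_cell_lines({"line_A": (0.10, 0.30, 1e-4),
                         "line_B": (0.20, 0.20, 0.50),
                         "line_C": (0.10, 0.15, 0.02)})
```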
