Network-based SNP meta-analysis identifies joint and disjoint genetic features across common human diseases

Matthias Arnold¹, Mara L Hartsperger, Hansjörg Baurecht, Elke Rodríguez, Benedikt Wachinger, Andre Franke, Michael Kabesch, Juliane Winkelmann, Arne Pfeufer, Marcel Romanos, Thomas Illig, Hans-Werner Mewes, Volker Stümpflen, Stephan Weidinger

Affiliation

¹ Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany. matthias.arnold@helmholtz-muenchen.de

PMID: 22988944
PMCID: PMC3782362
DOI: 10.1186/1471-2164-13-490

Meta-Analysis

Matthias Arnold et al. BMC Genomics. 2012 Sep 18;13:490. doi: 10.1186/1471-2164-13-490.

Abstract

Background: Genome-wide association studies (GWAS) have provided a large set of genetic loci influencing the risk for many common diseases. Association studies typically analyze one specific trait in single populations in an isolated fashion without taking into account the potential phenotypic and genetic correlation between traits. However, GWA data can be efficiently used to identify overlapping loci with analogous or contrasting effects on different diseases.

Results: Here, we describe a new approach to systematically prioritize and interpret available GWA data. We focus on the analysis of joint and disjoint genetic determinants across diseases. Using network analysis, we show that variant-based approaches are superior to locus-based analyses. In addition, we provide a prioritization of disease loci based on network properties and discuss the roles of hub loci across several diseases. We demonstrate that, in general, agonistic associations appear to reflect current disease classifications, and present the potential use of effect sizes in refining and revising these agonistic signals. We further identify potential branching points in disease etiologies based on antagonistic variants and describe plausible small-scale models of the underlying molecular switches.
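
The abstract distinguishes agonistic from antagonistic shared variants without spelling out the rule. A minimal sketch, assuming a variant is agonistic for two traits when the same allele raises (or lowers) the risk of both, and antagonistic when it raises the risk of one and lowers the risk of the other, after all odds ratios are expressed relative to one common allele of a biallelic SNP. The function names, the input layout and the rs/trait identifiers are hypothetical; this is not the authors' code.

```python
from itertools import combinations


def direction(effect_allele: str, odds_ratio: float, ref_allele: str) -> int:
    """Return +1 if ref_allele raises risk, -1 if it lowers risk.

    Assumes a biallelic SNP: if the reported effect allele is not ref_allele,
    the reported direction is flipped.
    """
    if odds_ratio == 1:
        raise ValueError("an odds ratio of exactly 1 has no direction")
    sign = 1 if odds_ratio > 1 else -1
    return sign if effect_allele == ref_allele else -sign


def classify_snp(assocs: dict) -> set:
    """assocs maps trait -> (effect_allele, odds_ratio) for one SNP.

    Returns the set of labels ("agonistic"/"antagonistic") over all trait pairs.
    """
    ref = next(iter(assocs.values()))[0]  # any allele serves as the common reference
    dirs = {trait: direction(a, o, ref) for trait, (a, o) in assocs.items()}
    return {
        "agonistic" if dirs[t1] == dirs[t2] else "antagonistic"
        for t1, t2 in combinations(sorted(dirs), 2)
    }


def fraction_mixed(snps: dict) -> float:
    """Fraction of shared SNPs (>= 2 traits) that carry both kinds of signal."""
    shared = [a for a in snps.values() if len(a) >= 2]
    if not shared:
        return 0.0
    mixed = sum(classify_snp(a) == {"agonistic", "antagonistic"} for a in shared)
    return mixed / len(shared)


# Hypothetical toy data: SNP -> {trait: (effect allele, odds ratio)}
snps = {
    "rsA": {"trait_1": ("T", 1.2), "trait_2": ("T", 1.3)},
    "rsB": {"trait_1": ("G", 1.1), "trait_3": ("G", 0.8)},
    "rsC": {"trait_1": ("A", 1.4), "trait_2": ("C", 0.7), "trait_3": ("A", 0.9)},
    "rsD": {"trait_2": ("C", 1.2)},  # single trait, so not shared
}
print(sorted(classify_snp(snps["rsC"])))  # ['agonistic', 'antagonistic']
print(fraction_mixed(snps))  # 1 of 3 shared SNPs
```

With three or more traits, one SNP can be agonistic for one trait pair and antagonistic for another, which is the kind of mixed signal the Conclusions quantify. A real harmonization step would also have to handle strand flips, which this sketch ignores.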

Conclusions: The observation that a surprisingly high fraction (>15%) of the SNPs considered in our study are associated both agonistically and antagonistically with related as well as unrelated disorders indicates that the molecular mechanisms influencing causes and progress of human diseases are in part interrelated. Genetic overlaps between two diseases also suggest the importance of the affected entities in the specific pathogenic pathways and should be investigated further.

Figures

**Figure 1**
**Illustration of the different disease networks based on genome-wide association data.** A: The bipartite graph constructed from all association data. The two disjoint node sets are diseases (n = 111) and loci (n = 734; 508 gene loci and 226 intergenic loci); a disease and a locus are connected by an edge if a variant (n = 1,120) within the locus is associated with that trait. B: The SLN (shared locus network), consisting of 84 traits and 157 loci, obtained by removing loci that are associated with a single trait only, as well as isolated traits. C: The SVN (shared variant network), a variant-based representation of the data. Here, a trait and a locus are linked if the locus contains a variant associated with this trait and at least one other trait. The network consists of 175 SNPs located in 94 loci that are associated with 55 diseases (see also Additional file 2: Table S1). The colors of the disease nodes correspond to disease classes according to the MeSH ontology; multi-colored nodes indicate an association with different disease classes. Loci are depicted as transparent, diamond-shaped nodes. The node size reflects the number of loci a disease is associated with. In C, the edge color reflects the allelic information: gray indicates agonistic variant(s), red antagonistic variant(s), and blue both agonistic and antagonistic signals.
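
The difference between the locus-based SLN and the variant-based SVN can be made concrete with a small sketch. It assumes association records of the form (SNP, locus, trait), with each SNP assigned to exactly one locus; the record layout, the function names and the gene/trait identifiers are hypothetical. Edges are (trait, locus) pairs, so traits left without any edge simply drop out of a network.

```python
from collections import defaultdict


def bipartite_edges(records):
    """All (trait, locus) edges: the locus contains a variant associated with the trait."""
    return {(trait, locus) for _snp, locus, trait in records}


def shared_locus_network(records):
    """SLN: keep only loci linked to at least two traits (isolated traits drop out)."""
    traits_per_locus = defaultdict(set)
    for _snp, locus, trait in records:
        traits_per_locus[locus].add(trait)
    return {(t, locus) for locus, ts in traits_per_locus.items() if len(ts) >= 2 for t in ts}


def shared_variant_network(records):
    """SVN: link a trait and a locus only if one variant in the locus hits >= 2 traits."""
    traits_per_snp = defaultdict(set)
    locus_of = {}
    for snp, locus, trait in records:
        traits_per_snp[snp].add(trait)
        locus_of[snp] = locus  # assumes one locus per SNP
    return {(t, locus_of[snp]) for snp, ts in traits_per_snp.items() if len(ts) >= 2 for t in ts}


records = [
    ("rs1", "GENE_X", "trait_A"),
    ("rs2", "GENE_X", "trait_B"),  # different variant, same locus
    ("rs3", "GENE_Y", "trait_A"),
    ("rs3", "GENE_Y", "trait_C"),  # same variant, two traits
    ("rs4", "GENE_Z", "trait_D"),
]
print(len(bipartite_edges(records)))  # 5
print(sorted(shared_locus_network(records)))
print(sorted(shared_variant_network(records)))
```

In this toy input, GENE_X is shared by two traits in the SLN but vanishes from the SVN, because no single variant in it is associated with both traits.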

**Figure 2**
**LD-based locus assignment and its error sources.** Using chromosome 8q21.11 as an example, LD-based locus assignment is shown for six example SNPs (blue box). LD is shown on a color scale of the LD measure r², with red indicating strong LD, blue low LD and white no LD. Example SNPs in LD are connected by black dashed lines. The gray boxes show the two error sources of automated locus assignment. An assignment error I occurs if two variants not in LD, i.e. in two independent LD blocks, are located in the same gene, intergenic region or gene desert and are thus assigned to the same locus. Here, this is the case for the variants rs-A/rs-B and rs-E/rs-F. The consequence of this type of error is a shared association at the locus level that is not mirrored at the variant level. An assignment error II is introduced if two variants are in LD but are assigned to different loci. Here, this is the case for rs-C and rs-D. Because of such inconsistencies between LD structure and locus assignment, the link between the two variants is lost if only the locus level is considered.
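
Both error types can be flagged automatically once each SNP has an assigned locus and pairwise LD values are available. A minimal sketch, assuming r² values keyed by unordered SNP pairs and a hypothetical cutoff of r² ≥ 0.8 for calling two SNPs "in LD" (the caption does not state a cutoff). The locus labels and the r² value are illustrative; the toy input only mirrors the six example SNPs of the figure.

```python
from itertools import combinations


def assignment_errors(locus_of, r2, threshold=0.8):
    """Flag SNP pairs whose locus assignment disagrees with their LD.

    locus_of:  SNP -> assigned locus
    r2:        frozenset({snp1, snp2}) -> r² (missing pairs are treated as r² = 0)
    threshold: hypothetical r² cutoff for calling two SNPs "in LD"
    """
    error_i, error_ii = [], []
    for a, b in combinations(sorted(locus_of), 2):
        in_ld = r2.get(frozenset((a, b)), 0.0) >= threshold
        same_locus = locus_of[a] == locus_of[b]
        if same_locus and not in_ld:
            error_i.append((a, b))  # shared locus, independent signals
        elif in_ld and not same_locus:
            error_ii.append((a, b))  # one LD block split across loci
    return error_i, error_ii


locus_of = {"rs-A": "L1", "rs-B": "L1", "rs-C": "L2",
            "rs-D": "L3", "rs-E": "L4", "rs-F": "L4"}
r2 = {frozenset(("rs-C", "rs-D")): 0.9}
err_i, err_ii = assignment_errors(locus_of, r2)
print(err_i)   # [('rs-A', 'rs-B'), ('rs-E', 'rs-F')]
print(err_ii)  # [('rs-C', 'rs-D')]
```

For the networks, only flagged pairs whose variants are associated with different traits matter, since only those create (error I) or hide (error II) a shared association.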

**Figure 3**
**Clustering of diseases with respect to genetic signals.** We applied complete-linkage hierarchical clustering to identify groups of traits that show homogeneous patterns of genetic overlap with other disorders. For each pair of diseases, we calculated the Pearson correlation of their patterns of overlap with the other diseases. The correlation values range from −1 (white), indicating perfect negative correlation, to +1 (black), reflecting perfect positive correlation. As the minimal value of the correlation coefficient was > −0.1, we collapsed the range of negative correlation. Red numbers denote the 15 disease clusters. The Euclidean distance threshold was chosen as the maximal distance at which the six diseases showing no or only weak correlation with any other disease (disease names in gray) remain non-clustered.
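
The clustering step can be sketched under stated assumptions: the input is a symmetric disease-by-disease matrix of overlap counts (for example, numbers of shared variants); each disease is represented by its row of Pearson correlations with all diseases; and clusters are merged by complete linkage on Euclidean distances between these rows until the closest pair of clusters is farther apart than a threshold. The matrix, the disease labels and the threshold of 1.0 are hypothetical, since the caption gives neither the exact input matrix nor the threshold value.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; undefined for constant vectors, so 0.0 is returned for them."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def complete_linkage(points, threshold):
    """Agglomerate point indices; stop when the closest clusters exceed threshold."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best_pair, best_d = None, math.inf
        for a, b in combinations(range(len(clusters)), 2):
            # complete linkage: distance between the farthest pair of members
            d = max(math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
            if d < best_d:
                best_pair, best_d = (a, b), d
        if best_d > threshold:
            break
        a, b = best_pair
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid
    return clusters


def cluster_diseases(names, overlap, threshold):
    corr = [[pearson(row_i, row_j) for row_j in overlap] for row_i in overlap]
    return [sorted(names[i] for i in group) for group in complete_linkage(corr, threshold)]


names = ["A", "B", "C", "D", "E", "F"]
overlap = [  # hypothetical shared-variant counts
    [0, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 2],
    [0, 0, 0, 0, 0, 2],
    [2, 2, 0, 0, 0, 0],
    [0, 0, 2, 2, 0, 0],
]
print(cluster_diseases(names, overlap, threshold=1.0))
# [['A', 'B'], ['C', 'D'], ['E'], ['F']]
```

Diseases A and B (and likewise C and D) share the same overlap pattern and merge, while E and F stay unclustered at this threshold. This mirrors how a threshold can be chosen so that weakly correlated diseases remain singletons.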

**Figure 4**
**Data prioritization and analysis workflow.** We established a semi-automated curation pipeline that automatically gathers and annotates GWA data obtained from three sources, including locus assignment. The last preprocessing step was the manual inspection of risk alleles and odds ratios. From this data set, we constructed a locus-based (SLN) and a variant-based (SVN) network representation of the data. For quality reasons, we then limited analyses to the SVN and investigated the contained variants and their effects further.
