Bioinformatics. 2016 Dec 15;32(24):3815-3822.
doi: 10.1093/bioinformatics/btw530. Epub 2016 Aug 19.

STAMS: STRING-assisted module search for genome wide association studies and application to autism



Sara Hillenmeyer et al. Bioinformatics.

Abstract

Motivation: Analyzing genome-wide association data in the context of biological pathways helps us understand how genetic variation influences phenotype and increases power to find associations. However, the utility of pathway-based analysis tools is hampered by undercuration and by their reliance on signal being distributed across all of the genes in a pathway. Methods that combine genome-wide association results with genetic networks to infer the key phenotype-modulating subnetworks combat these issues, but have primarily been limited to network definitions with yes/no labels for gene–gene interactions. A recent method (EW_dmGWAS) incorporates a biological network with weighted edge probabilities, but requires a secondary phenotype-specific expression dataset to do so. In this article, we combine an algorithm for weighted-edge module searching with a probabilistic interaction network to develop a method, STAMS, for recovering modules of genes with strong associations to the phenotype and probable biological coherence. Our method builds on EW_dmGWAS but does not require a secondary expression dataset, and it performs better in six test cases.
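For orientation, here is a sketch of the kind of module score a weighted-edge module search of this type optimizes. It assumes the node/edge scoring usually described for EW_dmGWAS, on which STAMS builds; the exact STAMS parameterization is not stated in this abstract. For a candidate module M with k genes and m edges between those genes:

```latex
\begin{aligned}
z_i &= \Phi^{-1}(1 - p_i), \\
S_N(M) &= \frac{1}{\sqrt{k}} \sum_{i \in M} z_i, \qquad
S_E(M) = \frac{1}{\sqrt{m}} \sum_{(i,j) \in E(M)} w_{ij}, \\
S(M) &= \lambda\, S_N(M) + (1 - \lambda)\, S_E(M), \qquad \lambda \in [0, 1].
\end{aligned}
```

Here p_i is gene i's gene-based P-value, w_ij is a standardized edge weight (in STAMS derived from STRING confidence scores rather than from expression data), and λ trades off GWAS signal against network coherence. The 1/√k and 1/√m factors put modules of different sizes on a comparable scale: under the null, a sum of k independent standard-normal z-scores divided by √k is itself standard normal.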

Results: We show that our algorithm improves over EW_dmGWAS and standard gene-based analysis by measuring the precision and recall of each method on separately identified associations. In the Wellcome Trust Rheumatoid Arthritis study, STAMS-identified modules were more enriched for separately identified associations than EW_dmGWAS modules (STAMS P-value = 3.0 × 10⁻⁴; EW_dmGWAS P-value = 0.8). On the Wellcome Trust Type 1 Diabetes data, the area under the precision-recall curve is 5.9 times higher with STAMS than with EW_dmGWAS.

Availability and implementation: STAMS is implemented as an R package and is freely available at https://simtk.org/projects/stams

Contact: rbaltman@stanford.edu

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
The workflow overview of STAMS. STAMS overlays GWAS gene-based P-values on a graph of gene–gene interaction confidence scores from the STRING database. In the resulting graph, circular nodes represent genes, with size proportional to 1 − P, where P is the P-value of the gene’s individual association with the phenotype. Edges are taken from the STRING database and weighted by the confidence scores calculated by STRING. Using a search based on EW_dmGWAS, STAMS identifies modules of genes that have high biological coherence and an enrichment of GWAS signal.
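The overlay and search described in this caption can be sketched in Python. This is a simplified illustration under stated assumptions, not the STAMS R package: node scores use z = Φ⁻¹(1 − P); STRING combined scores are standardized to z-scores across all edges (the paper's exact transform is not given here); the module score follows the λ-weighted node/edge form attributed to EW_dmGWAS; and the greedy step considers only direct neighbors with an assumed improvement rate r = 0.1. Gene names and scores are hypothetical.

```python
import math
from statistics import NormalDist, mean, pstdev


def node_z(p):
    """Gene-level z-score from a gene-based P-value: z = Phi^-1(1 - p)."""
    p = min(max(p, 1e-15), 1 - 1e-12)  # keep the inv_cdf argument strictly inside (0, 1)
    return NormalDist().inv_cdf(1 - p)


def edge_z(combined_scores):
    """Standardize STRING-style combined scores (0-1000) to z-scores across all edges.
    Assumption: plain standardization; the transform STAMS uses is not stated here."""
    vals = list(combined_scores.values())
    mu, sd = mean(vals), pstdev(vals) or 1.0  # guard against zero spread
    return {edge: (s - mu) / sd for edge, s in combined_scores.items()}


def module_score(genes, zn, ze, lam=0.5):
    """Weighted module score: lam * S_N + (1 - lam) * S_E."""
    s_node = sum(zn[g] for g in genes) / math.sqrt(len(genes))
    inner = [w for edge, w in ze.items() if edge <= genes]  # edges with both ends in the module
    s_edge = sum(inner) / math.sqrt(len(inner)) if inner else 0.0
    return lam * s_node + (1 - lam) * s_edge


def grow_module(seed, zn, ze, lam=0.5, r=0.1):
    """Greedily add the best direct neighbor while it raises the score by more than r * |score|."""
    module = {seed}
    score = module_score(module, zn, ze, lam)
    while True:
        neighbors = {g for edge in ze if edge & module for g in edge if g in zn} - module
        if not neighbors:
            break
        candidates = {g: module_score(module | {g}, zn, ze, lam) for g in neighbors}
        best = max(candidates, key=candidates.get)
        if candidates[best] <= score + r * abs(score):
            break  # improvement too small: stop growing
        module.add(best)
        score = candidates[best]
    return module, score


# Toy example with hypothetical genes A-D (not real data).
gene_p = {"A": 1e-6, "B": 1e-4, "C": 0.5, "D": 0.9}
string_cs = {
    frozenset(("A", "B")): 950,
    frozenset(("B", "C")): 400,
    frozenset(("C", "D")): 300,
    frozenset(("A", "D")): 200,
}
zn = {g: node_z(p) for g, p in gene_p.items()}
ze = edge_z(string_cs)
module, score = grow_module("A", zn, ze)
print(sorted(module), round(score, 3))
```

Seeding `grow_module` from every gene and ranking the resulting modules by score yields a ranked module list of the kind the later figures summarize (for example, by pooling the top 1% of modules).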
Fig. 2.
Performance comparison of STAMS with EW_dmGWAS for six phenotypes. We pooled the genes in the top 1% of modules from each analysis and measured enrichment for genes reported in independent GWAS for each phenotype (knowngenes) using Fisher’s exact test. We plot the −log10 of the P-value, so taller bars indicate stronger enrichment.
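The enrichment test in this caption can be sketched with the standard library alone. Assumptions: the test is one-sided (overrepresentation), implemented as the hypergeometric upper tail, which equals a one-sided Fisher's exact test on the 2×2 overlap table; the gene universe and the rounding of the 1% cutoff are illustrative choices; all gene names are hypothetical.

```python
from math import comb, log10


def enrichment_p(selected, known, universe):
    """One-sided Fisher's exact test (hypergeometric upper tail):
    P(overlap >= observed) when drawing |selected| genes from the universe."""
    universe = set(universe)
    selected, known = set(selected) & universe, set(known) & universe
    N, K, n = len(universe), len(known), len(selected)
    x = len(selected & known)  # observed overlap with knowngenes
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(x, min(K, n) + 1))
    return upper / comb(N, n)  # exact integer sums, a single float division


def pooled_top_genes(ranked_modules, fraction=0.01):
    """Union of the genes in the top `fraction` of ranked modules (at least one module)."""
    k = max(1, round(len(ranked_modules) * fraction))
    return set().union(*ranked_modules[:k])


# Hypothetical toy data: 100-gene universe, 10 known genes, 200 ranked modules.
universe = {f"G{i}" for i in range(100)}
known = {f"G{i}" for i in range(10)}
ranked = [{"G0", "G1", "G50"}, {"G2", "G51"}] + [{"G99"}] * 198
selected = pooled_top_genes(ranked)  # top 1% of 200 modules = 2 modules
p = enrichment_p(selected, known, universe)
print(f"-log10 P = {-log10(p):.2f}")  # the bar height plotted in the figure
```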
Fig. 3.
Comparison of precision and recall of STAMS, EW_dmGWAS, and standard gene-based methods. After choosing the number of modules to consider so that precision approximately matches that of the standard analysis in RA and T1D, we plot the precision and recall of each method for predicting genes in the knowngenes list. Standard gene-based P-values were calculated using VEGAS and corrected using Bonferroni and FDR.
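Bonferroni and FDR correction of gene-level P-values can be sketched as follows. The FDR variant shown is the Benjamini–Hochberg step-up procedure, which is an assumption because the caption does not name the FDR method; the VEGAS gene-based P-values themselves are not computed here, and the input values are made up.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each P-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values (step-up FDR control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical gene-based P-values (e.g. as a tool such as VEGAS might report).
genes = ["GENE1", "GENE2", "GENE3", "GENE4"]
pvals = [0.01, 0.04, 0.03, 0.005]
alpha = 0.05
bonf_hits = [g for g, q in zip(genes, bonferroni(pvals)) if q < alpha]
fdr_hits = [g for g, q in zip(genes, benjamini_hochberg(pvals)) if q < alpha]
print(bonf_hits, fdr_hits)
```

Bonferroni controls the family-wise error rate and is the stricter of the two, which is why it typically recovers fewer genes than an FDR cutoff at the same α.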
Fig. 4.
Precision-recall curves of STAMS performance across varying numbers of modules and sample sizes. We used the list of genes identified by VEGAS with a Bonferroni correction in the full dataset as a gold standard and calculated precision-recall curves for our ability to recover these genes at reduced sample sizes by varying the number of top modules considered. We randomly selected subpopulations of patients to include and show that the performance of STAMS with CS edges decreases when fewer patients are included. We also ran EW_dmGWAS with expression data for edge weights and show its performance. (A) shows results for RA and (B) shows results for T1D.
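The precision-recall construction in this caption, which grows the prediction set by taking more top-ranked modules, can be sketched as below. The area is computed with a step-wise sum, Σ (R_k − R_{k−1}) · P_k, which is an assumption because the caption does not say how the area was integrated; module contents and the gold-standard list are hypothetical.

```python
def pr_over_top_k(ranked_modules, gold):
    """(precision, recall) after pooling the genes of the top k modules, for k = 1, 2, ..."""
    gold = set(gold)
    if not gold:
        raise ValueError("gold standard must be non-empty")
    predicted, points = set(), []
    for module in ranked_modules:
        predicted |= set(module)
        if not predicted:
            continue  # skip leading empty modules (precision undefined)
        tp = len(predicted & gold)
        points.append((tp / len(predicted), tp / len(gold)))
    return points


def aupr_step(points):
    """Step-wise area under the PR curve; recall never decreases as k grows."""
    area, prev_recall = 0.0, 0.0
    for precision, recall in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Hypothetical ranked modules and gold-standard genes.
ranked = [{"A", "B"}, {"C", "X"}, {"Y"}]
gold = {"A", "C", "Z"}
points = pr_over_top_k(ranked, gold)
print(points, aupr_step(points))
```

Comparing methods then reduces to comparing their areas; a ratio of two such areas is the kind of figure quoted in the abstract's "5.9 times higher" result.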
Fig. 5.
STAMS performance varies across edge data sources and disorders. We ran STAMS on the subsets of edges in STRING that were curated in each edge-set modality, as summarized in Methods. We pooled the genes from the top 1% of returned modules and measured enrichment for knowngenes with Fisher’s exact test. The −log10 of the P-value is plotted.
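Restricting the network to one evidence type before running the search can be sketched as a filter over per-edge channel scores. The channel names follow those used in STRING's detailed link files (for example experimental, database, textmining, coexpression); how STAMS grouped channels into modalities is described in the paper's Methods and is not reproduced here, and whether each subset is weighted by the channel score or by the combined score is an assumption exposed as a parameter. Edge data are hypothetical.

```python
def edge_subset(edges, channel, weight_key=None, min_score=1):
    """Keep edges with evidence in `channel`.
    Weights come from that channel's score, or from `weight_key` (e.g. "combined_score") if given."""
    key = weight_key or channel
    return {pair: ev[key] for pair, ev in edges.items() if ev.get(channel, 0) >= min_score}


# Hypothetical per-edge evidence scores (STRING-style 0-1000 scale).
edges = {
    frozenset(("A", "B")): {"experimental": 700, "textmining": 300, "combined_score": 950},
    frozenset(("B", "C")): {"textmining": 400, "combined_score": 400},
    frozenset(("C", "D")): {"coexpression": 300, "combined_score": 300},
}
subsets = {
    ch: edge_subset(edges, ch)
    for ch in ("experimental", "database", "textmining", "coexpression")
}
for ch, sub in subsets.items():
    print(ch, len(sub))
```

Each filtered edge dictionary could then be passed to the same module search and scored with the same enrichment test, one run per modality.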
Fig. 6.
STAMS-identified module from autism GWAS. A high-scoring autism module from the AGRE fGWAS is plotted with input gene-based P-values listed in the nodes. Line width corresponds to CS edge confidence, although all edges are very high confidence (z-scores range from 1.43 to 3.09). The module contains CTTNBP2; rare loss-of-function mutations in CTTNBP2 have been associated with autism. The other genes in the module are members of the STRIPAK complex. CTTNBP2 interacts with STRIPAK to regulate dendritic spinogenesis, a proposed mechanism for autism.
