BMC Bioinformatics. 2017 Jan 18;18(1):41.
doi: 10.1186/s12859-017-1477-3.

SMITE: an R/Bioconductor package that identifies network modules by integrating genomic and epigenomic information

N Ari Wijetunga et al. BMC Bioinformatics. 2017.

Abstract

Background: The molecular assays that test gene expression, transcriptional regulation, and epigenetic regulation are increasingly diverse and numerous. The information generated by each type of assay individually gives insight into the state of the cells tested. It should be possible to combine the information derived from separate, complementary assays to gain higher-confidence insights into cellular states. At present, the analysis of multi-dimensional, massive genome-wide data requires an initial pruning step to create manageable subsets of observations that are then used for integration; this pruning decreases the sizes of the intersecting data sets and the potential for biological insights. Our Significance-based Modules Integrating the Transcriptome and Epigenome (SMITE) approach was developed to integrate transcriptional and epigenetic regulatory data without a loss of resolution.

Results: SMITE combines p-values by accounting for the correlation between non-independent values within data sets, allowing genes and gene modules in an interaction network to be assigned significance values. The contribution of each type of genomic data can be weighted, permitting integration of individually under-powered data sets, increasing the overall ability to detect effects within modules of genes. We apply SMITE to a complex genomic data set including the epigenomic and transcriptomic effects of Toxoplasma gondii infection on human host cells and demonstrate that SMITE is able to identify novel subnetworks of dysregulated genes. Additionally, we show that SMITE outperforms Functional Epigenetic Modules (FEM), the current paradigm of using the spin-glass algorithm to integrate gene expression and epigenetic data.
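
As an illustration of what combining non-independent p-values with per-data-set weights can look like, here is a minimal Python sketch of a weighted Stouffer combination that corrects for correlation. It is not taken from the SMITE package: the function name, its parameters, and the choice of Stouffer's method are assumptions made for this example, and the abstract does not state which combination methods SMITE actually uses.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def combine_correlated_pvalues(pvalues, correlation, weights=None, eps=1e-12):
    """Combine one-sided p-values with a weighted Stouffer statistic.

    Hypothetical helper for illustration only (not the SMITE API).
    `correlation` is an n x n matrix (list of lists) of the assumed
    correlations between the underlying z-scores.
    """
    n = len(pvalues)
    if weights is None:
        weights = [1.0] * n
    # Clamp p-values away from 0 and 1 so the inverse CDF stays finite.
    clipped = [min(max(p, eps), 1.0 - eps) for p in pvalues]
    # Small p-values map to large positive z-scores.
    z = [_STD_NORMAL.inv_cdf(1.0 - p) for p in clipped]
    weighted_sum = sum(w * zi for w, zi in zip(weights, z))
    # Variance of the weighted sum under correlation: w^T R w.
    variance = sum(
        weights[i] * weights[j] * correlation[i][j]
        for i in range(n)
        for j in range(n)
    )
    if variance <= 0:
        raise ValueError("weights and correlation give a non-positive variance")
    combined_z = weighted_sum / variance ** 0.5
    return 1.0 - _STD_NORMAL.cdf(combined_z)
```

With an identity correlation matrix this reduces to the ordinary weighted Stouffer method. As the off-diagonal correlations approach 1, the combined p-value approaches that of a single test, so correlated evidence is not counted twice.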

Conclusions: SMITE represents a flexible, scalable tool that allows integration of transcriptional and epigenetic regulatory data from genome-wide assays to boost confidence in finding gene modules reflecting altered cellular states.

Keywords: Bioinformatics; Epigenetic; Gene expression; Genomic; Interaction network; Modules.

Figures

Fig. 1
Summary of SMITE. The flowchart details the pipeline through which SMITE takes p-values, associates them with genomic intervals, and scores genes. The steps and input required to discover significant modules are shown as well as the downstream functions that SMITE provides for module interpretation
Fig. 2
Monte Carlo simulation of correlation matrix for DNA methylation. The average Pearson correlations as a function of the distance separating adjacent effects for DNA methylation in the T. gondii HFF data set. As expected, there is a general decrease in the correlation of DNA methylation values as the distance between assayed sites increases
Fig. 3
The effect of adjustment by the total number of combined p-values. In this example taken from the T. gondii HFF data set, the negative natural log of the combined p-value is plotted against the number of p-values that were combined for each value. The increasing trend is visible before adjustment (left) and is no longer present after adjustment (right)
Fig. 4
Normalization of combined p-value scores. The densities of the scores/p-values for the T. gondii HFF data set are plotted using the SMITE functions to compare each of the annotated contexts to determine if normalization is necessary (left). After normalizing the values by logit transformation, rescaling, and back-transformation, the densities of the normalized p-values are shown (right)
Fig. 5
Epigenetic modifications at promoters compared with gene bodies. Using the SMITE functions, we show a comparison of the component scores (the -ln(p-value) version of the Score) and the effect direction for gene promoters and gene bodies in the T. gondii HFF data set. For DNA methylation (left), there is no strong relationship between the scores and directions of scores in promoters and bodies, whereas for DNA hydroxymethylation (right) there is a concordant loss of hydroxymethylation in promoters and gene bodies
Fig. 6
SMITE-identified module implicating cell cycle and MAPK pathways. SMITE allows visualization of the relationship between each component score and the overall node score. This functional module is enriched in human genes that regulate cell cycle by altering cell survival and apoptosis consistent with the known property of T. gondii infection of human cells to induce host cell cycle arrest at G2. The module shows MAPK4 as a highly scoring gene (intense red coloring) centered within the network
Fig. 7
SMITE-identified module implicating chromatin regulation. The module centered around histones and their regulators is plotted in a circular layout in two modes, with (left) and without (right) component score details. We can see that many of these genes were implicated because of their component scores for gene expression and events occurring at enhancers
Fig. 8
SMITE comparison with FEM. (a) An Euler diagram showing that no genes were found by all three models: FEM, SMITE-R, and SMITE-F. SMITE-F and SMITE-R overlap much more with each other than either does with FEM. (b) A comparison of the densities of all scores with those of genes identified within modules by SMITE-F (left), SMITE-R (middle), and FEM (right), indicating that there is a statistically significant enrichment for high-scoring genes using SMITE even when using the reduced model
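
The normalization named in the Fig. 4 caption (logit transformation, rescaling, back-transformation) can be sketched in Python as follows. This is a hedged illustration, not SMITE's implementation: the caption does not say how the rescaling is done, so matching each context's logit-scale mean and standard deviation to a shared target is an assumption, as are the function and parameter names.

```python
import math
from statistics import mean, pstdev


def logit(p):
    """Map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a real number back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def normalize_pvalues(pvalues, target_mean, target_sd, eps=1e-12):
    """Rescale p-values on the logit scale, then transform back.

    Hypothetical helper: the rescaling (match mean and sd to a target)
    is an assumed choice, not a documented SMITE procedure.
    """
    # Clamp so that logit() never sees exactly 0 or 1.
    clipped = [min(max(p, eps), 1.0 - eps) for p in pvalues]
    logits = [logit(p) for p in clipped]
    mu = mean(logits)
    sd = pstdev(logits)
    if sd == 0:
        # All values identical: place them at the target mean.
        rescaled = [target_mean] * len(logits)
    else:
        rescaled = [(x - mu) / sd * target_sd + target_mean for x in logits]
    return [inv_logit(x) for x in rescaled]
```

Applying the function to each annotated context (for example promoter and gene body p-values) with the same `target_mean` and `target_sd` puts their distributions on a comparable scale while preserving the rank order of p-values within each context.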
