J Am Med Inform Assoc. 2013 Jul-Aug;20(4):659-67.
doi: 10.1136/amiajnl-2012-001168. Epub 2012 Sep 11.

An integrated approach to identify causal network modules of complex diseases with application to colorectal cancer

Zhenshu Wen et al. J Am Med Inform Assoc. 2013 Jul-Aug.

Abstract

Background: Many methods have been developed to identify disease genes and module biomarkers of complex diseases based on gene expression data. However, it is generally difficult to distinguish whether variations in gene expression are causative or merely an effect of a disease. This limitation of relying on gene expression data alone highlights the need for new approaches that draw on multiple data types to capture the causal relationship between network modules and disease traits.

Methods: In this work, we developed a novel network-based approach to identify putative causal module biomarkers of complex diseases by integrating heterogeneous information, for example, epigenomic data, gene expression data, and a protein-protein interaction network. We first formulated the identification of modules as a mathematical programming problem, which can be solved efficiently and accurately. Then, we applied our approach to colorectal cancer (CRC) and identified several network modules that can serve as potential module biomarkers for characterizing CRC. Further validation using three additional gene expression datasets verified their candidate biomarker properties and the effectiveness of the method. Functional enrichment analysis also revealed that the identified modules are strongly related to hallmarks of cancer, and that the enriched functions, such as inflammatory response, receptor and signaling pathways, are specific to CRC.

Results: By constructing a transcription factor (TF)-module network, we found that aberrant DNA methylation of TF-encoding genes contributes considerably to the activity changes of some genes that may function as causal genes of CRC. This finding could also be exploited to develop efficient therapies or effective drugs.

Conclusion: Our method can potentially be extended to the study of other complex diseases and the multiclassification problem.
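
To make the idea of selecting modules with binary indicator variables more concrete, here is a minimal toy sketch. It is not the authors' formulation: the abstract does not give the objective or constraints of their mathematical programming problem. The sketch borrows only the notation from the Figure 1 caption (an activity matrix with entries Mij for module i in sample j, and an indicator xi per module). The data, the scoring rule (difference in mean activity between classes), the `PENALTY` value, and the function names are all assumptions made for illustration, and the search is a brute-force enumeration rather than a real integer programming solver.

```python
from itertools import product

# Hypothetical toy data: activity[i][j] is the activity of module i in sample j.
activity = [
    [0.9, 0.8, 0.1, 0.2],  # module 0: higher in disease samples
    [0.5, 0.4, 0.5, 0.6],  # module 1: nearly uninformative
    [0.1, 0.2, 0.9, 0.7],  # module 2: higher in normal samples
]
labels = [1, 1, 0, 0]      # assumed coding: 1 = disease sample, 0 = normal sample
PENALTY = 0.2              # assumed cost charged for each selected module


def module_score(row, labels):
    """Absolute difference between mean activity in disease and normal samples."""
    disease = [a for a, y in zip(row, labels) if y == 1]
    normal = [a for a, y in zip(row, labels) if y == 0]
    return abs(sum(disease) / len(disease) - sum(normal) / len(normal))


def select_modules(activity, labels, penalty):
    """Enumerate every binary vector x and return the one with the best objective."""
    scores = [module_score(row, labels) for row in activity]
    best_x, best_value = None, float("-inf")
    for x in product((0, 1), repeat=len(activity)):
        # Linear objective in x: reward discriminative modules, penalize each selection.
        value = sum(xi * (s - penalty) for xi, s in zip(x, scores))
        if value > best_value:
            best_x, best_value = x, value
    return best_x


selected = select_modules(activity, labels, PENALTY)
print(selected)
```

Enumeration is only workable for a handful of modules; a realistic problem of this kind would be handed to a dedicated integer programming solver.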


Figures

Figure 1
Schematic flowchart of our method. First, DNA methylation data and known cancer genes (KCG) are used to define candidate causal genes, which are represented by nodes with black dashed borders in the modules. Then, a weighted protein–protein interaction (PPI) network, obtained by combining gene expression data with the protein–protein interaction network, is used to cluster network modules. Next, through ‘indexing’ (mapping the candidate causal genes to the proteins in the modules) and ‘assignment’ (mapping the normalized expression values of genes to the corresponding proteins), the activity matrix of the modules is obtained by defining the module activities Mij. By introducing an indicator variable xi and designing a classifier, we formulate the identification of causal modules as an integer linear programming problem. Finally, by solving it, we identify module biomarkers that characterize complex diseases. This figure is only reproduced in colour in the online version.
Figure 2
Overlaps of the predicted genes and other kinds of genes. (A) Overlaps between the predicted genes and differentially methylated genes (DMG), collected cancer genes, and colorectal cancer genes, respectively. (B) Overlap of differentially methylated genes in this dataset and dataset GSE17648 in the identified modules. This figure is only reproduced in colour in the online version.
Figure 3
Dendrogram and heat map based on the identified causal modules. The row labels denote the module IDs, and column label ‘N’ represents normal samples, while ‘CRC’ stands for colorectal cancer samples. Colors represent the activity of the modules. Red indicates high activity, while green means low activity. This figure is only reproduced in colour in the online version.
Figure 4
The constructed transcription factor (TF)-module network. The nodes in the outer circle represent TFs, while the nodes in the inner circle represent modules. Grey nodes denote TFs that are differentially methylated. This figure is only reproduced in colour in the online version.
Figure 5
Dendrogram and heat map in the independent test dataset GSE24514 based on the identified causal modules. The row labels denote the module IDs, and column ‘N’ represents normal samples, while ‘CRC’ stands for colorectal cancer samples. The accuracy is 89.8% for the level of sensitivity of 96.8% (TP=30, FP=14, FN=1, TN=14). This figure is only reproduced in colour in the online version.
Figure 6
Dendrogram and heat map in the independent test dataset combined from GSE8671 and GSE9348 based on the identified causal modules. The row labels denote the module IDs, and column ‘N’ represents normal samples, while ‘CRC’ stands for colorectal cancer samples. The accuracy is 98.6% at a sensitivity of 100% (TP=100, FP=2, FN=0, TN=44). This figure is only reproduced in colour in the online version.
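
The accuracy and sensitivity quoted for Figure 6 follow from its confusion-matrix counts using the standard definitions: accuracy = (TP + TN) / total and sensitivity = TP / (TP + FN). The short sketch below reproduces that arithmetic. The function name `confusion_metrics` is a hypothetical helper, and the specificity it also returns is not reported in the caption.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute standard classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,    # fraction of all samples classified correctly
        "sensitivity": tp / (tp + fn),    # fraction of CRC samples detected
        "specificity": tn / (tn + fp),    # fraction of normal samples detected
    }


# Counts taken from the Figure 6 caption.
fig6 = confusion_metrics(tp=100, fp=2, fn=0, tn=44)
print(f"accuracy={fig6['accuracy']:.1%}, sensitivity={fig6['sensitivity']:.1%}")
```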
