PLoS One. 2012;7(4):e33393.
doi: 10.1371/journal.pone.0033393. Epub 2012 Apr 4.

Identification of colorectal cancer related genes with mRMR and shortest path in protein-protein interaction network

Bi-Qing Li et al. PLoS One. 2012.

Abstract

One of the most important and challenging problems in biomedicine and genomics is how to identify disease genes. In this study, we developed a computational method to identify colorectal cancer-related genes based on (i) gene expression profiles and (ii) shortest path analysis of functional protein association networks. The former has long been used to select differentially expressed genes as disease genes, while the latter has been widely used to study the mechanisms of diseases. With the existing protein-protein interaction data from STRING (Search Tool for the Retrieval of Interacting Genes), a weighted functional protein association network was constructed. By means of the mRMR (Maximum Relevance Minimum Redundancy) approach, six genes were identified whose expression profiles can distinguish colorectal tumors from normal adjacent colonic tissues. Using the shortest path approach, we then found an additional 35 genes, some of which have been reported to be relevant to colorectal cancer and some of which are very likely to be relevant to it. Interestingly, the genes identified from both the gene expression profiles and the functional protein association network contained more cancer genes than the genes identified from the gene expression profiles alone. Moreover, these genes had greater functional similarity with the reported colorectal cancer genes than the genes identified from the gene expression profiles alone. Together, these results indicate that the method presented in this paper is quite promising. The method may become a useful tool, or at least play a complementary role to existing methods, for identifying colorectal cancer genes. It has not escaped our notice that the method can be applied to identify the genes of other diseases as well.
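
As a rough illustration of the mRMR idea mentioned above, the following sketch ranks features greedily by mutual information with the class label (relevance) minus their average mutual information with the features already chosen (redundancy). It is a minimal sketch of the generic difference-form criterion, not the authors' implementation: it assumes the expression values have already been discretized, and the feature names `g1`, `g2`, `g3` and all data values are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(features, target):
    """Greedy mRMR ranking (relevance minus mean redundancy).

    features: dict mapping feature name -> list of discrete values.
    target:   list of class labels, one per sample.
    """
    relevance = {f: mutual_information(v, target) for f, v in features.items()}
    selected = []
    remaining = sorted(features)  # sorted for a deterministic scan order
    while remaining:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(
                mutual_information(features[f], features[s]) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 samples, 0 = normal tissue, 1 = tumor.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "g1": [0, 0, 0, 1, 1, 1, 1, 1],  # most relevant
    "g2": [0, 0, 1, 1, 1, 1, 1, 1],  # relevant but largely redundant with g1
    "g3": [0, 1, 0, 0, 1, 0, 1, 1],  # less relevant but complementary to g1
}
ranking = mrmr_rank(features, target)
print(ranking)
```

In this toy case a ranking by relevance alone would place `g2` ahead of `g3`; the redundancy penalty reverses that order, which is the behavior the "minimum redundancy" term is meant to produce.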

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. IFS curve for the colorectal tumors and matched normal adjacent tissue samples classification.
In the IFS curve, the X-axis shows the number of probes used for classification and the Y-axis shows the prediction accuracy of the nearest neighbor algorithm (NNA), evaluated by the jackknife (leave-one-out) cross-validation test. The peak accuracy was 1, reached with six probes. The top six probes in the mRMR probe list formed the optimal discriminative probe set.
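
The procedure behind such a curve can be sketched as follows: for k = 1, 2, ... the top k ranked features are used to classify each sample by its nearest neighbor among all other samples (leave-one-out), and the accuracy is recorded. This is only an illustrative sketch. The Euclidean distance is an assumption (the distance used by the authors' NNA is not stated in this record), and the samples, labels, and ranking below are hypothetical.

```python
import math


def nearest_neighbor_label(train_x, train_y, query):
    """Label of the training point closest to query (Euclidean distance, assumed)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def loo_accuracy(samples, labels, feature_idx):
    """Leave-one-out (jackknife) accuracy of 1-nearest-neighbor on the chosen features."""
    correct = 0
    for i, sample in enumerate(samples):
        query = [sample[j] for j in feature_idx]
        train_x = [[s[j] for j in feature_idx]
                   for k, s in enumerate(samples) if k != i]
        train_y = [y for k, y in enumerate(labels) if k != i]
        if nearest_neighbor_label(train_x, train_y, query) == labels[i]:
            correct += 1
    return correct / len(samples)


def ifs_curve(samples, labels, ranked_features):
    """Accuracy for the top 1, top 2, ..., top N features of a given ranking."""
    return [loo_accuracy(samples, labels, ranked_features[:k])
            for k in range(1, len(ranked_features) + 1)]


def optimal_size(curve):
    """Smallest number of features that reaches the peak accuracy."""
    return curve.index(max(curve)) + 1


# Hypothetical toy data: two tumor and two normal samples, three features.
samples = [
    [0.0, 0.0, 0.0],  # tumor
    [1.0, 0.2, 9.0],  # tumor
    [0.9, 2.0, 0.0],  # normal
    [2.0, 2.2, 9.0],  # normal
]
labels = ["tumor", "tumor", "normal", "normal"]
ranked = [0, 1, 2]  # assumed to come from a prior ranking step such as mRMR

curve = ifs_curve(samples, labels, ranked)
print(curve, optimal_size(curve))  # [0.0, 1.0, 0.0] 2
```

In this toy example the accuracy peaks with two features and drops again when a third, uninformative feature is added, which is why the peak of the curve rather than the full feature list defines the optimal set.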
Figure 2. The 15 shortest paths between the six genes identified with the mRMR method.
The 15 shortest paths between the six candidate genes were identified with Dijkstra's algorithm based on the PPI data from STRING. Yellow rounded rectangles represent the top six candidate genes identified by the mRMR method. Red circles represent the 35 genes lying on the shortest paths. Numbers on edges are the edge weights, which quantify the interaction confidence: the smaller the number, the stronger the interaction between the two nodes. See the section “Graph approach and shortest paths tracing” in the text for the quantitative relation between the edge weight and the confidence score of the two proteins concerned.
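
The count of 15 paths follows from taking every unordered pair of the six seed genes (6 choose 2 = 15). A minimal sketch of the path tracing is given below. The weight mapping `edge_weight` is an assumption chosen only so that higher confidence gives a smaller weight; the actual relation is defined in the paper's methods section and is not reproduced in this record. The node names and confidence scores are hypothetical.

```python
import heapq
from itertools import combinations


def edge_weight(confidence):
    """Assumed mapping from a confidence score in [0, 1] to a weight:
    the higher the confidence, the smaller the weight."""
    return round(1000 * (1 - confidence))


def build_graph(scored_edges):
    """Undirected weighted graph as a dict of dicts."""
    graph = {}
    for u, v, confidence in scored_edges:
        w = edge_weight(confidence)
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def dijkstra_path(graph, source, target):
    """Shortest path and its total weight; (None, inf) if target is unreachable."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]


def intermediate_genes(graph, seeds):
    """Non-seed nodes lying on a shortest path between any pair of seeds."""
    found = set()
    for a, b in combinations(seeds, 2):
        path, _ = dijkstra_path(graph, a, b)
        if path:
            found.update(path[1:-1])
    return found - set(seeds)


# Hypothetical toy network: S1..S3 are seed genes, X and Y are other proteins.
edges = [
    ("S1", "X", 0.9), ("X", "S2", 0.9), ("S1", "S2", 0.5),
    ("S2", "Y", 0.8), ("Y", "S3", 0.95), ("S1", "S3", 0.4),
]
graph = build_graph(edges)
seeds = ["S1", "S2", "S3"]

print(dijkstra_path(graph, "S1", "S2"))          # (['S1', 'X', 'S2'], 200)
print(sorted(intermediate_genes(graph, seeds)))  # ['X', 'Y']
```

Here the two-step route from S1 to S2 through X (total weight 200) is preferred over the direct but low-confidence edge (weight 500), so X is picked up as an intermediate gene in the same way the 35 additional genes were collected from the paths between the six seeds.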

