Review
NPJ Precis Oncol. 2017 Aug 8;1(1):25.
doi: 10.1038/s41698-017-0029-7. eCollection 2017.

Network-based machine learning and graph theory algorithms for precision oncology

Wei Zhang et al. NPJ Precis Oncol. 2017.

Abstract

Network-based analytics plays an increasingly important role in precision oncology. Growing evidence in recent studies suggests that cancer can be better understood through mutated or dysregulated pathways or networks rather than individual mutations and that the efficacy of repositioned drugs can be inferred from disease modules in molecular networks. This article reviews network-based machine learning and graph theory algorithms for integrative analysis of personal genomic data and biomedical knowledge bases to identify tumor-specific molecular mechanisms, candidate targets and repositioned drugs for personalized treatment. The review focuses on the algorithmic design and mathematical formulation of these methods to facilitate applications and implementations of network-based analysis in the practice of precision oncology. We review the methods applied in three scenarios to integrate genomic data and network models in different analysis pipelines, and we examine three categories of network-based approaches for repositioning drugs in drug-disease-gene networks. In addition, we perform a comprehensive subnetwork/pathway analysis of mutations in 31 cancer genome projects in the Cancer Genome Atlas and present a detailed case study on ovarian cancer. Finally, we discuss interesting observations, potential pitfalls and future directions in network-based precision oncology.

Conflict of interest statement

The authors declare that they have no competing financial interests.

Figures

Fig. 1
Overview of the methods for network-based precision oncology. (a) The methods for integration of patient genomic data and molecular networks, grouped under the three scenarios of data analysis pipelines. (b) The methods for integration of drug–drug similarities, drug–target relations and target–target relations for drug repositioning, grouped under three algorithmic categories. (c) Patient genomic profiles describe the genomic landscape of each patient sample. (d) The patient genomic profiles are integrated with a molecular network, the human protein–protein interaction (PPI) network in the example. (e) Drug and disease phenotypes are modeled in a network with connections to the target genes in the PPI network. (f) An example of cancer subnetworks associated with recurrent ovarian cancer. (g) Resources of biomedical and molecular networks. (h) List of the TCGA cancer studies
Fig. 2
Three scenarios for the integration of genomic data with molecular networks. (a) Model-based integration formulates one unified learning framework regularized by a graph Laplacian. The output of the model is network modules enriched by the selected genomic features and a prediction of treatment outcome/cancer phenotype. (b) Preprocessing integration consists of two steps: the first step detects subnetworks that differentiate the contrasted patient groups by the genomic features; in the second step, the subnetwork features are fed into a standard learning model to generate predictions. (c) Post-analysis integration of oncogenic alterations in the network also consists of two steps. The oncogenic alterations are first detected across the patient profiles, and then the altered genes/loci are mapped to the network as seed genes for the module analysis. For each scenario, the objectives of the approach, the inputs and outputs of the network-based analysis models/methods, and the advantages/limitations of each approach are also provided
Fig. 3
Model-based integration of whole-genomic profiles and a molecular network. (a) The patient genomic profiles X are shown along with the clinical information: the survival time, two patient subgroups for classification, and the treatment response of each individual patient. The network S is typically integrated into the genomic profile analysis with a graph Laplacian regularization. The formulas of the graph Laplacian and its regularization are shown below. The graph Laplacian regularization can be rewritten as a summation of pairwise smoothness terms that promote smoothness among the connected genomic features in the network. (b) The network-based linear regression and Cox regression models are illustrated with the graph Laplacian regularization term added to the original cost functions. (c) Network-based classification is illustrated by a network-based SVM to classify the samples. (d) Network-based semi-supervised learning models classify samples and detect disease markers on a bipartite graph. The edges between samples and genomic features are weighted by the genomic profiles, and semi-supervised learning is based on the bipartite graph Laplacian. (e) Network-based factorization models factorize the genomic profile X into the product of two matrices, U and H, which cluster patient samples and learn the latent features in the genomic profiles
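The Fig. 3 caption notes that the graph Laplacian regularization can be rewritten as a sum of pairwise smoothness terms. The sketch below checks that identity numerically and is illustrative only. It assumes the unnormalized Laplacian L = D - S (the review may use a normalized variant), a hypothetical three-gene path network S, and a made-up coefficient vector w. The function names are not from the paper.

```python
def graph_laplacian(S):
    """Unnormalized Laplacian L = D - S for a symmetric weight matrix S (assumed form)."""
    n = len(S)
    degree = [sum(row) for row in S]
    return [[(degree[i] if i == j else 0.0) - S[i][j] for j in range(n)]
            for i in range(n)]


def laplacian_penalty(w, S):
    """Quadratic form w^T L w used as the regularization term."""
    L = graph_laplacian(S)
    n = len(w)
    return sum(w[i] * L[i][j] * w[j] for i in range(n) for j in range(n))


def pairwise_smoothness(w, S):
    """0.5 * sum_ij S_ij (w_i - w_j)^2; the 0.5 counts each undirected edge once."""
    n = len(w)
    return 0.5 * sum(S[i][j] * (w[i] - w[j]) ** 2
                     for i in range(n) for j in range(n))


# Hypothetical toy network: gene0 - gene1 - gene2 (a path with unit edge weights).
S = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
w = [1.0, 2.0, 4.0]  # made-up coefficients on the three genes

print(laplacian_penalty(w, S), pairwise_smoothness(w, S))
```

Both expressions evaluate to 5.0 on this toy input: the edge gene0-gene1 contributes (1 - 2)^2 = 1 and the edge gene1-gene2 contributes (2 - 4)^2 = 4, so the penalty grows when connected features receive dissimilar coefficients.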
Fig. 4
Methods for network-based drug repositioning. (a) Graph connectivity measures consider the local structures of the networks to predict drug–target interactions. This example shows the shortest path from each target node to the query drug (red node) in the graph. (b) Link prediction models predict the relations between drugs and targets based on the global structures of the known interactions in the networks with matrix completion or random-walk approaches. The known and predicted drug–target interactions are green and red, respectively, in the drug–target relation matrix. (c) Network-based classification methods first extract the network topological features for all the targets in the networks. For each drug, a classifier can be trained with the known targets of the drug as positive samples and the others as negative samples. The learned classifiers can then be used to predict the new targets in the test set for each drug. (d) The advantages and disadvantages of the methods in each category are compared
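Two of the ideas in the Fig. 4 caption, shortest-path connectivity to a query drug and random-walk scoring, can be sketched on a toy graph. This is a minimal illustration, not the methods surveyed in the review: the node names, the adjacency list, the restart probability of 0.3 and the fixed iteration count are all assumptions chosen for the example.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from the source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def random_walk_with_restart(adj, seed, restart=0.3, iters=100):
    """Power iteration for a random walk that jumps back to the seed with
    probability `restart`. Assumes every node has at least one neighbor."""
    p = {n: 1.0 if n == seed else 0.0 for n in adj}
    for _ in range(iters):
        nxt = {n: (restart if n == seed else 0.0) for n in adj}
        for u in adj:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p


# Hypothetical undirected network: one query drug linked to a chain of targets.
adj = {
    "drug": ["T1"],
    "T1": ["drug", "T2"],
    "T2": ["T1", "T3"],
    "T3": ["T2"],
}

dist = shortest_path_lengths(adj, "drug")
scores = random_walk_with_restart(adj, "drug")
print(dist)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy graph the two views agree: targets closer to the drug in hop distance also receive higher random-walk scores. On real networks they can diverge, since the random walk accounts for all paths rather than only the shortest one.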
Fig. 5
Network-based analysis of highly mutated pathways of 31 cancer types in TCGA data. The highly mutated pathways detected by (a) network-based analysis and (b) standard enrichment analysis. The pathways of interest in the discussion are highlighted in blue, and the pathways only enriched by network-based analysis are highlighted in red
Fig. 6
Network-based analysis of patient mutation data in TCGA ovarian cancer. The significantly mutated pathways in each patient detected by (a) network analysis and (b) the analysis of the original mutation data without the network. (c) The survival plot of the three groups detected by the network-based pathway analysis of the TCGA ovarian cancer patients. Derived by standard log-rank test, the p-values for comparing group 2 vs. group 3 and group 1 + group 2 vs. group 3 are both significant. (d) The survival plot of the groups detected by the analysis of the original mutation data of the TCGA ovarian cancer patients
