BMC Med Genomics. 2019 Jul 11;12(Suppl 5):94.
doi: 10.1186/s12920-019-0511-x.

Topological integration of RPPA proteomic data with multi-omics data for survival prediction in breast cancer via pathway activity inference

Tae Rim Kim et al. BMC Med Genomics.

Abstract

Background: The analysis of integrated multi-omics data enables the identification of disease-related biomarkers that cannot be identified from a single omics profile. Although protein-level data reflects the cellular status of cancer tissue more directly than gene-level data, past studies have mainly focused on multi-omics integration using gene-level data as opposed to protein-level data. However, the use of protein-level data (such as mass spectrometry) in multi-omics integration has some limitations. For example, the correlation between the characteristics of gene-level data (such as mRNA) and protein-level data is weak, and it is difficult to detect low-abundance signaling proteins that are used to target cancer. The reverse phase protein array (RPPA) is a highly sensitive antibody-based quantification method for signaling proteins. However, the number of protein features in RPPA data is extremely low compared to the number of gene features in gene-level data. In this study, we present a new method for integrating RPPA profiles with RNA-Seq and DNA methylation profiles for survival prediction based on the integrative directed random walk (iDRW) framework proposed in our previous study. In the iDRW framework, each omics profile is merged into a single pathway profile that reflects the topological information of the pathway. In order to address the sparsity of RPPA profiles, we employ the random walk with restart (RWR) approach on the pathway network.
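
As a rough illustration of the random walk with restart idea mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes the common formulation p(t+1) = (1 - r) W p(t) + r p(0), where W is a column-normalized adjacency matrix. The function name, the restart probability of 0.3, and the four-node toy graph are all hypothetical choices made for this example; the abstract does not specify them.

```python
import numpy as np

def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate seed scores over a graph (illustrative sketch only).

    adj[i, j] is the weight of the edge from node j to node i.
    seed holds the initial (non-negative) score of each node.
    restart is an assumed restart probability, not a value from the paper.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    W = adj / col_sums                 # column-normalized transition matrix
    p0 = np.asarray(seed, dtype=float)
    p0 = p0 / p0.sum()                 # normalize the seed to a distribution
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:   # L1 convergence check
            return p_next
        p = p_next
    return p

# Toy undirected path graph 0 - 1 - 2 - 3 (hypothetical, for illustration).
toy_adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
# Only node 0 has a measured value; the other nodes start at zero.
scores = random_walk_with_restart(toy_adj, seed=[1, 0, 0, 0])
```

In this toy run, nodes without a measured value still receive a nonzero score that decays with distance from the seed, which is the general sense in which network propagation can compensate for sparse profiles.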

Results: Our model was validated using survival prediction analysis for a breast cancer dataset from The Cancer Genome Atlas. Our proposed model exhibited improved performance compared with other methods that utilize pathway information and also out-performed models that did not include the RPPA data utilized in our study. The risk pathways identified for breast cancer in this study were closely related to well-known breast cancer risk pathways.

Conclusions: Our results indicated that RPPA data is useful for survival prediction for breast cancer patients under our framework. We also observed that iDRW effectively integrates RNA-Seq, DNA methylation, and RPPA profiles, while variation in the composition of the omics data can affect both prediction performance and risk pathway identification. These results suggest that omics data composition is a critical parameter for iDRW.

Keywords: Breast cancer; Integrative analysis; Multi-omics data; Network propagation; Pathway-based analysis; Random walk; Reverse phase protein array; Survival prediction.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Distribution of RNA-Seq, DNA methylation, RPPA profile, and KEGG pathway data. (a) Venn diagram showing the overlap among genes (or proteins) in the RNA-Seq, DNA methylation, and RPPA profiles and the KEGG pathways (drawn using the tool at http://bioinformatics.psb.ugent.be/webtools/Venn/). (b) Distribution of the ratio of genes (or proteins) in each omics profile that overlap with the genes in each pathway, shown as a histogram and density plot. (c) Venn diagram of the number of samples in each omics profile
Fig. 2
Overview of the proposed framework
Fig. 3
Structure of the unified pathway network. A colored node indicates that the gene is included in the corresponding omics profile, and a white node indicates that it is included in a pathway gene set but not in the omics profile. A node with a bold border indicates that the gene appears in both the corresponding profile and the RNA-Seq profile. In this structure, inter-relation edges are assigned between nodes that contain the same gene feature as the RNA-Seq profile, and each edge is directed from the proteome network to the transcriptome network or from the epigenome network to the transcriptome network
Fig. 4
Performance comparison between different methods and profiles. (a) Using a single omics profile: DRW(G) used the RNA-Seq profile; DRW(M) used the DNA methylation profile; DRW(P) used the RPPA profile. (b) Using reduced RNA-Seq and DNA methylation profiles, where each profile was reduced to include only genes overlapping with RPPA proteins: iDRW(GRMR) used the reduced RNA-Seq and reduced DNA methylation profiles; iDRW(GRMRP) used the reduced RNA-Seq, reduced DNA methylation, and RPPA profiles; iDRWprop(GRMRP) performed network propagation using RWR on the proteome network. (c) Performance comparison of iDRW(GM) and iDRW(GMP). iDRW(GM) is the previous method, which used the RNA-Seq and DNA methylation profiles. iDRW(GMP) is the model proposed in this study, which used the RNA-Seq, DNA methylation, and RPPA profiles
Fig. 5
Classification accuracy using different combinations of omics types in the pathway activity score calculation. Each case in the legend denotes the combination of omics profiles used to calculate the pathway activity score. All cases in this experiment originate from the iDRW(GMP) model (its state before the pathway activity inference step) with varying γ
Fig. 6
Performance comparison of the pathway-based integration model with varying γ
Fig. 7
Performance comparison of the pathway-based integration model with optimized γ
Fig. 8
Risk pathway interaction network from iDRW(GP) and iDRW(GMP). Risk pathways obtained from iDRW(GP) and iDRW(GMP) are shown as blue and orange nodes, respectively, and the risk pathways common to both iDRW(GP) and iDRW(GMP) are shown as yellow nodes. Each edge represents a pathway-pathway interaction
