Bioinformatics. 2022 May 13;38(10):2855-2862.
doi: 10.1093/bioinformatics/btac183.

Network-based cancer heterogeneity analysis incorporating multi-view of prior information

Yang Li et al. Bioinformatics. 2022.

Abstract

Motivation: Cancer genetic heterogeneity analysis has critical implications for tumour classification, response to therapy and choice of biomarkers to guide personalized cancer medicine. However, existing heterogeneity analysis based solely on molecular profiling data usually suffers from a lack of information and has limited effectiveness. Many biomedical and life sciences databases have accumulated a substantial volume of meaningful biological information. They can provide additional information beyond molecular profiling data, yet pose challenges arising from potential noise and uncertainty.

Results: In this study, we develop a more effective heterogeneity analysis method that draws on prior information. We propose a network-based penalization technique that incorporates multiple views of prior information from multiple databases and accommodates heterogeneity attributed to both differential genes and gene relationships. Because the prior information may not be fully credible, we also propose a weighting strategy in which the weight is determined from the data, so that the model is not excessively disturbed by incorrect information. Simulations and an analysis of The Cancer Genome Atlas (TCGA) glioblastoma multiforme data demonstrate the practical applicability of the proposed method.
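
To make the idea of a weighted, network-based penalty more concrete, the following is a minimal generic sketch in Python. It is not the formulation used in the paper or in the PECM R code: the function names, the entrywise averaging of database views, the Laplacian-type quadratic penalty and the `weight` parameter are all assumptions chosen for illustration only.

```python
from typing import List

Matrix = List[List[float]]


def consensus_network(views: Matrix and List[Matrix]) -> Matrix:
    """Average several prior adjacency matrices entrywise.

    Each view is a hypothetical gene-gene adjacency matrix taken from one
    database. Averaging is an illustrative choice, not the paper's rule.
    """
    p = len(views[0])
    n_views = len(views)
    return [
        [sum(view[i][j] for view in views) / n_views for j in range(p)]
        for i in range(p)
    ]


def network_penalty(beta: List[float], adjacency: Matrix, weight: float) -> float:
    """Compute weight * sum_{i<j} a_ij * (beta_i - beta_j)^2.

    A small `weight` limits the influence of the prior network, which is the
    general intuition behind down-weighting prior information that may be
    unreliable. Here the weight is supplied by hand; it is not data-driven.
    """
    p = len(beta)
    total = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            total += adjacency[i][j] * (beta[i] - beta[j]) ** 2
    return weight * total


# Two hypothetical database views over three genes.
view_a = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
view_b = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
prior = consensus_network([view_a, view_b])

# Hypothetical gene effects: genes 0 and 1 agree, gene 2 differs.
beta = [1.0, 1.0, 3.0]
trusted = network_penalty(beta, prior, weight=0.5)
ignored = network_penalty(beta, prior, weight=0.0)
```

In this toy setting, the edge between genes 1 and 2 appears in only one of the two views, so it enters the consensus network with half strength, and setting `weight` to zero removes the influence of the prior network entirely.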

Availability and implementation: R code implementing the proposed method is available at https://github.com/mengyunwu2020/PECM. The data that support the findings in this paper are openly available in TCGA (The Cancer Genome Atlas) at https://portal.gdc.cancer.gov/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Simulation: marker selection and heterogeneity analysis performance of the proposed method as a function of η based on 100 replications. First column: prior information is fully credible; second column: prior information is partially credible (50%); third column: prior information is completely inaccurate. The three rows represent the results under the scenarios with K = 2, 3 and 4, respectively. The shaded area represents the range of η selected by mBIC over 100 replications.
Fig. 2.
Data analysis: gene networks for the five subtypes selected by the proposed method. The dashed edges represent the common gene relationships shared by the five subtypes, and the solid edges represent the specific gene relationships of each subtype.

