Int J Mol Sci. 2022 Jun 24;23(13):7057. doi: 10.3390/ijms23137057.

Computational Analysis Identifies Novel Biomarkers for High-Risk Bladder Cancer Patients

Radosław Piliszek¹, Anna A Brożyna², Witold R Rudnicki¹ ³

Affiliations

¹ Computational Centre, University of Białystok, ul. Konstantego Ciołkowskiego 1M, 15-245 Białystok, Poland.
² Department of Human Biology, Institute of Biology, Faculty of Biological and Veterinary Sciences, Nicolaus Copernicus University, ul. Lwowska 1, 87-100 Toruń, Poland.
³ Institute of Computer Science, University of Białystok, ul. Konstantego Ciołkowskiego 1M, 15-245 Białystok, Poland.

PMID: 35806060
PMCID: PMC9266725
DOI: 10.3390/ijms23137057

Radosław Piliszek et al. Int J Mol Sci. 2022.

Abstract

In bladder cancer, carcinoma in situ (CIS) is known to carry a poor prognosis. However, few studies have examined the biomarkers relevant to CIS development. Omics experiments generate data with tens of thousands of descriptive variables, e.g., gene expression levels. Often, many of these variables are identified as relevant to some degree, yielding hundreds or thousands of candidate variables for model building or further data analysis. We analyze one such dataset describing patients with bladder cancer, mostly non-muscle-invasive (NMIBC), and propose a novel approach to feature selection. This approach returns features of high predictive quality while remaining interpretable and offering some insight into the analyzed data. As a result, we obtain a small set of the seven most useful biomarkers for diagnostics. These biomarkers could also be used to build tests that avoid the costly and time-consuming existing methods. We summarize the current biological knowledge of the chosen biomarkers and contrast it with our findings.

Keywords: biomarker identification; carcinoma in situ (CIS); nonmuscle-invasive bladder cancer (NMIBC); optimal feature set selection.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Plots of the area under the receiver operating characteristic curve (AUC) of Random Forest classifiers built on markers selected by the top-n approach and by two variants of hierarchical clustering within our proposed protocol (complete linkage and Ward’s criterion). These results were obtained without external cross-validation (CV) or resampling, i.e., from within the protocol itself (under the protocol’s internal CV). Error bars denote the standard error.

**Figure 2**
Depiction of a single run of the base protocol (A) and the proposed protocol (B). In (A), the clustering is used directly to select markers and to build models on them. In (B), part (A) is replicated except for the highlighted model-building and evaluation step; instead, the results of the replicated (A) runs are used to rank variables by how often they are chosen, and the top-ranked variables are then used for model building and evaluation.
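
The caption does not state how markers are compared or how a marker is picked from each cluster. Below is a minimal sketch of one run of a clustering-based selection in the spirit of variant (A). It assumes a distance of 1 − |r| between markers (r being the Pearson correlation of their expression profiles) and assumes each cluster is represented by its member with the highest relevance score. The function names, the relevance scores, and the toy data are hypothetical; this is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def complete_linkage(dist: List[List[float]], k: int) -> List[List[int]]:
    """Naive agglomerative clustering with complete linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance = largest pairwise distance.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the closest pair (b > a, so index a stays valid)
        del clusters[b]
    return clusters


def select_representatives(features: Dict[str, Sequence[float]],
                           relevance: Dict[str, float], k: int) -> List[str]:
    """Return one marker per cluster (its most relevant member), ordered by relevance."""
    names = list(features)
    # Assumed distance: 1 - |r|, so strongly correlated or anti-correlated markers are close.
    dist = [[1 - abs(pearson(features[p], features[q])) for q in names] for p in names]
    reps = [max((names[i] for i in c), key=relevance.__getitem__)
            for c in complete_linkage(dist, k)]
    return sorted(reps, key=lambda m: -relevance[m])


# Hypothetical toy data: g2 is a near-copy of g1, so only one of them should survive.
features = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 11],
    "g3": [5, 3, 4, 1, 2],
    "g4": [1, 0, 1, 0, 1],
}
relevance = {"g1": 0.9, "g2": 0.8, "g3": 0.5, "g4": 0.7}
print(select_representatives(features, relevance, k=3))  # ['g1', 'g4', 'g3']
```

Complete linkage merges clusters by their most distant members, so every marker in a cluster stays close to every other; this keeps redundant markers together and lets a single representative stand in for them, as g1 does for g2 in the toy example.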

**Figure 3**
Depiction of the final evaluation. Both variants from Figure 2, (A) and (B), were evaluated. External cross-validation was used to obtain the mean and standard deviation of the quality metrics (AUC and odds ratio). Evaluating the stability of variant (A) (the base protocol) prompted us to develop and apply variant (B) (the proposed protocol).
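
The two quality metrics named in the caption can be computed per outer fold and then summarized. Below is a minimal sketch, assuming held-out predicted scores are available for each fold. The 0.5 decision threshold used for the odds ratio and the 0.5 added to every cell of the 2×2 table (Haldane-Anscombe correction, which avoids division by zero) are assumptions, since the caption does not say how the odds ratio was obtained. The fold data are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def odds_ratio(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Odds ratio of the 2x2 table of predicted vs. true class.

    Assumed: a fixed decision threshold and +0.5 in every cell (Haldane-Anscombe).
    """
    pred = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(pred, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(pred, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(pred, labels) if not p and y == 1)
    tn = sum(1 for p, y in zip(pred, labels) if not p and y == 0)
    return ((tp + 0.5) * (tn + 0.5)) / ((fp + 0.5) * (fn + 0.5))


def summarize(folds: List[Tuple[Sequence[int], Sequence[float]]]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample standard deviation of each metric over the outer CV folds."""
    aucs = [auc(y, s) for y, s in folds]
    ors = [odds_ratio(y, s) for y, s in folds]
    return {
        "AUC": (statistics.mean(aucs), statistics.stdev(aucs)),
        "OR": (statistics.mean(ors), statistics.stdev(ors)),
    }


# Hypothetical held-out labels and scores from two outer folds.
folds = [
    ([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.2]),
    ([1, 1, 0, 0], [0.8, 0.3, 0.6, 0.1]),
]
print(summarize(folds))
```

The pairwise AUC definition is threshold-free, whereas the odds ratio depends on the chosen threshold, so the two metrics can disagree on the same fold.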

**Figure 4**
Plots of the area under the receiver operating characteristic curve (AUC) of Random Forest classifiers built on markers selected by the top-n approach and by two variants of hierarchical clustering within our proposed protocol (complete linkage and Ward’s criterion). These results were obtained in external 10-fold cross-validation (CV); inside the protocol, an internal 10-fold CV was used to ensure enough samples. Error bars denote the standard error. The complete-linkage variant exhibits the desired behavior: it reaches the best results with the fewest markers, and its performance plateaus from 7 markers onward.
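
Figure 4 combines an external 10-fold CV with a 10-fold CV run inside the protocol. Below is a minimal sketch of how such nested splits can be generated so that each outer test fold stays unseen while markers are selected. The round-robin split without stratification, the seeds, and the function names are assumptions, not the authors' code.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold(indices: List[int], k: int, seed: int) -> List[Split]:
    """Shuffle indices and deal them round-robin into k folds (no stratification)."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    return [
        (sorted(x for j, f in enumerate(folds) if j != i for x in f), sorted(folds[i]))
        for i in range(k)
    ]


def nested_cv(n: int, outer_k: int = 10, inner_k: int = 10,
              seed: int = 0) -> Iterator[Tuple[Split, List[Split]]]:
    """Yield each outer split together with inner splits built only from its training part."""
    for outer_train, outer_test in k_fold(list(range(n)), outer_k, seed):
        # The outer test fold is never seen by the inner (protocol-internal) CV.
        yield (outer_train, outer_test), k_fold(outer_train, inner_k, seed + 1)


# Hypothetical usage: markers would be selected with `inner`, a model fitted on
# `train`, and evaluated on `test`; here only the split sizes are shown.
(train, test), inner = next(nested_cv(n=100))
print(len(train), len(test), len(inner))  # 90 10 10
```

Keeping the outer test fold out of every inner split is what makes the external CV estimate free of selection bias from the marker-selection step.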

**Figure 5**
Most representative markers at different clustering levels in 150 repeats of the hierarchical clustering procedure. The first three columns show the order in which markers are added to the representative set as the number of representatives increases one at a time from 2 to 15; column 2 gives each marker's Ensembl ID with the 5 leading zeros removed, and column 3 gives the corresponding gene name. The remaining columns show the markers most often selected as representatives across the 150 repeats, and a marker's position within a column reflects how often it was selected (the higher the position, the higher the frequency).
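
The caption describes ranking markers by how often they are selected across repeats. Below is a minimal sketch, assuming each repeat records its representatives for every number of clusters k, and assuming that the representative set of size k consists of the k markers most often chosen at that level. The helper names and the three toy repeats are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Run = Dict[int, List[str]]  # number of clusters k -> representatives chosen in one repeat


def rank_by_frequency(runs: List[Run], k: int) -> List[Tuple[str, int]]:
    """Markers chosen at level k, most frequent first (ties broken alphabetically)."""
    counts = Counter(m for run in runs for m in run[k])
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def consensus_set(runs: List[Run], k: int) -> List[str]:
    """Assumed consensus: the k markers most often chosen at level k."""
    return [m for m, _ in rank_by_frequency(runs, k)[:k]]


def inclusion_order(runs: List[Run], ks: List[int]) -> List[str]:
    """Order in which markers first enter the consensus set as k grows."""
    order: List[str] = []
    for k in ks:
        for m in consensus_set(runs, k):
            if m not in order:
                order.append(m)
    return order


# Three hypothetical repeats (the figure uses 150 repeats and k from 2 to 15).
runs = [
    {2: ["A", "B"], 3: ["A", "B", "C"]},
    {2: ["A", "C"], 3: ["A", "C", "D"]},
    {2: ["A", "B"], 3: ["A", "B", "C"]},
]
print(rank_by_frequency(runs, 2))  # [('A', 3), ('B', 2), ('C', 1)]
print(inclusion_order(runs, [2, 3]))  # ['A', 'B', 'C']
```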

**Figure 6**
Plots of the area under the receiver operating characteristic curve (AUC) of Random Forest classifiers built on markers selected by the top-n approach and by two variants of hierarchical clustering within our proposed protocol (complete linkage and Ward’s criterion). These results were obtained from 100 resampling runs of the standard protocol, as described in the main text. Error bars denote the standard error. The complete-linkage variant again exhibits the desired behavior, reaching the best results with the fewest markers, with performance plateauing from 7 markers onward.

**Figure 7**
Heatmap of the squared correlation between the expression levels of the chosen genes. The darker (more saturated) the cell, the stronger the correlation.
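
Squaring a correlation coefficient discards its sign, so strongly negative and strongly positive correlations appear equally dark. Below is a minimal sketch of the matrix behind such a heatmap. It assumes Pearson correlation, because the caption does not name the coefficient, and uses hypothetical expression values.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def squared_correlation(expr: Dict[str, Sequence[float]]) -> Tuple[List[str], List[List[float]]]:
    """Return gene names and the symmetric matrix of r^2 values (sign of r discarded)."""
    genes = list(expr)
    matrix = [[pearson(expr[g], expr[h]) ** 2 for h in genes] for g in genes]
    return genes, matrix


# Hypothetical expression values for three genes across five samples.
expr = {"g1": [1, 2, 3, 4, 5], "g2": [2, 4, 6, 8, 10], "g3": [5, 3, 4, 1, 2]}
genes, r2 = squared_correlation(expr)
for g, row in zip(genes, r2):
    print(g, [round(v, 2) for v in row])
```

Here g3 is negatively correlated with g1 (r = −0.8), yet its cell shows r² = 0.64, the same value a positive correlation of 0.8 would give.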

**Figure 8**
Boxplots of expression levels of the selected markers, comparing samples with CIS in the disease course with those without it. Values of “DPY19L3-DT” are log-transformed to make the difference visible; the other markers are plotted on their original scale. Low expression levels of ADAM28 and TMEM232 are associated with an increased risk of CIS in the disease course, whereas the other 5 markers show the opposite pattern.
