Oncol Lett. 2020 Sep;20(3):2420-2434.
doi: 10.3892/ol.2020.11780. Epub 2020 Jun 26.

An optimal prognostic model based on gene expression for clear cell renal cell carcinoma

Dan Xu et al. Oncol Lett. 2020 Sep.

Abstract

Clear cell renal cell carcinoma (ccRCC) is the most prevalent type of RCC; however, prognostic prediction tools for ccRCC are scant. Developing mRNA or long non-coding RNA (lncRNA)-based risk assessment tools may improve the prognosis in patients with ccRCC. RNA-sequencing and prognostic data from patients with ccRCC were downloaded from The Cancer Genome Atlas and the European Bioinformatics Institute Array database at the National Center for Biotechnology Information. Differentially expressed (DE) RNAs (DERs) and prognostic DERs were screened between less favorable and favorable prognoses using the limma package in R 3.4.1, and analyzed using univariate and multivariate Cox regression analyses, respectively. Risk score models were constructed using optimal combinations of DEmRNAs and DElncRNAs identified using the Least Absolute Shrinkage and Selection Operator Cox regression model of the penalized package. Associations between risk score models and overall survival time were evaluated. Independent prognostic clinical factors were screened using univariate and multivariate Cox regression analyses, and nomogram models were constructed. Gene Ontology biological processes and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses were conducted using the clusterProfiler package in R 3.4.1. A total of 451 DERs were identified, including 404 mRNAs and 47 lncRNAs, between less favorable and favorable prognoses, and 269 DERs, including 233 mRNAs and 36 lncRNAs, were identified as independent prognostic factors. Optimal combinations including 10 DEmRNAs or 10 DElncRNAs were screened using four risk score models based on the status or expression levels of the 10 DEmRNAs or 10 DElncRNAs. The model based on the expression levels of the 10 DEmRNAs had the highest prognostic power.
These prognostic DEmRNAs may be involved in biological processes associated with the inflammatory response, complement and coagulation cascades and neuroactive ligand-receptor interaction pathways. The present validated risk assessment tool based on the expression levels of these 10 DEmRNAs may help to identify patients with ccRCC at a high risk of mortality. These 10 DEmRNAs in optimal combinations may serve as prognostic biomarkers and help to elucidate the pathogenesis of ccRCC.
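
The abstract does not state how the expression-based risk score is computed. A common construction, shown here only as a minimal sketch, is a linear combination of each gene's expression level weighted by its Cox regression coefficient, with patients split into high- and low-risk groups at the median score. The gene names, coefficients, expression values and the median cutoff below are all hypothetical and are not taken from the study.

```python
from statistics import median

def risk_scores(expression, coefficients):
    """Return a risk score per sample: sum of coefficient * expression.

    expression: dict mapping sample id -> dict of gene -> expression level
    coefficients: dict mapping gene -> Cox regression coefficient (assumed)
    """
    return {
        sample: sum(coefficients[gene] * values[gene] for gene in coefficients)
        for sample, values in expression.items()
    }

def split_by_median(scores):
    """Label samples 'high' if strictly above the median score, else 'low'.

    The median cutoff is an assumption made for illustration.
    """
    cutoff = median(scores.values())
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}

# Hypothetical genes, coefficients and expression levels.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
expression = {
    "patient_1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "patient_2": {"GENE_A": 6.0, "GENE_B": 2.0},
    "patient_3": {"GENE_A": 4.0, "GENE_B": 0.0},
}

scores = risk_scores(expression, coefficients)
groups = split_by_median(scores)
print(scores)
print(groups)
```

A positive coefficient raises the score as expression increases, and a negative coefficient lowers it, so the sign of each weight indicates whether the gene is treated as a risk or a protective factor in this kind of model.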

Keywords: DEGs; lncRNAs; pathway enrichment analysis; prognostic model; risk score.

Figures

Figure 1.
Identification and clustering of DERs. (A) Left panel presents the volcano map of DERs between less favorable and favorable prognoses. Pink dots represent DERs. Black dots represent non-DERs. Red horizontal and two vertical dashed lines represent FDR <0.05 and log2FC >0.5, respectively. Right panel presents the composition of DERs with the types and ratios on the horizontal and vertical axes, respectively. Blue and pink columns represent proportions of down- and upregulated RNAs, respectively. (B) Two-way hierarchical clustering heatmap based on the expression levels of DERs. The black and white bars represent less favorable and favorable prognostic groups, respectively. The color key (green to red) shows the z-score of normalized and log2-transformed expression values of DEGs. The z-score represents the number of median absolute deviations away from the median. DERs, differentially expressed RNAs; lncRNA, long non-coding RNA; FDR, false discovery rate; FC, fold change.
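
The z-score described in this caption is a median-based (robust) variant rather than the usual mean and standard deviation form. As a rough sketch of that definition, the function below divides each value's distance from the median by the median absolute deviation (MAD). The function name and the input values are hypothetical, and no scaling constant is applied to the MAD because the caption does not mention one.

```python
from statistics import median

def robust_z_scores(values):
    """Return (x - median) / MAD for each x, per the caption's definition."""
    center = median(values)
    # Median absolute deviation: median of the distances from the median.
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the robust z-score is undefined")
    return [(v - center) / mad for v in values]

# Hypothetical log2-transformed expression values for one gene.
z = robust_z_scores([1.0, 2.0, 3.0, 4.0, 10.0])
print(z)  # [-2.0, -1.0, 0.0, 1.0, 7.0]
```

Because both the center and the spread are medians, a single extreme sample (10.0 here) gets a large score without inflating the scale used for the other samples.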
Figure 2.
Screening curves of lambda parameters and distribution graphs of coefficients of the optimal combination of (A) lncRNAs and (B) mRNAs via the Cox-PH model based on the L1-penalized regularized regression algorithm. Horizontal and vertical axes in upper graphs indicate lambda and cvl values, respectively. prof stands for profL1 function, and $ indicates the absolute reference. Intersection of red dotted lines indicates the value of lambda when cvl is maximal. When maximal cvl values were −491.8333 and −490.4969, lambda values were 17.3155 for mRNA and 65.3960 for lncRNA, respectively. lncRNA, long non-coding RNA; cvl, cross-validation likelihood.
Figure 3.
Kaplan-Meier overall survival time and ROC curves of risk score models based on the status of (A) 10 lncRNAs and (B) 10 mRNAs in the training, validation, entire and EBI-validation sets. Green/blue and red/purple curves represent low and high risk groups, respectively. In the ROC curves, the black, red, green and blue lines indicate the training, validation, entire and EBI-validation sets, respectively. lncRNA, long non-coding RNA; EBI, European Bioinformatics Institute; ROC, receiver operating characteristic; AUC, area under the curve; HR, hazard ratio.
Figure 4.
Kaplan-Meier curves for overall survival time and ROC analysis of risk score models based on the expression levels of (A) 10 lncRNAs and (B) 10 mRNAs in the training, validation, entire and EBI-validation sets. Green/blue and red/purple curves represent the low and high risk groups, respectively. In the ROC curves, the black, red, green and blue lines indicate the training, validation, entire and EBI-validation sets, respectively. lncRNA, long non-coding RNA; EBI, European Bioinformatics Institute; ROC, receiver operating characteristic; AUC, area under the curve; HR, hazard ratio; Exprs, expression levels.
Figure 5.
Kaplan-Meier overall survival curves for training (left), validation (middle) and entire (right) sets by (A) age and (B) pathologic stage. (A) Black and red curves indicate <60 and ≥60 years, respectively. (B) Black, red, blue and purple curves represent stages I, II, III and IV, respectively. HR, hazard ratio.
Figure 6.
Nomogram of independent prognostic factors and mRNA expression risk scores, and calibration plots for predicting 3- and 5-year survival probabilities. (A) Nomogram of independent prognostic factors and mRNA expression risk scores. Points for each variable (age, pathological stage and mRNA expression risk score) were determined in the nomogram by drawing a vertical line from the values of each variable to the ‘points’ line. Summed points for all variables were plotted on the ‘Total Points’ line and a vertical line was drawn to read the corresponding 3- and 5-year survival probabilities. (B) Calibration plots for predicting 3- and 5-year survival probabilities. Horizontal and vertical axes indicate predicted and actual 3- and 5-year probabilities of overall survival time, respectively. Red and black lines indicate predicted 3- and 5-year probabilities of overall survival time, respectively. Round points on lines represent the average survival probability at corresponding time points with upper and lower bars indicating upper and lower standard deviations. Grey line represents ideal agreement between predicted and actual probabilities of overall survival time. exprs, expression; RS status, risk score status.
Figure 7.
Volcano plot and expression heatmap of DEGs between high and low risk in the entire set. (A) Volcano plot of log2FC vs. -log10FDR. Pink and black dots represent significant and non-significant DEGs, respectively. Two vertical dashed lines indicate log2FC 0.263; horizontal dashed line indicates FDR =0.05. (B) Expression heatmap of DEGs with high or low risk scores. Colored bar (green to red) on right margin indicates z-score of normalized and log2-transformed expression values of DEGs. The z-score represents the number of median absolute deviations away from the median. DEG, differentially expressed genes; FDR, false discovery rate; FC, fold change.
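
The dashed lines in the volcano plot correspond to a two-part significance rule. The sketch below shows one way such a rule could be applied, assuming the two vertical lines sit symmetrically at log2FC = ±0.263 and that a gene is significant when its FDR is below 0.05 and its absolute log2FC exceeds the cutoff. The function name, the symmetric cutoff and the example values are assumptions for illustration, not details confirmed by the caption.

```python
def classify_gene(log2_fc, fdr, fc_cutoff=0.263, fdr_cutoff=0.05):
    """Classify a gene as 'up', 'down' or 'not significant'.

    Assumes symmetric fold-change cutoffs at +/- fc_cutoff and a strict
    FDR threshold, which the caption does not spell out.
    """
    if fdr >= fdr_cutoff or abs(log2_fc) <= fc_cutoff:
        return "not significant"
    return "up" if log2_fc > 0 else "down"

# Hypothetical (log2FC, FDR) pairs.
print(classify_gene(0.8, 0.01))    # up
print(classify_gene(-0.5, 0.001))  # down
print(classify_gene(0.1, 0.001))   # not significant
print(classify_gene(1.2, 0.2))     # not significant
```

Passing `fc_cutoff=0.5` would apply the same rule with the threshold given in the Figure 1 caption.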
