Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2020 Oct 21;20(1):1017.
doi: 10.1186/s12885-020-07492-y.

Improved personalized survival prediction of patients with diffuse large B-cell Lymphoma using gene expression profiling

Affiliations

Improved personalized survival prediction of patients with diffuse large B-cell Lymphoma using gene expression profiling

Adrián Mosquera Orgueira et al. BMC Cancer. .

Abstract

Background: Thirty to forty percent of patients with Diffuse Large B-cell Lymphoma (DLBCL) have an adverse clinical evolution. The increased understanding of DLBCL biology has shed light on the clinical evolution of this pathology, leading to the discovery of prognostic factors based on gene expression data, genomic rearrangements and mutational subgroups. Nevertheless, additional efforts are needed in order to enable survival predictions at the patient level. In this study we investigated new machine learning-based models of survival using transcriptomic and clinical data.

Methods: Gene expression profiling (GEP) of in 2 different publicly available retrospective DLBCL cohorts were analyzed. Cox regression and unsupervised clustering were performed in order to identify probes associated with overall survival on the largest cohort. Random forests were created to model survival using combinations of GEP data, COO classification and clinical information. Cross-validation was used to compare model results in the training set, and Harrel's concordance index (c-index) was used to assess model's predictability. Results were validated in an independent test set.

Results: Two hundred thirty-three and sixty-four patients were included in the training and test set, respectively. Initially we derived and validated a 4-gene expression clusterization that was independently associated with lower survival in 20% of patients. This pattern included the following genes: TNFRSF9, BIRC3, BCL2L1 and G3BP2. Thereafter, we applied machine-learning models to predict survival. A set of 102 genes was highly predictive of disease outcome, outperforming available clinical information and COO classification. The final best model integrated clinical information, COO classification, 4-gene-based clusterization and the expression levels of 50 individual genes (training set c-index, 0.8404, test set c-index, 0.7942).

Conclusion: Our results indicate that DLBCL survival models based on the application of machine learning algorithms to gene expression and clinical data can largely outperform other important prognostic variables such as disease stage and COO. Head-to-head comparisons with other risk stratification models are needed to compare its usefulness.

Keywords: DLBCL; Lymphoma; Prediction; Survival; Transcriptomics.

PubMed Disclaimer

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1
Kaplan-Meier plots of both 4-gene expression based clusters in the training (a) and test (b) cohorts. The blue line represents patients in the high-risk cluster (cluster 1), and the red line represents the remaining group of patients (cluster 2). Survival probability is represented in the y axis. Time scale (in years) is represented in the x axis
Fig. 2
Fig. 2
Scatterplot matrix representing the distribution of patients according to the expression of TNFRSF9, BIRC3, BCL2L1 and G3BP2. Separate plots are provided for the training (a) and test (b) cohorts. Red dots represent patients in the high-risk cluster (cluster 1), whereas black dots represent the remaining patients (cluster 2)
Fig. 3
Fig. 3
Predicted individual survival curves according to the most accurate random forest model (see text). a) Out-of-bag survival curves predicted for patients within the training cohort (discontinuous black lines). The thick red line represents overall ensemble survival and the thick green line indicates the Nelson-Aalen estimator. b) Individual survival curves predicted for patients within the test cohort (discontinuous black lines). The thick red line represents overall ensemble survival. Time scale is in years

Similar articles

Cited by

References

    1. Teras LR, DeSantis CE, Cerhan JR, Morton LM, Jemal A, Flowers CR. 2016 US lymphoid malignancy statistics by World Health Organization subtypes. CA Cancer J Clin. 2016;66(6):443–459. doi: 10.3322/caac.21357. - DOI - PubMed
    1. Sehn LH, Donaldson J, Chhanabhai M, Fitzgerald C, Gill K, Klasa R, et al. Introduction of combined CHOP plus rituximab therapy dramatically improved outcome of diffuse large B-cell lymphoma in British Columbia. J Clin Oncol. 2005;23(22):5027–5033. doi: 10.1200/JCO.2005.09.137. - DOI - PubMed
    1. Sarkozy C, Sehn LH. Management of relapsed/refractory DLBCL. Best Pract Res Clin Haematol. 2018;31(3):209–216. doi: 10.1016/j.beha.2018.07.014. - DOI - PubMed
    1. Scott DW, King RL, Staiger AM, Ben-Neriah S, Jiang A, Horn H, et al. High grade B-cell lymphoma with MYC and BCL2 and/or BCL6 rearrangements with diffuse large B-cell lymphoma morphology. Blood. 2018;131(18):2060–2064. doi: 10.1182/blood-2017-12-820605. - DOI - PMC - PubMed
    1. Swerdlow SH, Campo E, Pileri SA, Harris NL, Stein H, Siebert R, et al. The 2016 revision of the World Health Organization classification of lymphoid neoplasms. Blood. 2016;127(20):2375–2390. doi: 10.1182/blood-2016-01-643569. - DOI - PMC - PubMed

MeSH terms

Substances