PLoS One. 2010 Sep 13;5(9):e12726.
doi: 10.1371/journal.pone.0012726.

A unified 35-gene signature for both subtype classification and survival prediction in diffuse large B-cell lymphomas



Yu-Dong Cai et al. PLoS One. 2010.

Abstract

Cancer subtype classification and survival prediction both bear directly on a patient's treatment plan, which makes them fundamental medical problems. Although the two are interrelated learning problems, most studies tackle them separately. In this paper, gene expression levels are used for both cancer subtype classification and survival prediction. We considered 350 diffuse large B-cell lymphoma (DLBCL) subjects drawn from four patient groups: activated B-cell-like (ABC) subtype dead, ABC subtype alive, germinal center B-cell-like (GCB) subtype dead, and GCB subtype alive. The expression levels of 11,271 genes in each subject were used as classification features. The features were first ranked by the mRMR (Maximum Relevance Minimum Redundancy) principle and then selected by the IFS (Incremental Feature Selection) procedure. The IFS procedure yielded a 35-gene signature, which was used to assign the patients to the four groups above. These four groups were then combined in two ways: activated versus germinal center for subtype prediction, and alive versus dead for survival prediction. The subtype prediction accuracy of the 35-gene signature was 98.6%. We estimated the cumulative survival of the predicted high-risk and low-risk groups by the Kaplan-Meier method; the log-rank test p-value was 5.98e-08. Our methodology provides a way to study subtype classification and survival prediction simultaneously. Our results suggest that for some diseases, especially cancer, subtype classification may be used to predict survival and, conversely, survival prediction features may shed light on subtype features.
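The abstract names the mRMR ranking step but does not spell it out. Below is a minimal sketch of the common "difference" form of mRMR: at each step, pick the feature with the highest relevance to the class minus its mean redundancy with the features already picked, where both terms are mutual information. It assumes the expression values have already been discretized into a few levels and that `c` is a class label such as the subtype or the four-group label. The function names, the discretization assumption, and the toy data are illustrative only and are not the authors' implementation.

```python
import numpy as np


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        px = np.mean(x == xv)
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * np.mean(y == yv)))
    return float(mi)


def mrmr_rank(X, c):
    """Rank every column of a discretized matrix X with the mRMR difference criterion.

    At each step, pick the unselected feature j maximizing
    I(f_j; c) - mean over selected s of I(f_j; f_s).
    """
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], c) for j in range(n_features)]
    ranking = [int(np.argmax(relevance))]  # start with the most relevant feature
    remaining = [j for j in range(n_features) if j != ranking[0]]
    while remaining:
        scores = [
            relevance[j] - np.mean([mutual_information(X[:, j], X[:, s]) for s in ranking])
            for j in remaining
        ]
        best = remaining[int(np.argmax(scores))]
        ranking.append(best)
        remaining.remove(best)
    return ranking


# Toy data: f0 and f2 are equally relevant to c but independent of each other;
# column 1 is an exact copy of f0, so it adds no new information.
c = np.array([0, 0, 0, 0, 1, 1, 1, 1])
f0 = np.array([0, 0, 0, 1, 0, 1, 1, 1])
f2 = np.array([1, 0, 0, 0, 1, 1, 1, 0])
X = np.column_stack([f0, f0.copy(), f2])
print(mrmr_rank(X, c))  # [0, 2, 1]
```

The redundant copy drops to last place even though it is exactly as relevant as the feature it duplicates, which is the behaviour the "minimum redundancy" half of the criterion is meant to produce.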


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. The IFS curves for the subtype classification model and the survival prediction model.
(A) The IFS curve for the subtype classification model. The overall accuracy peaked at 1 with 214 features. However, the overall accuracy had already reached 0.98 with about 30 features, and it fluctuated only slightly as more features were added. (B) The IFS curve for the survival prediction model. The smallest log-rank test p-value, 10^-8.67 (about 2.1e-09), was obtained with 182 features, but the p-values were already low with around 30 to 50 features. With the optimized 35 features, the subtype prediction accuracy was 98.6% and the log-rank test p-value was 10^-7.22 (about 6.0e-08).
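An IFS curve like those in Figure 1 comes from scoring the top-k ranked features for k = 1, 2, 3, ... and plotting the score against k. The abstract does not name the classifier or validation scheme used inside IFS, so this sketch assumes a 1-nearest-neighbour classifier with leave-one-out cross-validation as a stand-in. The function names and the toy data are hypothetical. For the survival curve in panel (B), `score_fn` could instead return -log10 of a log-rank p-value (also an assumption about how one would adapt it).

```python
import numpy as np


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between all samples.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbour
    nearest = d.argmin(axis=1)
    return float(np.mean(y[nearest] == y))


def incremental_feature_selection(X, y, ranking, score_fn=loo_1nn_accuracy, max_features=None):
    """Score the top-k ranked features for k = 1..max_features.

    Returns (best_k, curve), where curve[k - 1] is the score of the top-k subset
    and best_k is the subset size with the highest score (first one on ties).
    """
    max_features = max_features or len(ranking)
    curve = [score_fn(X[:, ranking[:k]], y) for k in range(1, max_features + 1)]
    best_k = int(np.argmax(curve)) + 1
    return best_k, curve


# Toy data: column 0 separates the classes, column 1 is noise on a larger scale.
y = np.array([0, 0, 0, 1, 1, 1])
X = np.array([[0.0, 5], [0.1, -5], [0.2, 5], [1.0, -5], [1.1, 5], [1.2, -5]])
best_k, curve = incremental_feature_selection(X, y, ranking=[0, 1])
print(best_k, curve)  # 1 [1.0, 0.6666666666666666]
```

Adding the noisy second feature lowers the accuracy, so the curve peaks at k = 1; on real data the curve is usually flatter, which is why the caption notes that accuracy was already near its peak at about 30 features.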
Figure 2. The relationship between subtype classification accuracy and log-rank test p-value.
The x-axis is the subtype classification accuracy and the y-axis is −log10 of the log-rank test p-value. Only feature sets with fewer than 100 features are shown, and the number of features is written on each dot. The optimized feature set for both models contained 35 features, which gave both a high subtype classification accuracy and a small log-rank p-value.
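The caption does not say exactly how 35 was picked from this plot. One way to read such a plot, offered here only as a hypothetical illustration and not as the authors' rule, is to keep the feature counts that are Pareto-optimal: no other count has both a higher (or equal) accuracy and a higher (or equal) −log10 p-value. The function name and all numbers below are made up for the example.

```python
import math


def pareto_feature_counts(results, max_features=100):
    """Return the feature counts whose (accuracy, -log10 p) pair no other count beats on both axes.

    results maps a feature count k to (subtype accuracy, log-rank p-value).
    Only counts below max_features are considered, as in the plot.
    """
    pts = {k: (acc, -math.log10(p)) for k, (acc, p) in results.items() if k < max_features}
    front = []
    for k, (acc, s) in pts.items():
        dominated = any(
            a2 >= acc and s2 >= s and (a2 > acc or s2 > s)
            for k2, (a2, s2) in pts.items()
            if k2 != k
        )
        if not dominated:
            front.append(k)
    return sorted(front)


# Made-up numbers for illustration only (not the paper's data).
toy = {5: (0.80, 1e-3), 20: (0.95, 1e-5), 40: (0.93, 1e-6), 50: (0.90, 1e-5), 150: (0.99, 1e-9)}
print(pareto_feature_counts(toy))  # [20, 40]
```

The 150-feature entry is excluded by the fewer-than-100 restriction, and a final choice among the remaining Pareto-optimal counts would still need a tie-breaking preference.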
Figure 3. The hierarchical clustering heatmap of patient samples based on expression profiles of the 35-gene signature.
Each row represents a signature gene and each column represents a patient sample. The survival and subtype status of each patient are shown with two bars: in the survival bar, black denotes dead and grey denotes alive; in the subtype bar, red denotes ABC and blue denotes GCB. The 35-gene signature clearly separated the ABC subtype patients from the GCB subtype patients. Dead and alive patients were also located in different clusters.
Figure 4. The Kaplan–Meier curve of predicted high-risk and low-risk patients using the 35-gene signature.
The log-rank test p-value comparing the overall survival of predicted high-risk and low-risk patients is 5.98e-08.
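For readers who want to see what this comparison computes, here is a minimal NumPy sketch of the Kaplan-Meier estimator and the standard two-group log-rank test (a chi-square statistic with one degree of freedom). The function names and the toy cohort are hypothetical, and how the paper assigns patients to the high-risk and low-risk groups is not described here, so the group labels are simply taken as given.

```python
import math

import numpy as np


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: returns (distinct event times, survival just after each)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    survival, s = [], 1.0
    for t in event_times:
        n_at_risk = np.sum(times >= t)
        n_deaths = np.sum((times == t) & events)
        s *= 1.0 - n_deaths / n_at_risk
        survival.append(s)
    return event_times, np.array(survival)


def logrank_test(times, events, group):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    group = np.asarray(group, dtype=bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(times[events]):
        at_risk = times >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        d = ((times == t) & events).sum()
        d1 = ((times == t) & events & group).sum()
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = float(o_minus_e**2 / var)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy cohort: "high-risk" patients (group=True) all die before any "low-risk" patient.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10
high_risk = [True] * 5 + [False] * 5
print(logrank_test(times, events, high_risk))
```

The sketch assumes at least one event with more than one patient at risk (otherwise the variance is zero); a production analysis would normally use a maintained survival-analysis package instead.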
Figure 5. The overlap of our 35-gene signature with reported subtype genes and survival genes.
Of the 35 genes in our signature, 33 have been reported as either subtype genes or survival genes.
