Mol Cell Proteomics. 2015 Nov;14(11):2947-60.
doi: 10.1074/mcp.M115.050245. Epub 2015 Aug 26.

Machine Learning-based Classification of Diffuse Large B-cell Lymphoma Patients by Their Protein Expression Profiles

Sally J Deeb et al. Mol Cell Proteomics. 2015 Nov.

Abstract

Characterization of tumors at the molecular level has improved our knowledge of cancer causation and progression. Proteomic analysis of tumor signaling pathways promises to enhance our understanding of cancer aberrations at the functional level, but this requires accurate and robust tools. Here, we develop a state-of-the-art quantitative mass spectrometric pipeline to characterize formalin-fixed paraffin-embedded (FFPE) tissues of patients with closely related subtypes of diffuse large B-cell lymphoma (DLBCL). We combined a super-SILAC approach with label-free quantification (hybrid LFQ) to handle cases in which a protein is absent from the super-SILAC standard but present in the patient samples. Shotgun proteomic analysis on a quadrupole Orbitrap quantified almost 9,000 tumor proteins in 20 patients. The quantitative accuracy of our approach allowed the segregation of DLBCL patients according to their cell of origin (COO), using both their global protein expression patterns and the 55-protein signature obtained previously from patient-derived cell lines (Deeb, S. J., D'Souza, R. C., Cox, J., Schmidt-Supprian, M., and Mann, M. (2012) Mol. Cell. Proteomics 11, 77-89). Expression levels of individual segregation-driving proteins, as well as of categories such as extracellular matrix proteins, behaved consistently with known trends between the subtypes. We used machine learning (support vector machines) to extract the candidate proteins with the highest segregating power. A panel of four proteins (PALD1, MME, TNFAIP8, and TBC1D4) is predicted to classify patients with low error rates. Highly ranked proteins from the support vector analysis revealed differential expression of core signaling molecules between the subtypes, elucidating aspects of their pathobiology.
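The hybrid LFQ idea (use the super-SILAC ratio when a SILAC pair exists, otherwise fall back to a label-free value) can be illustrated with a minimal sketch. The abstract does not say how the two kinds of values are put on a common scale; the median-offset alignment below, and the function name `hybrid_log2_values`, are assumptions for illustration only and should not be read as the authors' pipeline.

```python
import math
from statistics import median


def hybrid_log2_values(silac_ratio, lfq_intensity):
    """Hypothetical hybrid quantification for a single patient sample.

    silac_ratio:   protein ID -> sample/standard SILAC ratio (missing if no SILAC pair)
    lfq_intensity: protein ID -> label-free intensity
    Returns protein ID -> log2 value on the SILAC log-ratio scale.
    """
    # ASSUMPTION: proteins quantified both ways anchor the LFQ scale to the SILAC scale.
    shared = [p for p in silac_ratio if p in lfq_intensity]
    if not shared:
        raise ValueError("need at least one protein with both SILAC and LFQ values")
    offset = median(math.log2(lfq_intensity[p]) - math.log2(silac_ratio[p]) for p in shared)

    # SILAC ratios are used directly whenever they exist.
    result = {p: math.log2(r) for p, r in silac_ratio.items()}
    for p, intensity in lfq_intensity.items():
        if p not in result:
            # No SILAC pair (e.g. protein absent from the standard): shifted LFQ value.
            result[p] = math.log2(intensity) - offset
    return result
```

For example, `hybrid_log2_values({"A": 2.0, "B": 0.5}, {"A": 2048.0, "B": 512.0, "C": 1024.0})` keeps the SILAC values for A and B and places C, which has no SILAC pair, on the same log2 scale.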

Figures

Fig. 1.
Proteomic workflow and coverage of 20 FFPE tissue samples from DLBCL patients. A, two slices of macrodissected patient FFPE tissues were processed according to the FASP-FFPE protocol. The super-SILAC approach was used for quantitative measurements on a quadrupole Orbitrap mass spectrometer (Q Exactive). Quantification was based on SILAC ratios, combined with label-free quantification in cases where no SILAC pairs were detected. The data were analyzed using MaxQuant software, resulting in the identification of more than 9,000 proteins. B, percent coverage of signaling pathways and cellular processes in the quantified proteomes of DLBCL patients. RP, reversed phase; ECM, extracellular matrix; TCA, tricarboxylic acid.
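Panel B reports pathway coverage as a percentage. Assuming coverage means the share of a pathway's annotated members that were quantified (the caption does not define it), it reduces to a set intersection per pathway. The function name `pathway_coverage` and the input layout are hypothetical.

```python
def pathway_coverage(pathways, quantified):
    """Percent of each pathway's annotated members found in the quantified proteome.

    pathways:   pathway name -> iterable of member protein/gene IDs
    quantified: iterable of IDs quantified in the patient proteomes
    """
    quantified = set(quantified)
    coverage = {}
    for name, members in pathways.items():
        members = set(members)
        if not members:
            continue  # skip empty annotations to avoid division by zero
        coverage[name] = 100.0 * len(members & quantified) / len(members)
    return coverage
```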
Fig. 2.
Quantified proteomes of FFPE tissues from DLBCL patients. A, Pearson's correlation coefficient (r) of two representative patient samples (TRR003 and TRR013). B, dynamic range of proteomes of DLBCL patients highlighting KEGG annotated proteins involved in pathways in cancer.
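The sample-to-sample comparison in panel A is a Pearson correlation. A minimal sketch, assuming that each sample is a mapping from protein ID to a log-transformed abundance and that only proteins quantified in both samples enter the calculation (both are assumptions; the caption does not state how missing values were treated):

```python
import math


def pearson_r(sample_a, sample_b):
    """Pearson's r between two samples over the proteins quantified in both.

    sample_a, sample_b: protein ID -> log-transformed abundance
    """
    shared = sorted(set(sample_a) & set(sample_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared proteins")
    xs = [sample_a[p] for p in shared]
    ys = [sample_b[p] for p in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (the common 1/(n-1) factor cancels in r).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("a sample has zero variance over the shared proteins")
    return sxy / math.sqrt(sxx * syy)
```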
Fig. 3.
DLBCL patient tissue proteomes versus DLBCL cell line proteomes. A, overlap in protein groups between patient tissue proteomes and cell line proteomes. B, the distribution of proteins exclusively quantified in the patient samples (red) in comparison with the total distribution (blue). C, principal component analysis of patient tissue samples using the 55-protein segregating signature derived from cell lines.
Fig. 4.
Principal component analysis of patient samples using their global protein expression profiles. A, the global proteomes of 20 DLBCL patient samples segregated diagonally into ABC-DLBCL (13 samples) and GCB-DLBCL (seven samples) subtypes based on component 1, which accounts for 11.9% of the variability, versus component 4, which accounts for 7.4% of the variability. B, loadings of A; those highlighted in red are the main proteins driving the diagonal cell-of-origin (COO) segregation. C, cancer module 47, which is composed of extracellular proteins and collagens, is highly enriched in component 1.
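The quantities in this figure (component scores, loadings, and the share of variability per component) can be sketched with a small pure-Python PCA based on power iteration with deflation. This is a generic textbook PCA for illustration, not the software used in the study; it assumes a complete samples-by-proteins matrix with no missing values, and the function name `pca` is hypothetical.

```python
import math
import random


def pca(matrix, n_components=2, n_iter=500, seed=0):
    """Minimal PCA via power iteration with deflation.

    matrix: list of rows (samples), each a list of values (proteins); no missing values.
    Returns (scores, loadings, explained_variance_ratios).
    """
    n, p = len(matrix), len(matrix[0])
    # Center each protein (column) on its mean across samples.
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    x = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Sample covariance matrix between proteins (p x p).
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1) for b in range(p)]
           for a in range(p)]
    total_variance = sum(cov[j][j] for j in range(p))
    rng = random.Random(seed)
    loadings, ratios = [], []
    for _ in range(n_components):
        # Start from a random positive unit vector.
        v = [rng.random() + 0.1 for _ in range(p)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break  # no variance left along v
            v = [c / norm for c in w]
        # Rayleigh quotient: variance captured along the unit vector v.
        eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
        loadings.append(v)
        ratios.append(eigenvalue / total_variance)
        # Deflate: remove this component's variance before finding the next one.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(p)] for a in range(p)]
    # Project the centered samples onto each loading vector.
    scores = [[sum(row[j] * comp[j] for j in range(p)) for comp in loadings] for row in x]
    return scores, loadings, ratios
```

Plotting component 1 against component 4, as in panel A, would need `n_components=4`. For a proteome of roughly 9,000 proteins the explicit p x p covariance matrix is impractical in pure Python; an SVD from a numerical library would be the usual choice there.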
Fig. 5.
ABC-DLBCL versus GCB-DLBCL. A, Pearson correlation of ABC-DLBCL (ABC) versus GCB-DLBCL (GC) after taking median expression values of protein groups across patients in each subtype. B, two-dimensional annotation enrichment of ABC-DLBCL against GCB-DLBCL using cancer modules.
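Panel A first collapses each subtype to one profile by taking, for every protein group, the median across the patients of that subtype. A minimal sketch, assuming per-patient dictionaries of log-transformed values and assuming that patients lacking a value for a protein are simply skipped for that protein; `subtype_medians` is a hypothetical name. The two resulting profiles can then be compared with a Pearson correlation over their shared proteins.

```python
from statistics import median


def subtype_medians(profiles, subtypes):
    """Median expression of each protein across the patients of each subtype.

    profiles: patient ID -> {protein ID: log-transformed value}
    subtypes: patient ID -> subtype label (e.g. "ABC" or "GCB")
    Returns subtype label -> {protein ID: median value}.
    """
    collected = {}
    for patient, profile in profiles.items():
        label = subtypes[patient]
        for protein, value in profile.items():
            # Missing values never enter the lists, so they are skipped.
            collected.setdefault(label, {}).setdefault(protein, []).append(value)
    return {
        label: {protein: median(values) for protein, values in proteins.items()}
        for label, proteins in collected.items()
    }
```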
Fig. 6.
Support vector machine analysis for optimal feature selection. A, support vector machine feature selection using p values of standard analysis of variance tests resulted in a set of four features with 1.4% error. B, unsupervised hierarchical clustering of top 343 protein candidates or features determined by support vector machine analysis. C, unsupervised hierarchical clustering of extracellular matrix, plasma membrane, and nuclear proteins in the 343 top protein candidates.
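Panel A combines two steps: features are ranked by ANOVA test results, and the error of a support vector machine is tracked as the feature set grows. The sketch below is a generic, pure-Python illustration under several stated assumptions, not the authors' implementation. It uses a Pegasos-style linear SVM trained by subgradient descent, leave-one-out error, and ranking by the F statistic (equivalent to ranking by p value only when every protein has the same degrees of freedom, i.e. no missing values). Features are re-ranked inside every fold so the held-out sample cannot influence selection. All function names are hypothetical, and the sketch does not reproduce the 1.4% error reported above.

```python
import random


def f_statistic(values, labels):
    """One-way ANOVA F statistic for one protein across sample groups."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    ss_between = sum(len(vs) * (means[g] - grand_mean) ** 2 for g, vs in groups.items())
    ss_within = sum((v - means[g]) ** 2 for g, vs in groups.items() for v in vs)
    if ss_within == 0.0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(x, y):
    """Feature indices from most to least discriminating (largest F first)."""
    f = [f_statistic([row[j] for row in x], y) for j in range(len(x[0]))]
    return sorted(range(len(f)), key=lambda j: f[j], reverse=True)


def train_linear_svm(x, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style linear SVM; labels are -1/+1, a constant 1.0 feature acts as bias."""
    rng = random.Random(seed)
    data = [row + [1.0] for row in x]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, row):
    return 1 if sum(wj * xj for wj, xj in zip(w, row + [1.0])) >= 0.0 else -1


def loo_error(x, y, k):
    """Leave-one-out error of the linear SVM on the top-k ranked features."""
    errors = 0
    for i in range(len(x)):
        x_train = [row for j, row in enumerate(x) if j != i]
        y_train = [label for j, label in enumerate(y) if j != i]
        top = rank_features(x_train, y_train)[:k]  # re-ranked without sample i
        w = train_linear_svm([[row[f] for f in top] for row in x_train], y_train)
        if predict(w, [x[i][f] for f in top]) != y[i]:
            errors += 1
    return errors / len(x)
```

Evaluating `loo_error` for k = 1, 2, 3, ... and choosing the smallest k with low error mirrors, in spirit, how a compact panel such as the four-protein set could be selected.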
