Machine learning models-based on integration of next-generation sequencing testing and tumor cell sizes improve subtype classification of mature B-cell neoplasms
- PMID: 37601650
- PMCID: PMC10436202
- DOI: 10.3389/fonc.2023.1160383
Abstract
Background: Next-generation sequencing (NGS) panels for mature B-cell neoplasms (MBNs) are widely applied clinically but are not yet routinely used in a manner suited to differential diagnosis of subtypes. This study retrospectively analyzed newly diagnosed MBN cases from our laboratory to characterize the mutation landscape in Chinese patients with MBNs and to combine mutational information with machine learning (ML) for clinical applications in MBNs, especially subtype classification.
Methods: Samples from the Catalogue Of Somatic Mutations In Cancer (COSMIC) database were collected for ML model construction, and cases from our laboratory were used for ML model validation. ML models were constructed with the Random Forest algorithm using five repeats of 10-fold cross-validation. Mutations were detected by NGS, and tumor cell size was confirmed by cell morphology and/or flow cytometry in our laboratory.
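The training scheme named in the Methods (Random Forest evaluated with five repeats of 10-fold cross-validation) can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy, and it is not the authors' pipeline: the data are randomly generated placeholders, the gene count, case count, hyperparameters, and the ordinal encoding of tumor cell size (0 = small, 1 = medium, 2 = large) are all hypothetical, and the accuracy it prints says nothing about the study's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedKFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic placeholder data (NOT from the study): 400 cases, 30 genes, 8 subtypes.
n_cases, n_genes, n_subtypes = 400, 30, 8

# Hypothetical subtype labels, 50 cases per subtype, in shuffled order.
y = rng.permutation(np.repeat(np.arange(n_subtypes), n_cases // n_subtypes))

# Binary mutation matrix: 1 = gene mutated, 0 = no mutation detected.
X_mut = rng.integers(0, 2, size=(n_cases, n_genes))

# Assumed encoding of tumor cell size as one ordinal feature column.
cell_size = rng.integers(0, 3, size=(n_cases, 1))

# Integrate mutation features and cell size into a single feature matrix.
X = np.hstack([X_mut, cell_size])

# Five repeats of 10-fold cross-validation give 50 train/test splits.
cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# One accuracy value per split.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"{len(scores)} splits, mean accuracy {scores.mean():.2f}")
```

Because the placeholder features are random, the mean accuracy should sit near chance level for 8 classes; the sketch only shows how the splits, the classifier, and the combined feature matrix fit together.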
Results: A total of 849 newly diagnosed MBN cases from our laboratory were retrospectively identified and included in mutational landscape analyses. Patterns of gene mutations were found across a variety of MBN subtypes, which is important for investigating tumorigenesis in MBNs. A long list of novel mutations was revealed, which is valuable for both functional studies and clinical applications. By combining gene mutation information revealed by NGS with ML, we established ML models that provide valuable information for MBN subtype classification. In total, 8895 cases of 8 subtypes of MBNs in the COSMIC database were collected and used for ML model construction, and the models were validated on the 849 MBN cases from our laboratory. A series of ML models was constructed in this study, and the most efficient model, with an accuracy of 0.87, was based on integration of NGS testing and tumor cell sizes.
Conclusions: The ML models were of great significance in the differential diagnosis of all cases and of different MBN subtypes. Additionally, using NGS results together with ML to assist in subtype classification of MBNs has positive clinical potential.
Keywords: machine learning (ML); mature B-cell neoplasms (MBNs); next-generation sequencing (NGS); pathological diagnosis; subtype classification.
Copyright © 2023 Mu, Chen, Meng, Chen, Fan, Yuan, Lin, Pan, Li, Feng, Diao, Li, Yu and Liu.
Conflict of interest statement
Authors YFM, YHM, TC, XF, JY, JL, GL, and SY are employed by the company Guangzhou KingMed Transformative Medicine Institute Co., Ltd., Guangzhou, China. Authors YC, JP, JF, KD, and SY are employed by the company Guangzhou KingMed Center for Clinical Laboratory Co., Ltd., Guangzhou, China. Authors YC, YM, YL, and SY are employed by the company Guangzhou KingMed Diagnostics Group Co., Ltd., Guangzhou, China. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.