2023 Feb 6:2023:8964676.
doi: 10.1155/2023/8964676. eCollection 2023.

Evaluating Histological Subtypes Classification of Primary Lung Cancers on Unenhanced Computed Tomography Based on Random Forest Model

Jianfeng Huang et al. J Healthc Eng.

Abstract

Lung cancer is the leading cause of cancer-related death in many countries, and an accurate histopathological diagnosis is of great importance for subsequent treatment. The aim of this study was to establish a random forest (RF) model based on radiomic features to automatically classify and predict lung adenocarcinoma (ADC), lung squamous cell carcinoma (SCC), and small cell lung cancer (SCLC) on unenhanced computed tomography (CT) images. This retrospective study included 852 patients (mean age: 61.4 years, range: 29-87; male/female: 536/316) with preoperative unenhanced CT and postoperative histopathologically confirmed primary lung cancers: 525 patients with ADC, 161 with SCC, and 166 with SCLC. Radiomic features were extracted, selected, and then used to establish the RF classification model, which classified primary lung cancers into three subtypes (ADC, SCC, and SCLC) according to the histopathological results. The training cohort (446 ADC, 137 SCC, and 141 SCLC) and the testing cohort (79 ADC, 24 SCC, and 25 SCLC) accounted for 85% and 15% of the whole dataset, respectively. The prediction performance of the RF classification model was evaluated by F1 scores and the receiver operating characteristic (ROC) curve. On the testing cohort, the areas under the ROC curve (AUC) of the RF model in classifying ADC, SCC, and SCLC were 0.74, 0.77, and 0.88, respectively. The F1 scores were 0.80, 0.40, and 0.73 for ADC, SCC, and SCLC, respectively, and the weighted average F1 score was 0.71. In addition, for ADC, SCC, and SCLC, respectively, the precisions were 0.72, 0.64, and 0.70; the recalls were 0.86, 0.29, and 0.76; and the specificities were 0.55, 0.96, and 0.92. Primary lung cancers were feasibly and effectively classified into ADC, SCC, and SCLC by combining the RF classification model with radiomic features, an approach that has the potential to predict the histological subtypes of primary lung cancers noninvasively.
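
As a reading aid, the weighted average F1 score can be reproduced from the per-class figures given in the abstract. The sketch below is not the authors' code. It assumes that "weighted average" means weighting each class by its number of testing-cohort cases, and the function and variable names are chosen here for illustration only. It also recomputes F1 as the harmonic mean of precision and recall; because the abstract reports rounded values, the recomputed per-class scores need not match the reported ones to the second decimal.

```python
# Figures copied from the abstract (testing cohort); names are illustrative.
test_support = {"ADC": 79, "SCC": 24, "SCLC": 25}
reported_f1 = {"ADC": 0.80, "SCC": 0.40, "SCLC": 0.73}
reported_precision = {"ADC": 0.72, "SCC": 0.64, "SCLC": 0.70}
reported_recall = {"ADC": 0.86, "SCC": 0.29, "SCLC": 0.76}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def weighted_average(scores, support):
    """Average per-class scores, weighting each class by its case count."""
    total = sum(support.values())
    return sum(scores[name] * support[name] for name in scores) / total


# Assumption: classes are weighted by testing-cohort size.
weighted_f1 = weighted_average(reported_f1, test_support)

# F1 recomputed from the rounded precision and recall in the abstract.
recomputed_f1 = {
    name: f1_from(reported_precision[name], reported_recall[name])
    for name in reported_f1
}

# Share of all 852 patients that fall in the testing cohort.
test_fraction = sum(test_support.values()) / 852

print(round(weighted_f1, 2))    # 0.71, matching the reported value
print(round(test_fraction, 2))  # 0.15, matching the reported 15% split
for name, value in recomputed_f1.items():
    print(name, round(value, 3))
```

Under that weighting assumption the result agrees with the reported 0.71, and the low SCC recall (0.29) is what pulls the SCC F1 score down to about 0.40 despite a specificity of 0.96.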

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Figure 1. Segmentation of lesions on CT images and 3D ROI for (a) adenocarcinoma, (b) squamous cell carcinoma, and (c) small cell lung cancer.
Figure 2. Flowchart for the extraction and selection of radiomic features.
Figure 3. Correlation between radiomic features. 0 represents no correlation.
Figure 4. The importance score of each feature for predicting the histopathological subtypes of primary lung cancer.
Figure 5. Flowchart of the random forest algorithm.
Figure 6. ROC curve analysis of the RF classification model. The black solid line represents SCLC (class 0), the blue solid line represents ADC (class 1), the green solid line represents SCC (class 2), and the dark blue dotted line represents the average.
