Sci Rep. 2020 Apr 3;10(1):5880.
doi: 10.1038/s41598-020-62803-4.

Exploratory study on classification of lung cancer subtypes through a combined K-nearest neighbor classifier in breathomics

Chunyan Wang et al. Sci Rep.

Abstract

Accurate classification of adenocarcinoma (AC) and squamous cell carcinoma (SCC) in lung cancer is critical to physicians' clinical decision-making. Exhaled breath analysis is a highly promising approach for the non-invasive diagnosis of lung cancer, but it has rarely been reported for the classification of lung cancer subtypes. In this paper, we propose for the first time a combined method that integrates a K-nearest neighbor (KNN) classifier, the borderline2-synthetic minority over-sampling technique (borderline2-SMOTE), and feature reduction methods to investigate the ability of exhaled breath to distinguish AC from SCC patients. The classification performance of the proposed method was compared with that of four other classification algorithms under different combinations of borderline2-SMOTE and feature reduction methods. The results indicated that the KNN classifier combined with borderline2-SMOTE and feature reduction was the most promising method for discriminating AC from SCC patients, obtaining the highest mean area under the receiver operating characteristic curve (0.63) and the highest mean geometric mean (58.50) among the classifiers compared. The results show that the combined algorithm can improve the classification of lung cancer subtypes in breathomics and suggest that combining non-invasive exhaled breath analysis with multivariate analysis is a promising screening method for informing treatment options and facilitating individualized treatment of patients with different lung cancer subtypes.
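
To make the central classifier concrete, the following is a minimal sketch of K-nearest neighbor classification by majority vote. It is an illustration only and not the authors' implementation: the function name `knn_predict`, the Euclidean distance, the value of `k`, and the two-feature toy data are all assumptions introduced here, and the labels are used purely as placeholders.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample (assumed metric).
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest samples, ordered by distance only.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature toy data, not values from the study.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
train_y = ["AC", "AC", "AC", "SCC", "SCC"]

pred_near_ac = knn_predict(train_X, train_y, (0.15, 0.15), k=3)
pred_near_scc = knn_predict(train_X, train_y, (0.95, 1.0), k=3)
print(pred_near_ac, pred_near_scc)
```

In a class-imbalanced setting such as this one, the majority vote tends to favour the larger class, which is one plausible reason for pairing KNN with an over-sampling step.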

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Study design flow. The input data are first processed using one of four approaches: approach 1, no processing; approach 2, borderline resampling technique only; approach 3, dimensionality reduction only; approach 4, dimensionality reduction and borderline resampling technique. Five classifiers are then applied to establish a classification model in the training phase; finally, the classification performance is evaluated on the testing set.
Figure 2
Hotelling's T² range plot for outlier detection, with the sample number on the horizontal axis and the T² range on the vertical axis. The green and red dotted lines represent the 95% and 99% confidence intervals, respectively.
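
As a rough illustration of the statistic plotted in this figure, the sketch below computes Hotelling's T² per sample from PCA scores, using the common form in which each squared score is divided by the variance of its component. It assumes the scores are mean-centered and uncorrelated; the function name `hotelling_t2` and the toy scores are hypothetical, and the 95% and 99% limits from the figure are not computed here.

```python
from statistics import variance

def hotelling_t2(scores):
    """Return Hotelling's T^2 per sample from mean-centered, uncorrelated PCA scores."""
    n_components = len(scores[0])
    # Sample variance of each score column (one column per principal component).
    variances = [variance([row[a] for row in scores]) for a in range(n_components)]
    return [
        sum(row[a] ** 2 / variances[a] for a in range(n_components))
        for row in scores
    ]

# Hypothetical scores for five samples on two components (not data from the study).
# Both columns have zero mean and are orthogonal to each other.
scores = [(-1.0, 1.0), (-0.5, -0.5), (0.0, -1.0), (0.5, -0.5), (1.0, 1.0)]

t2 = hotelling_t2(scores)
for sample_number, value in enumerate(t2, start=1):
    print(sample_number, round(value, 3))
```

Samples whose T² value exceeds a chosen critical limit would be candidates for inspection as outliers; how that limit is derived depends on the software and is left out of this sketch.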
Figure 3
3D scatter plots (a) before and (b) after applying borderline2-SMOTE, shown in PCA space. The three axes represent the first three principal components. Abbreviations: PC, principal component; AC (red circles), adenocarcinoma; SCC (green circles), squamous cell carcinoma.
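
The over-sampling step visualized here can be sketched in simplified form. The code below follows the general borderline-SMOTE idea: find minority samples for which at least half, but not all, of the k nearest neighbours belong to the majority class, then create synthetic minority points by interpolating toward a neighbour, with a shorter step (at most 0.5) when the neighbour is a majority sample, as in the borderline-2 variant. This is a simplified sketch under those assumptions, not the procedure used in the study; all function names, the value of `k`, and the toy data are hypothetical.

```python
import math
import random

def nearest_neighbors(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean), excluding `point` itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(c[0], point[0]))[:k]

def danger_samples(samples, minority_label, k):
    """Minority samples with at least half, but not all, of k neighbours in the majority class."""
    danger = []
    for s in samples:
        if s[1] != minority_label:
            continue
        neighbours = nearest_neighbors(s, samples, k)
        n_majority = sum(1 for n in neighbours if n[1] != minority_label)
        if k / 2 <= n_majority < k:
            danger.append(s)
    return danger

def synthesize(sample, neighbour, minority_label, rng):
    """Create one synthetic minority point between `sample` and `neighbour`."""
    # Borderline-2 style rule (assumed): step at most halfway toward a majority
    # neighbour so the new point stays closer to the minority sample.
    max_gap = 1.0 if neighbour[1] == minority_label else 0.5
    gap = rng.uniform(0.0, max_gap)
    features = tuple(a + gap * (b - a) for a, b in zip(sample[0], neighbour[0]))
    return (features, minority_label)

# Hypothetical toy data as (features, label) pairs; not data from the study.
samples = [
    ((0.0, 0.0), "minority"), ((0.2, 0.0), "minority"), ((1.0, 0.0), "minority"),
    ((1.2, 0.0), "majority"), ((1.4, 0.0), "majority"),
    ((2.0, 0.0), "majority"), ((2.2, 0.0), "majority"),
]

rng = random.Random(0)
borderline = danger_samples(samples, "minority", k=3)
synthetic = [
    synthesize(s, n, "minority", rng)
    for s in borderline
    for n in nearest_neighbors(s, samples, 3)
]
print(len(borderline), len(synthetic))
```

Only the minority sample nearest the class boundary is selected in this toy example, so the new points concentrate near the boundary rather than deep inside the minority region.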
Figure 4
Classification results for the five classifiers, expressed as AUC and G-mean. Panels (a) and (b) show the AUC and G-mean results, respectively. Error bars are shown for approach 2 to reflect the averaged results after borderline2-SMOTE.
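
The two metrics reported in this figure can be sketched as follows, assuming the standard definitions: G-mean as the geometric mean of sensitivity and specificity, and AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function names and toy labels and scores are hypothetical, and the choice of positive class is arbitrary here. The G-mean below is returned on a 0 to 1 scale, whereas the value quoted in the abstract (58.50) appears to be expressed as a percentage.

```python
import math

def g_mean(y_true, y_pred, positive):
    """Geometric mean of sensitivity and specificity for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

def auc(y_true, scores, positive):
    """AUC via pairwise comparison of positive and negative scores (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels, predictions, and scores; not results from the study.
y_true = ["SCC", "SCC", "AC", "AC", "AC", "AC"]
y_pred = ["SCC", "AC", "AC", "AC", "AC", "SCC"]
scores = [0.9, 0.4, 0.3, 0.2, 0.1, 0.6]  # assumed score for the "SCC" class

g = g_mean(y_true, y_pred, positive="SCC")
a = auc(y_true, scores, positive="SCC")
print(round(g, 4), round(a, 4))
```

Because G-mean multiplies the per-class rates, a classifier that ignores the smaller class scores zero, which makes it a stricter summary than overall accuracy on imbalanced data.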
Figure 5
Heat maps of predictive performance in approach 3. Results are shown for five classifiers across two feature dimensionality reduction methods (in rows) and selected ranges (in columns) in adenocarcinoma and squamous cell carcinoma patients. Panels (a) and (b) show the AUC and G-mean values of the five classifiers without borderline2-SMOTE, respectively.
Figure 6
Heat maps of predictive performance in approach 4. Results are shown for five classifiers across two feature dimensionality reduction methods (in rows) and selected ranges (in columns) in adenocarcinoma and squamous cell carcinoma patients. Panels (a) and (b) show the AUC and G-mean values of the five classifiers with borderline2-SMOTE, respectively.

