Subclassification of lung adenocarcinoma through comprehensive multi-omics data to benefit survival outcomes
- PMID: 39018587
- DOI: 10.1016/j.compbiolchem.2024.108150
Abstract
Objectives: Lung adenocarcinoma (LUAD) is the most common subtype of non-small cell lung cancer. Understanding the molecular mechanisms underlying tumor progression is of great clinical significance. This study aims to identify novel molecular markers associated with LUAD subtypes, with the goal of improving the precision of LUAD subtype classification. The classification is further optimized from the perspective of patient survival analysis.
Materials and methods: We propose a comprehensive and robust feature-selection approach for LUAD classification. The proposed method integrates multi-omics data from The Cancer Genome Atlas (TCGA) and combines three algorithms: max-relevance and min-redundancy (mRMR), least absolute shrinkage and selection operator (LASSO), and Boruta. The selected features were used in six machine-learning classifiers: logistic regression, random forest, support vector machine, naive Bayes, k-nearest neighbor, and XGBoost.
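As an illustration of the max-relevance and min-redundancy idea named above, the following is a minimal sketch, not the authors' implementation. It assumes a simplified correlation-based variant (relevance is the absolute Pearson correlation with the label, redundancy is the mean absolute correlation with already selected features) and runs on synthetic data. The function names `pearson` and `mrmr_select` and the feature names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def mrmr_select(columns, labels, k):
    """Greedily pick k feature names, scoring each candidate as relevance minus redundancy.

    columns: dict mapping a feature name to its list of values (one per sample).
    labels:  list of numeric class labels (one per sample).
    """
    relevance = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    selected = []
    remaining = set(columns)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(columns[name], columns[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data (an assumption for illustration only, not TCGA data).
random.seed(0)
labels = [i % 2 for i in range(400)]
strong = [y + random.gauss(0, 0.3) for y in labels]        # informative feature
strong_copy = [v + random.gauss(0, 0.05) for v in strong]  # near-duplicate of it
weak = [y + random.gauss(0, 1.0) for y in labels]          # weaker, independent noise
noise = [random.gauss(0, 1.0) for _ in labels]             # unrelated to the label

columns = {
    "strong": strong,
    "strong_copy": strong_copy,
    "weak": weak,
    "noise": noise,
}
selected = mrmr_select(columns, labels, k=2)
print(selected)
```

In this sketch the redundancy penalty keeps the near-duplicate feature from being chosen alongside its original, which is the behavior that motivates using a redundancy-aware filter before the classifiers.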
Results: The proposed approach achieved an area under the receiver operating characteristic curve (AUC) of 0.9958 for the logistic regression (LR) classifier. Notably, the accuracy and AUC of a composite model incorporating copy number, methylation, and RNA-sequencing data for expression of exons, genes, and miRNA mature strands surpassed those of models using single-omics data or other multi-omics combinations. Survival analyses revealed that the support vector machine (SVM) classifier yielded the best classification, outperforming the classification provided by TCGA. To enhance model interpretability, SHapley Additive exPlanations (SHAP) values were used to elucidate the impact of each feature on the predictions. Gene Ontology (GO) enrichment analysis identified significant biological processes, molecular functions, and cellular components associated with LUAD subtypes.
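For readers unfamiliar with the AUC metric reported above, the following is a minimal sketch of how it can be computed for a binary task: the AUC equals the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions for illustration and are unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """Binary ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(toy_labels, toy_scores)
print(auc)  # 0.75
```

A value of 1.0 means every positive sample is scored above every negative one, and 0.5 corresponds to random ranking.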
Conclusion: In summary, our feature selection process, based on TCGA multi-omics data and combined with multiple machine learning classifiers, proficiently identifies molecular subtypes of lung adenocarcinoma and their corresponding significant genes. Our method could enhance the early detection and diagnosis of LUAD, expedite the development of targeted therapies and, ultimately, lengthen patient survival.
Keywords: Feature selection; Lung adenocarcinoma; Multi-omics data; Subtype classification; Survival analysis.
Copyright © 2024 Elsevier Ltd. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest: We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled “Subclassification of Lung Adenocarcinoma through Comprehensive Multi-omics Data to Benefit Survival Outcomes”.