Comput Biol Chem. 2024 Oct;112:108150.
doi: 10.1016/j.compbiolchem.2024.108150. Epub 2024 Jul 14.

Subclassification of lung adenocarcinoma through comprehensive multi-omics data to benefit survival outcomes

Jiayi Wei et al. Comput Biol Chem. 2024 Oct.

Abstract

Objectives: Lung adenocarcinoma (LUAD) is the most common subtype of non-small cell lung cancer. Understanding the molecular mechanisms underlying tumor progression is of great clinical significance. This study aims to identify novel molecular markers associated with LUAD subtypes, with the goal of improving the precision of LUAD subtype classification. The classification is additionally optimized and assessed from the perspective of patient survival analysis.

Materials and methods: We propose a comprehensive and robust feature-selection approach for LUAD classification. The proposed method integrates multi-omics data from The Cancer Genome Atlas (TCGA) and combines the max-relevance and min-redundancy (mRMR), least absolute shrinkage and selection operator (LASSO), and Boruta algorithms. The selected features were then used in six machine-learning classifiers: logistic regression (LR), random forest, support vector machine (SVM), naive Bayes, k-nearest neighbors, and XGBoost.
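The max-relevance and min-redundancy idea named above can be illustrated with a small greedy selector. This is only a sketch under stated assumptions, not the authors' pipeline: it uses absolute Pearson correlation as a stand-in for the mutual-information measures typically used by mRMR, it omits the LASSO and Boruta stages, and the function names (`pearson`, `mrmr_select`) and the toy feature names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mrmr_select(features, labels, k):
    """Greedily pick k features by relevance minus mean redundancy.

    features: dict mapping feature name -> list of numeric values
    labels:   list of numeric class labels, one per sample
    Assumption: |Pearson r| approximates both relevance and redundancy.
    """
    relevance = {name: abs(pearson(col, labels)) for name, col in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: gene_a_copy duplicates gene_a, gene_b carries independent signal.
labels = [0, 0, 0, 1, 0, 0, 0, 1]
features = {
    "gene_a": [0, 0, 1, 1, 0, 0, 1, 1],
    "gene_a_copy": [0, 0, 2, 2, 0, 0, 2, 2],
    "gene_b": [0, 1, 0, 1, 0, 1, 0, 1],
    "noise": [0, 0, 0, 0, 1, 1, 1, 1],
}
print(mrmr_select(features, labels, 2))
```

In this toy case the duplicated feature is penalized for its redundancy with the first pick, so the second pick is the feature that carries independent signal.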

Results: The proposed approach achieved an area under the receiver operating characteristic curve (AUC) of 0.9958 for the LR classifier. Notably, the accuracy and AUC of a composite model incorporating copy number, methylation, and RNA-sequencing data for expression of exons, genes, and miRNA mature strands surpassed those of models with single-omics data or other multi-omics combinations. Survival analyses revealed that the SVM classifier yielded the optimal classification, outperforming the classification achieved by TCGA. To enhance model interpretability, SHapley Additive exPlanations (SHAP) values were used to elucidate the impact of each feature on the predictions. Gene Ontology (GO) enrichment analysis identified significant biological processes, molecular functions, and cellular components associated with LUAD subtypes.
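For readers unfamiliar with the AUC metric reported above, the following sketch computes it for a binary problem as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the example scores are illustrative assumptions; the abstract does not describe how the authors computed AUC, and a multi-class subtype setting would need an averaging scheme that is not shown here.

```python
def roc_auc(labels, scores):
    """Binary AUC via pairwise comparison of positive and negative scores.

    labels: list of 0/1 class labels
    scores: list of classifier scores (higher means more likely positive)
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four samples: three of the four
# positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is acceptable for illustration; a rank-based formulation is the usual choice for large cohorts.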

Conclusion: In summary, our feature selection process, based on TCGA multi-omics data and combined with multiple machine learning classifiers, proficiently identifies molecular subtypes of lung adenocarcinoma and their corresponding significant genes. Our method could enhance the early detection and diagnosis of LUAD, expedite the development of targeted therapies and, ultimately, lengthen patient survival.

Keywords: Feature selection; Lung adenocarcinoma; Multi-omics data; Subtype classification; Survival analysis.

Conflict of interest statement

Declaration of Competing Interest: We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled “Subclassification of Lung Adenocarcinoma through Comprehensive Multi-omics Data to Benefit Survival Outcomes”.