Aging (Albany NY). 2024 May 1;16(9):7799-7817.
doi: 10.18632/aging.205783. Epub 2024 May 1.

Identifying lncRNAs and mRNAs related to survival of NSCLC based on bioinformatic analysis and machine learning

Wei Yue et al. Aging (Albany NY).

Abstract

Non-small cell lung cancer (NSCLC) is the most common histopathological type of lung cancer, and screening for potential prognostic biomarkers of NSCLC is therefore worthwhile. This study aims to identify the long non-coding RNAs (lncRNAs) and mRNAs related to survival in NSCLC. Expression profile data for lung adenocarcinoma and lung squamous cell carcinoma were downloaded from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. A total of eight survival-related lncRNAs and 262 survival-related mRNAs were filtered. Gene set enrichment analysis screened 17 significantly correlated Gene Ontology (GO) functions and 14 Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathways. Based on the clinical survival and prognosis information of the samples, we screened eight lncRNAs and 193 mRNAs by univariate Cox regression analysis. Further univariate and multivariate Cox regression analyses yielded 30 independent prognosis-related mRNAs. A protein-protein interaction (PPI) network was then constructed. We next applied three machine learning algorithms (least absolute shrinkage and selection operator [LASSO], recursive feature elimination [RFE], and random forest [RF]) to screen for the optimal combination of differentially expressed genes (DEGs), and a total of 17 overlapping mRNAs were obtained. Based on these 17 characteristic mRNAs, we first built a nomogram prediction model; the ROC values for the training set and testing set were 0.835 and 0.767, respectively. Overlapping the 17 characteristic mRNAs with the PPI network hub genes yielded three genes, CDC6, CEP55, and TYMS, which were considered key factors associated with survival in NSCLC. In vitro experiments were performed to examine the effects of CDC6, CEP55, and TYMS on NSCLC cells. Finally, lncRNA-mRNA networks were constructed.
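The overlap steps described in the abstract (keeping only the mRNAs selected by all three algorithms, then intersecting that set with the PPI hub genes) can be sketched in a few lines of plain Python. This is only an illustration of the set logic, not the authors' code: the function name `overlapping_features`, the placeholder genes `GENE_A` to `GENE_D`, and the contents of every list are assumptions made up for the example, and the real selections contain far more genes.

```python
def overlapping_features(*selections):
    """Return the features present in every selection, sorted by name."""
    if not selections:
        return []
    common = set(selections[0])
    for selection in selections[1:]:
        common &= set(selection)  # keep only features seen in all lists so far
    return sorted(common)


# Hypothetical per-method selections (illustrative only, not the study's data).
lasso_selected = ["CDC6", "CEP55", "TYMS", "GENE_A"]
rfe_selected = ["CDC6", "CEP55", "TYMS", "GENE_B"]
rf_selected = ["TYMS", "CEP55", "CDC6", "GENE_A", "GENE_C"]

# Step 1: consensus features chosen by LASSO, RFE, and RF alike.
consensus = overlapping_features(lasso_selected, rfe_selected, rf_selected)

# Step 2: intersect the consensus set with hypothetical PPI hub genes.
hub_genes = ["CDC6", "CEP55", "TYMS", "GENE_D"]
key_genes = overlapping_features(consensus, hub_genes)

print(consensus)
print(key_genes)
```

Requiring agreement across all methods is a conservative choice: a gene survives only if each algorithm independently ranks it as informative, which trades sensitivity for robustness.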

Keywords: CDC6; CEP55; NSCLC; machine learning; survival.


Conflict of interest statement

CONFLICTS OF INTEREST: The authors declare that they have no conflicts of interest.

Figures

Figure 1
Screening prognosis-related mRNAs and lncRNAs based on TCGA data. (A) Sample relationships before and after batch effect removal. (B) Heatmaps of differentially expressed mRNAs and lncRNAs in the Tumor (n = 994) vs. Control (n = 107) and Dead (n = 394) vs. Alive (n = 600) comparison groups. (C) A total of eight overlapping lncRNAs and 262 overlapping mRNAs were filtered. (D, E) GO function and KEGG signaling pathway enrichment analysis, carried out with DAVID, of the overlapping mRNAs with significant differential expression.
Figure 2
PPI network construction. The network contained 145 gene nodes in total.
Figure 3
Optimal mRNA marker mining and nomogram diagnostic model construction. (A–C) Parameter plots for filtering characteristic mRNAs with RFE, RF, and LASSO. (D) Comparison of the characteristic mRNA combinations filtered by RFE, RF, and LASSO.
Figure 4
Nomogram diagnostic model construction and evaluation. (A) Nomogram model diagram based on the expression levels of the 17 characteristic mRNAs in the combined training data set. (B) Nomogram diagnostic model line chart. (C) The ROC value was calculated. (D) Model decision line diagram.
Figure 5
Evaluation of the nomogram diagnostic model in the GSE37745 dataset. (A) Nomogram model diagram based on the expression levels of the 17 characteristic mRNAs in the GSE37745 validation data set. (B) Nomogram diagnostic model line chart. (C) The ROC value was calculated. (D) Model decision line diagram.
Figure 6
Expression of the 17 mRNAs in the combined TCGA training set and the GSE37745 testing dataset. (A) Expression of the 17 mRNAs in the combined TCGA training set. (B) Expression of the 17 mRNAs in the GSE37745 dataset. *0.01 < P < 0.05; **0.005 < P < 0.01; ***P < 0.005.
Figure 7
Prognostic analysis of CDC6, CEP55, and TYMS in TCGA and GSE37745. (A) Kaplan-Meier prognostic analysis of CDC6, CEP55, and TYMS in the combined TCGA training set. (B) Kaplan-Meier prognostic analysis of CDC6, CEP55, and TYMS in the GSE37745 validation data set.
Figure 8
Construction of a co-expression network based on characteristic mRNAs and lncRNAs. A total of 79 relationship pairs were screened, and the connection network was constructed.
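For readers unfamiliar with the Kaplan-Meier analysis referred to in Figure 7, the following is a minimal sketch of the standard product-limit estimator in plain Python. It is not the authors' implementation (the page does not say which software they used); the function name `kaplan_meier` and the five follow-up records are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up duration for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each distinct death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every record that shares this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set afterwards.
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up data (months) for five patients; not from the study.
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

In a comparison like the one in Figure 7, an estimate of this kind would be computed separately for the high-expression and low-expression groups of each gene and the two curves compared, typically with a log-rank test.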
