Multicenter Study
Clin Transl Med. 2025 Aug;15(8):e70418.
doi: 10.1002/ctm2.70418.

A robust machine learning model based on ribosomal-subunit-derived piRNAs for diagnostic potential of nonsmall cell lung cancer across multicentre, large-scale of sequencing data

Zitong Gao et al. Clin Transl Med. 2025 Aug.

Abstract

Nonsmall cell lung cancer (NSCLC) is a lethal cancer that lacks robust biomarkers for noninvasive clinical diagnosis. Detecting NSCLC at an early stage can decrease the mortality rate and minimise the harm caused by various treatments. We curated 2050 samples from public tissue and plasma datasets, covering both invasive and noninvasive sample types, and supplemented them with in-house pooled plasma and exosome samples. Eleven independent transcriptome datasets were used to develop a new machine learning model that integrates PIWI-interacting RNA (piRNA) to predict NSCLC. Five piRNA signatures derived from ribosomal subunits, identified as tumour-specific, exhibited robust diagnostic ability and were combined into a piRNA-Based Tumour Probability Index (pi-TPI) risk evaluation model. pi-TPI effectively distinguished NSCLC patients from healthy individuals and identified early-stage cancers with area under the ROC curve (AUC) values over 0.80. In plasma cohorts, pi-TPI achieved an AUC of 0.85. Experimental exosomal data enhanced the accuracy of distinguishing noncancerous, benign, and cancer cases; in the noncancer/cancer subgroup, pi-TPI exhibited superior predictive performance with an AUC of 0.96. These findings underscore the clinical potential of the five piRNA signatures as a diagnostic tool for NSCLC, particularly for noninvasive cancer diagnostics.
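
For readers less familiar with the AUC values quoted in the abstract, the following is a minimal sketch of how an AUC can be computed from per-sample risk scores. It uses the rank interpretation of the AUC: the probability that a randomly chosen cancer sample receives a higher score than a randomly chosen noncancer sample, with ties counted as one half. The function name `auc_from_scores` and the example scores are hypothetical and made up for illustration; they are not the authors' code or data, and the abstract does not state how pi-TPI scores are computed.

```python
def auc_from_scores(cancer_scores, noncancer_scores):
    """Return the AUC as the fraction of (cancer, noncancer) pairs in which
    the cancer sample has the higher score; tied pairs count as half."""
    if not cancer_scores or not noncancer_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for c in cancer_scores:
        for n in noncancer_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(noncancer_scores))


# Made-up risk scores for illustration only (not data from the study).
cancer = [0.9, 0.8, 0.55, 0.4]
noncancer = [0.6, 0.3, 0.2, 0.1]
print(auc_from_scores(cancer, noncancer))  # 14 of 16 pairs are ordered correctly: 0.875
```

The pair count in the numerator is the Mann-Whitney U statistic for the cancer group, so the AUC equals U divided by the product of the two group sizes. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.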

Keywords: PIWI‐interacting RNA; machine learning; noninvasive diagnosis; nonsmall cell lung cancer; small noncoding RNA.

Conflict of interest statement

Jeffrey A Borgia is on the scientific advisory boards of the Luminex Corporation and Rational Vaccines, Inc. and is a paid consultant for both.

Figures

FIGURE 1
Overview of development of the model for multisource data diagnosis of NSCLC.
FIGURE 2
13 piRNA‐based RF model performance across tissue, plasma and exosome in test set. (A) Bar chart showing the number of patients in each cohort of cancer and noncancer category. (B) Ranking of feature importance of 13 piRNA from RF training model (n = 812). (C–H) ROC curves with the corresponding AUC values of RF model in the training set and holdout/tissue/plasma/exosome/independent validation, and confusion matrices showing diagnosis results generated by RF model in holdout/tissue/plasma/exosome/independent cohort.
FIGURE 3
Five piRNA‐based pi‐TPI model performance in holdout, tissue and independent cohorts. (A, B) ROC curves with the corresponding AUC values of pi‐TPI model in the training set and holdout and tissue validation, and confusion matrices showing diagnosis results generated by pi‐TPI model in holdout and tissue validation. Boxplots showing the transformed risk score from pi‐TPI model in cancer and noncancer group in holdout and tissue cohort. Two‐sided p values were calculated using Mann–Whitney U test. (C) Normalised five piRNA expression level in TCGA, GSE175462, GSE110907 (tissue dataset used in training set) cohort between cancer and noncancer. (D, E) ROC curves with the corresponding AUC values of pi‐TPI model in the training set and two independent validations, GSE62182 and GSE83527, and confusion matrices showing diagnosis results generated by pi‐TPI model in two independent validations. Boxplots showing the transformed risk score from pi‐TPI model in cancer and noncancer group in each cohort. Two‐sided p values were calculated using Mann–Whitney U test. (F) Normalised five piRNA expression level in GSE62182 and GSE83527 dataset between cancer and noncancer.
FIGURE 4
Five piRNA‐based pi‐TPI model performance in plasma and exosome cohorts. (A) ROC curves with the corresponding AUC values of pi‐TPI model in the training set and plasma validation, and confusion matrices showing diagnosis results generated by pi‐TPI model in plasma validation. Boxplots showing the transformed risk score from pi‐TPI model in cancer and noncancer group in plasma cohort. Two‐sided p values were calculated using Mann–Whitney U test. (B) Normalised five piRNA expression level in two plasma datasets, GSE204951, GSE148861/GSE148862(merged) cohort between cancer and noncancer. (C) ROC curves with the corresponding AUC values of pi‐TPI model in the training set and exosome with its subgroups, noncancer vs. cancer. Boxplots showing the transformed risk score from pi‐TPI model in cancer and noncancer group in each subgroup. Two‐sided p values were calculated using Mann–Whitney U test. (D) Normalised five piRNA expression level in exosome between cancer and noncancer. (E, F) ROC curves with the corresponding AUC values of pi‐TPI model in the training set and exosome subgroups noncancer/benign vs. cancer and benign vs. cancer, and confusion matrices showing diagnosis results generated by pi‐TPI model in each subgroup. Boxplots showing the transformed risk score from pi‐TPI model in cancer and noncancer group in each subgroup. Two‐sided p values were calculated using Mann–Whitney U test.
FIGURE 5
Function prediction of the five piRNAs based on Pearson's correlation and Reactome. (A) Genes correlated with the five piRNAs in Pearson's correlation analysis. (B) Network showing the five piRNAs and their correlated genes, which are enriched in different signalling pathways.
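
As a rough illustration of the Pearson's correlation analysis named in the Figure 5 caption, the sketch below computes the correlation coefficient between two expression vectors. The function name `pearson_r` and the expression values are hypothetical; the caption does not describe the authors' implementation, thresholds, or normalisation, so none are assumed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centred cross-products and squares (the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up expression values for one piRNA and one gene across five samples.
pirna_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_expr = [2.1, 3.9, 6.2, 8.0, 9.8]
print(round(pearson_r(pirna_expr, gene_expr), 3))
```

Values near 1 or -1 indicate a strong linear relationship between a piRNA and a gene across samples, and values near 0 indicate little linear relationship.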
