Cell Rep Med. 2025 Jul 15;6(7):102216.
doi: 10.1016/j.xcrm.2025.102216. Epub 2025 Jul 2.

AI-enabled molecular phenotyping and prognostic predictions in lung cancer through multimodal clinical information integration

Yuxing Lu et al. Cell Rep Med. 2025.

Abstract

Lung cancer remains the leading cause of cancer-related mortality worldwide. The need for cost-effective, non-invasive methods to detect specific gene mutations for targeted therapy and predict patient survival outcomes underscores the importance of advancing diagnostic and prognostic capabilities. Contemporary lung cancer diagnostic models often fail to integrate diverse patient data, leading to incomplete clinical assessments. To address these challenges, we propose LUCID, a multimodal data integration framework designed to predict epidermal growth factor receptor (EGFR) mutation status and survival outcomes in patients with lung cancer. Tailored for early-stage clinical assessment, LUCID leverages lung computed tomography (CT) images, chief complaints, laboratory test results, and demographic data to deliver comprehensive, non-invasive predictions. LUCID achieved strong performance in a retrospective cohort of 5,175 patients, with areas under the receiver operating characteristic curve (AUCs) ranging from 0.851 to 0.881 for EGFR mutation prediction and from 0.821 to 0.912 for survival time prediction. The model also demonstrated robustness across external validation cohorts and in scenarios with missing modalities.

Keywords: EGFR mutation; lung cancer; multimodal integration; non-invasive diagnosis; precision medicine; survival prediction.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Pipeline of the multimodal LUCID lung cancer diagnostic model. (A and B) The model comprises two primary stages. (A) Stage 1 performs binary classification on CT images to identify those with high suspicion of malignancy. (B) Stage 2 integrates four modalities (CT images, textual complaints, lab results, and demographic data) to predict EGFR mutation type or survival time. LUCID is robust to missing modalities and adapts accordingly.
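The two-stage flow in this caption can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the function names, the 0.5 threshold, and the mean-pooling fusion are hypothetical stand-ins, not the authors' implementation, and the sketch shows just one simple way a pipeline might tolerate a missing modality.

```python
from typing import Dict, List, Optional, Sequence

# Modality names are illustrative labels for the four inputs named in the caption.
MODALITIES = ("ct", "complaint", "lab", "demographic")


def select_suspicious_slices(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Stage 1 (sketch): keep indices of CT slices whose malignancy score
    reaches a threshold. The threshold value is an assumption."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def fuse_available(embeddings: Dict[str, Optional[List[float]]]) -> List[float]:
    """Stage 2 (sketch): average the embeddings of whichever modalities are
    present. Mean pooling is a placeholder for the paper's attention-based
    integration, chosen only because it ignores missing inputs naturally."""
    present = [embeddings[m] for m in MODALITIES if embeddings.get(m) is not None]
    if not present:
        raise ValueError("at least one modality is required")
    dim = len(present[0])
    return [sum(vec[j] for vec in present) / len(present) for j in range(dim)]


if __name__ == "__main__":
    kept = select_suspicious_slices([0.1, 0.9, 0.5, 0.3])
    fused = fuse_available({"ct": [1.0, 3.0], "lab": [3.0, 5.0], "complaint": None})
    print(kept, fused)
```

A patient record with no chief complaint or demographic entry still yields a fused vector here, which is the behavior the caption describes as adapting to missing modalities.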
Figure 2
The architecture of the multimodal integration module of LUCID. (A) LUCID incorporates four distinct input modalities: CT images, the patient’s textual chief complaints, laboratory test results, and demographic information. Each modality is first encoded into a vector representation before being passed into the hierarchical multimodal integration framework. The loss computation combines image classification loss, clinical classification loss, and joint loss. (B) Joint-attention mechanism: image and clinical embeddings are merged to create a joint embedding. Multihead attention is then computed over the joint embedding to capture the attention among its features. (C) Cross-attention mechanism: embeddings from one modality serve as the key vector in multihead attention computations, while embeddings from the other modality function as value and query vectors. This facilitates the attention calculations between distinct modalities.
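The joint-attention step in panel B can be sketched in plain Python as follows. This is a minimal illustration, not the paper's code: it assumes the two embeddings are merged by concatenating their token lists, it omits the learned query/key/value projections that a trained model would have, and every function name is hypothetical.

```python
import math
from typing import List

Vector = List[float]
Tokens = List[Vector]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Tokens, keys: Tokens, values: Tokens) -> Tokens:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def split_heads(tokens: Tokens, num_heads: int) -> List[Tokens]:
    """Slice each token embedding into num_heads equal-width chunks."""
    d = len(tokens[0])
    if d % num_heads != 0:
        raise ValueError("embedding size must be divisible by num_heads")
    size = d // num_heads
    return [[t[h * size:(h + 1) * size] for t in tokens] for h in range(num_heads)]


def joint_attention(image_tokens: Tokens, clinical_tokens: Tokens,
                    num_heads: int = 2) -> Tokens:
    """Merge both modalities into one sequence (assumed: token-wise
    concatenation), then run multihead self-attention over it."""
    joint = image_tokens + clinical_tokens
    heads = [attention(h, h, h) for h in split_heads(joint, num_heads)]
    # Concatenate the per-head outputs back into one vector per token.
    return [[x for head in heads for x in head[i]] for i in range(len(joint))]


if __name__ == "__main__":
    img = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
    clin = [[0.5, 0.5, 0.5, 0.5]]
    for row in joint_attention(img, clin):
        print([round(x, 3) for x in row])
```

Because image and clinical tokens sit in the same sequence, each output row is a weighted mix of both modalities. The cross-attention variant in panel C instead draws the query, key, and value inputs from different modalities.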
Figure 3
Performance and empirical examples of the stage-1 classification model. (A) We evaluated ViT models at varying parameter scales and observed a scaling law: classification capability improved as the number of model parameters increased. (B) A 5-fold validation was conducted during the training of the stage-1 model, and we plotted the AUC along with the 95% CI region. (C) All CT images from a patient (PID:0019770975) diagnosed with stage-4 lung cancer were processed through our stage-1 model. CT images outlined in red are those the model flagged as highly suspicious for lung cancer. Data are represented as mean ± 95% CI.
Figure 4
Performance evaluation of LUCID in EGFR type and survival prediction. Comparison of LUCID’s accuracy with other methods in two EGFR mutation prediction tasks (N = 1,035) and three survival prediction tasks (N = 952). (A) AUC and method comparisons for distinguishing EGFR-sensitive cases from others (591:444). (B) AUC and method comparisons for distinguishing EGFR wild type from mutant-type cases (599:436), including unimodal and bimodal approaches. (C–E) Kaplan-Meier (KM) curves and method comparisons for predicting 1-year (835:117), 3-year (469:483), and 5-year (199:753) survival. Data are represented as mean ± 95% CI. p values are calculated for the KM curves.
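For readers unfamiliar with the KM curves in panels C–E, the sketch below shows how a Kaplan-Meier survival estimate is computed from follow-up times and event indicators. The function name and the toy data are hypothetical; the study's cohort data and plotting code are not reproduced here.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if death was observed at times[i] and 0 if the
    patient was censored (lost to follow-up) at that time.
    """
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(order) and order[i][0] == t:
            deaths += order[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Toy follow-up data in years (hypothetical, for illustration only).
    toy_times = [1, 2, 2, 3, 4]
    toy_events = [1, 1, 0, 1, 0]
    for t, s in kaplan_meier(toy_times, toy_events):
        print(f"t={t}: S(t)={s:.2f}")
```

Censored patients never lower the curve directly. They only shrink the number at risk for later event times, which is why the estimator can use patients with incomplete follow-up.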
Figure 5
LUCID’s performance on external validation dataset and evaluations of different modalities. (A) LUCID demonstrated strong predictive performance with an AUC of 0.876 (95% CI: 0.857–0.893) on the external validation dataset (N = 1,285). (B) Overall performance and three feature exclusion experiments suggest that different feature modalities contribute differently to the model’s predictive capability. Data are represented as mean ± 95% CI.
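An AUC with a 95% CI, as reported in panel A, can be computed along the lines of the sketch below. The source does not say how the intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names and toy scores are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels: Sequence[int], scores: Sequence[float],
                 n_resamples: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (assumed method)."""
    auc(labels, scores)  # fail early if one class is missing
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(auc(y, s), bootstrap_ci(y, s, n_resamples=200))
```

The pairwise form of the AUC is quadratic in cohort size, which is fine for illustration; a rank-based formula would be the usual choice for a cohort of N = 1,285.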
