Assessing severity of pediatric pneumonia using multimodal transformers with multi-task learning
- PMID: 39711742
- PMCID: PMC11660274
- DOI: 10.1177/20552076241305168
Abstract
Objective: While current multimodal approaches to the diagnosis and severity assessment of pneumonia perform remarkably well, they frequently overlook modality absence (one or more data types missing for a patient), a common challenge in clinical practice. To bridge this gap, we present the robust multimodal transformer (RMT) model, which aims to improve the accuracy of diagnosis and severity assessment when data are incomplete, so that it better meets the complex needs of real-world clinical settings.
Method: The RMT model integrates two modalities, chest X-ray images and clinical text data, in a Transformer-based architecture enhanced by multi-task learning and a mask attention mechanism. This design aims to optimize the model's performance across different modalities, particularly when a modality is absent.
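The abstract does not describe the RMT architecture in detail. As a rough illustration only, the sketch below shows one common way a mask attention mechanism can tolerate a missing modality: tokens belonging to an absent modality (here, hypothetically, the clinical text) receive a score of negative infinity before the softmax, so they get zero attention weight. It also shows a generic weighted multi-task loss. The function names `masked_attention` and `multi_task_loss`, the single-query attention, the toy vectors, and the per-task weights are all assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def masked_attention(query: Sequence[float],
                     keys: Sequence[Sequence[float]],
                     values: Sequence[Sequence[float]],
                     present: Sequence[bool]) -> List[float]:
    """Single-query scaled dot-product attention with a presence mask.

    Hypothetical helper: tokens whose `present` flag is False (for example,
    all tokens of a missing modality) are given a -inf score and therefore
    zero weight after the softmax.
    """
    if not any(present):
        raise ValueError("at least one token must be present")
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k_i for q, k_i in zip(query, key)) / scale if keep else NEG_INF
        for key, keep in zip(keys, present)
    ]
    # Subtract the largest unmasked score for numerical stability.
    top = max(s for s in scores if s != NEG_INF)
    exps = [math.exp(s - top) if s != NEG_INF else 0.0 for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of value vectors; masked tokens contribute nothing.
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]


def multi_task_loss(task_probs: Sequence[Sequence[float]],
                    task_labels: Sequence[int],
                    task_weights: Sequence[float]) -> float:
    """Weighted sum of per-task cross-entropy losses (assumed weighting scheme)."""
    return sum(w * -math.log(probs[label])
               for probs, label, w in zip(task_probs, task_labels, task_weights))


# Two image tokens followed by two text tokens; the text modality is missing.
keys = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 5.0]]
values = [[1.0, 0.0], [0.0, 1.0], [100.0, 100.0], [100.0, 100.0]]
present = [True, True, False, False]
fused = masked_attention([1.0, 1.0], keys, values, present)
print(fused)  # [0.5, 0.5]: the masked text tokens are ignored

# Hypothetical tasks: diagnosis (2 classes) and severity (3 classes).
loss = multi_task_loss([[0.2, 0.8], [0.1, 0.7, 0.2]], [1, 1], [1.0, 0.5])
print(round(loss, 4))
```

Masking, rather than filling the missing modality with placeholder values, keeps the fused output in this sketch from being influenced by inputs that do not exist; the abstract does not say whether RMT also trains with randomly dropped modalities.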
Results: The RMT model demonstrates superior performance over traditional diagnostic methods and baseline models in accuracy, precision, sensitivity, and specificity. In tests involving various scenarios, including single-modal and multimodal tasks, the model shows remarkable robustness in handling incomplete data. Its effectiveness is further validated through extensive comparative analysis and ablation studies.
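For reference, the four metrics named above follow from a binary confusion matrix: accuracy = (TP + TN) / all cases, precision = TP / (TP + FP), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). The sketch below is a minimal illustration assuming one severity class is treated as positive (one-vs-rest); the abstract does not state how the authors averaged metrics across classes, and the function name `binary_metrics` and the toy labels are hypothetical.

```python
from typing import Dict, Hashable, Sequence


def binary_metrics(y_true: Sequence[Hashable],
                   y_pred: Sequence[Hashable],
                   positive: Hashable = 1) -> Dict[str, float]:
    """Accuracy, precision, sensitivity, and specificity for one positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # correctly flagged
        elif p == positive:
            fp += 1  # flagged but actually negative
        elif t == positive:
            fn += 1  # missed positive
        else:
            tn += 1  # correctly not flagged

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of dividing by zero when a denominator is empty.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy labels: 1 = severe, 0 = non-severe (illustrative data only).
m = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 1])
print(m)  # accuracy 0.625, precision 0.6, sensitivity 0.75, specificity 0.5
```

Sensitivity and specificity are reported separately because, in severity assessment, missing a severe case (low sensitivity) and over-flagging mild cases (low specificity) carry different clinical costs.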
Conclusion: The RMT model represents a substantial advancement in pediatric pneumonia severity assessment, using multimodal data and Transformer-based techniques to improve assessment precision. The RMT model sets a new precedent for AI applications in medical diagnostics, and the development of a comprehensive pediatric pneumonia dataset is a further pivotal contribution, providing a robust foundation for future research.
Keywords: Clinical data; chest X-ray image; deep learning; multimodal transformers; pediatric pneumonia.
© The Author(s) 2024.
Conflict of interest statement
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Similar articles
- Missing-modality enabled multi-modal fusion architecture for medical data. J Biomed Inform. 2025 Apr;164:104796. doi: 10.1016/j.jbi.2025.104796. Epub 2025 Feb 21. PMID: 39988001.
- Robust multi-modal fusion architecture for medical data with knowledge distillation. Comput Methods Programs Biomed. 2025 Mar;260:108568. doi: 10.1016/j.cmpb.2024.108568. Epub 2024 Dec 18. PMID: 39709743.
- A deep learning model to enhance the classification of primary bone tumors based on incomplete multimodal images in X-ray, CT, and MRI. Cancer Imaging. 2024 Oct 10;24(1):135. doi: 10.1186/s40644-024-00784-7. PMID: 39390604. Free PMC article.
- Multimodal data integration for oncology in the era of deep neural networks: a review. Front Artif Intell. 2024 Jul 25;7:1408843. doi: 10.3389/frai.2024.1408843. eCollection 2024. PMID: 39118787. Free PMC article. Review.
- A review of deep learning-based information fusion techniques for multimodal medical image classification. Comput Biol Med. 2024 Jul;177:108635. doi: 10.1016/j.compbiomed.2024.108635. Epub 2024 May 22. PMID: 38796881. Review.