Digit Health. 2024 Dec 20:10:20552076241305168.
doi: 10.1177/20552076241305168. eCollection 2024 Jan-Dec.

Assessing severity of pediatric pneumonia using multimodal transformers with multi-task learning

Jing Li et al. Digit Health.

Abstract

Objective: Current multimodal approaches to the diagnosis and severity assessment of pneumonia perform well, but they frequently overlook modality absence, a common challenge in clinical practice. We therefore present the robust multimodal transformer (RMT) model, designed to bridge this gap. The RMT model aims to improve diagnosis and severity assessment accuracy when data are incomplete, so that it meets the needs of real-world clinical settings.

Method: The RMT model integrates chest X-ray images and clinical text data in a Transformer-based architecture, enhanced by multi-task learning and a masked attention mechanism. This approach aims to optimize the model's performance across modalities, particularly when a modality is absent.
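
As a rough illustration of how multi-task learning can tolerate a missing modality, the sketch below combines one loss term per task (image-only, text-only, multimodal) and skips any task whose input is unavailable. The abstract does not specify the loss, so the task names, the equal weights, and the functions `binary_cross_entropy` and `multitask_loss` are assumptions made for illustration only.

```python
import math

def binary_cross_entropy(p_severe: float, label: int) -> float:
    """Cross-entropy for one case; label 1 = severe, 0 = mild."""
    eps = 1e-12  # guard against log(0)
    p = min(max(p_severe, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def multitask_loss(preds: dict, label: int, weights: dict) -> float:
    """Weighted sum of per-task losses (hypothetical formulation).

    A prediction of None marks a task whose modality is absent;
    that task contributes nothing to the total.
    """
    total = 0.0
    for task, p in preds.items():
        if p is None:
            continue
        total += weights[task] * binary_cross_entropy(p, label)
    return total

# Hypothetical case with only the X-ray available.
preds = {"image": 0.8, "text": None, "multimodal": None}
weights = {"image": 1.0, "text": 1.0, "multimodal": 1.0}
loss = multitask_loss(preds, 1, weights)
```

With only the image task active, the total reduces to the image-only cross-entropy, so a training step can still proceed on an incomplete record.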

Results: The RMT model demonstrates superior performance over traditional diagnostic methods and baseline models in accuracy, precision, sensitivity, and specificity. In tests involving various scenarios, including single-modal and multimodal tasks, the model shows remarkable robustness in handling incomplete data. Its effectiveness is further validated through extensive comparative analysis and ablation studies.
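
For reference, the four reported metrics follow from a binary confusion matrix as sketched below, treating "severe" as the positive class. The counts in the example are made up to show the arithmetic and are not results from the study; the function name `classification_metrics` is likewise illustrative.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard definitions, with 'severe' as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),    # severe predictions that are correct
        "sensitivity": tp / (tp + fn),  # severe cases that are detected
        "specificity": tn / (tn + fp),  # mild cases correctly ruled out
    }

# Hypothetical counts, NOT taken from the paper.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
```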

Conclusion: The RMT model represents a substantial advancement in pediatric pneumonia severity assessment. It successfully harnesses multimodal data and advanced AI techniques to improve assessment precision. While the RMT model sets a new precedent in AI applications in medical diagnostics, the development of a comprehensive pediatric pneumonia dataset marks a pivotal contribution, providing a robust foundation for future research.

Keywords: Clinical data; chest X-ray image; deep learning; multimodal transformers; pediatric pneumonia.

Conflict of interest statement

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Data selection process for pediatric pneumonia cases. This flowchart illustrates the reduction of an initial set of 37,579 records to 5101 high-quality cases, balanced between 2537 mild and 2564 severe cases, after excluding non-pneumonia respiratory cases and those with unclear severity annotations.
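
The counts in this caption can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those stated in the caption.

```python
initial_records = 37579
mild, severe = 2537, 2564

retained = mild + severe                       # cases kept after exclusions
retained_fraction = retained / initial_records
mild_share = mild / retained
severe_share = severe / retained

print(retained)                                # 5101
print(round(100 * retained_fraction, 1))       # 13.6 (percent of initial records)
print(round(100 * mild_share, 1), round(100 * severe_share, 1))  # 49.7 50.3
```

The two classes differ by only 27 cases, which is consistent with the caption's description of the set as balanced.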
Figure 2.
Composite representation of image processing stages for a chest X-ray. Panel (a) displays the original grayscale image. Panel (b) shows the improved clarity after histogram equalization, where the contrast adjustment brings out finer details for diagnostic purposes. Panel (c) demonstrates a series of data augmentation techniques applied to (b).
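
Histogram equalization, the step shown in panel (b), remaps gray levels through the image's cumulative histogram so that intensities spread over the full range. The following is a minimal standard-library sketch of the textbook algorithm on a tiny made-up image; it is not the authors' preprocessing code, and the function name `equalize_histogram` is an assumption.

```python
def equalize_histogram(image, levels=256):
    """Equalize a grayscale image given as rows of ints in [0, levels - 1]."""
    pixels = [p for row in image for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in image]

    # Lookup table mapping each old level to its equalized level.
    lut = [round(max(c - cdf_min, 0) / (n - cdf_min) * (levels - 1)) for c in cdf]
    return [[lut[p] for p in row] for row in image]

# A low-contrast 2x3 patch (made-up values).
patch = [[100, 100, 110],
         [110, 120, 130]]
equalized = equalize_histogram(patch)
```

After equalization the darkest level in the patch maps to 0 and the brightest to 255, while the ordering of intensities is preserved.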
Figure 3.
Age and gender distribution of pediatric patients with mild and severe pneumonia. (a) The age and gender distribution for mild pneumonia cases; (b) the age and gender distribution for severe pneumonia cases.
Figure 4.
Overview of the robust multimodal transformer (RMT) model architecture.
Figure 5.
Visualization of the masked attention technique in the RMT model: (a) Masked attention matrix for image-only task. The matrix displays selective activation where the image CLS token (“I”) receives attention from image vectors (“i1,” “i2,” and “i3”) while attention from text vectors (“t1,” “t2,” and “t3”) and multimodal vector “IT” is masked, showcasing the model’s focused analysis on visual data; (b) masked attention matrix for text-only task. The matrix illustrates the attention focus on the text CLS token (“T”) with attention from image vectors (“i1,” “i2,” and “i3”) and multimodal vector “IT” being masked, highlighting the model’s capacity for concentrated textual information processing. RMT: robust multimodal transformer; CLS: classification token.
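
The masking described in this caption can be sketched as a softmax in which scores for disallowed tokens are set to negative infinity, so they receive zero weight. The token order, the choice to let each CLS token attend to itself, and the function names below are assumptions for illustration; the caption does not give the exact matrix layout.

```python
import math

# Token names follow the caption; their order here is an assumption.
TOKENS = ["I", "i1", "i2", "i3", "T", "t1", "t2", "t3", "IT"]
IMAGE_KEYS = {"I", "i1", "i2", "i3"}
TEXT_KEYS = {"T", "t1", "t2", "t3"}

def allowed_keys(task: str) -> set:
    """Tokens the task's CLS query may attend to."""
    if task == "image":
        return IMAGE_KEYS
    if task == "text":
        return TEXT_KEYS
    return set(TOKENS)  # multimodal task: nothing is masked

def masked_softmax(scores, task: str):
    """Softmax over one row of attention scores with masked keys zeroed out."""
    allowed = allowed_keys(task)
    masked = [s if tok in allowed else float("-inf")
              for tok, s in zip(TOKENS, scores)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

# With equal raw scores, the image-only row spreads weight over image tokens only.
image_row = masked_softmax([0.0] * len(TOKENS), "image")
```

Here `image_row` assigns 0.25 to each of "I", "i1", "i2", and "i3" and exactly 0 to every text token and to "IT".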
Figure 6.
Composite overview of the robust multimodal transformer (RMT) model’s performance. (a) Model accuracy over training epochs; (b) model loss over training epochs.
Figure 7.
Comparative analysis of the RMT and MMBT models in different modality scenarios: (a) performance under varying rates of text data availability; (b) performance under varying rates of image data availability. RMT: robust multimodal transformer; MMBT: supervised multimodal bi-transformer.
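
One way such an experiment could be set up is to keep a modality for each test sample with probability equal to the availability rate and blank it otherwise. The sketch below assumes samples are dictionaries with "image" and "text" keys; that data layout, the seeding, and the function name `apply_availability` are hypothetical and not taken from the paper.

```python
import random

def apply_availability(samples, modality: str, rate: float, seed: int = 0):
    """Return copies of samples where `modality` is kept with probability `rate`.

    Dropped modalities are set to None; the input list is not modified.
    """
    rng = random.Random(seed)  # fixed seed so a run is reproducible
    out = []
    for sample in samples:
        kept = dict(sample)
        if rng.random() >= rate:
            kept[modality] = None
        out.append(kept)
    return out

# Hypothetical records standing in for (X-ray, clinical text) pairs.
samples = [{"image": f"xray_{i}", "text": f"note_{i}"} for i in range(1000)]
half_text = apply_availability(samples, "text", rate=0.5)
observed_rate = sum(s["text"] is not None for s in half_text) / len(half_text)
```

Sweeping `rate` from 0 to 1 for one modality while leaving the other intact would give the kind of x-axis shown in each panel.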
Figure 8.
Original X-ray alongside the gradient-weighted class activation mapping (Grad-CAM) heatmap. The red and yellow regions indicate where the model focuses its attention, providing an interpretable map of the regions that drive the diagnosis.
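
Grad-CAM builds such a heatmap by weighting each convolutional feature map by the spatial average of the class score's gradient with respect to that map, summing the weighted maps, and keeping only positive values. The sketch below applies that computation to tiny made-up arrays with the standard library only; in practice the activations and gradients would come from the trained network, and the function name `grad_cam` is illustrative.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from K feature maps and their gradients (each H x W)."""
    n_rows, n_cols = len(activations[0]), len(activations[0][0])
    heat = [[0.0] * n_cols for _ in range(n_rows)]

    for fmap, grad in zip(activations, gradients):
        # Channel weight: global average of the gradient over the map.
        alpha = sum(sum(row) for row in grad) / (n_rows * n_cols)
        for i in range(n_rows):
            for j in range(n_cols):
                heat[i][j] += alpha * fmap[i][j]

    # ReLU keeps only regions that push the class score up.
    heat = [[max(v, 0.0) for v in row] for row in heat]

    # Scale to [0, 1] for display as a color overlay.
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[v / peak for v in row] for row in heat]
    return heat

# Two 2x2 feature maps with made-up values.
activations = [[[1.0, 0.0], [0.0, 2.0]],
               [[0.0, 3.0], [1.0, 0.0]]]
gradients = [[[0.5, 0.5], [0.5, 0.5]],
             [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(activations, gradients)
```

In this toy case the second feature map has a negative weight, so the locations it activates are suppressed and `heatmap` is `[[0.5, 0.0], [0.0, 1.0]]`.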
Figure 9.
Attention maps at three key stages of the model’s information aggregation process. From left to right: stage 1 represents self-attention focused on individual tokens, stage 2 shows a broadening of attention across neighboring tokens, and stage 3 highlights the final aggregation of global information where key tokens such as the textual modality are prioritized.
