Digit Health. 2024 Dec 20:10:20552076241305168.

doi: 10.1177/20552076241305168. eCollection 2024 Jan-Dec.

Assessing severity of pediatric pneumonia using multimodal transformers with multi-task learning

Jing Li¹ ² ³, Ziang Nan⁴, Guoqiang Qi¹ ² ³, Junlan Cai⁵, Xinkui Zhao⁶, Xiang Li⁵, Shaofeng Liu⁵, Yuqi Wang³ ⁷, Yangyang Wu⁶, Xiaoye Miao⁴ ⁸, Gang Yu¹ ² ³

Affiliations

¹ Department of Data and Information, The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China.
² Sino-Finland Joint AI Laboratory for Child Health of Zhejiang Province, Hangzhou, China.
³ Pediatric Medicine Engineering and Information Research Center, National Clinical Research Center for Child Health, China.
⁴ Center for Data Science, Zhejiang University, China.
⁵ Beijing Life Science Academy, Beijing, China.
⁶ The School of Software Technology, Zhejiang University, China.
⁷ Department of Pulmonology, Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China.
⁸ The State Key Lab of Brain-Machine Intelligence, Zhejiang University, China.

PMID: 39711742
PMCID: PMC11660274
DOI: 10.1177/20552076241305168

Jing Li et al. Digit Health. 2024.

Abstract

Objective: Although current multimodal approaches to the diagnosis and severity assessment of pneumonia perform well, they frequently overlook modality absence, a common challenge in clinical practice. We therefore present the robust multimodal transformer (RMT) model, designed to bridge this gap. The RMT model aims to improve the accuracy of diagnosis and severity assessment when data are incomplete, so that it meets the needs of real-world clinical settings.

Method: The RMT model integrates chest X-ray images and clinical text data in a Transformer-based architecture, enhanced by multi-task learning and a masked attention mechanism. This approach aims to optimize the model's performance across modalities, particularly when a modality is absent.
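The abstract does not say how the tasks are combined during training. As a purely illustrative sketch, one common arrangement is to give the model an image-only head, a text-only head, and a multimodal head, and to average the losses of whichever heads have their inputs available. The head names, the averaging rule, and the function names below are assumptions and are not taken from the paper.

```python
import math

def cross_entropy(prob_severe, label):
    """Binary cross-entropy for one case; label is 1 (severe) or 0 (mild)."""
    eps = 1e-12
    p = min(max(prob_severe, eps), 1 - eps)  # keep log() finite
    return -math.log(p if label == 1 else 1 - p)

def multitask_loss(preds, label, has_image, has_text):
    """Average the loss over the heads whose input modalities are present.

    preds maps hypothetical head names ("image", "text", "multimodal")
    to the predicted probability of the severe class.
    """
    active = []
    if has_image:
        active.append("image")
    if has_text:
        active.append("text")
    if has_image and has_text:
        active.append("multimodal")
    if not active:
        raise ValueError("at least one modality must be present")
    return sum(cross_entropy(preds[h], label) for h in active) / len(active)

# Toy probabilities, invented for illustration only
preds = {"image": 0.8, "text": 0.6, "multimodal": 0.9}
full_loss = multitask_loss(preds, label=1, has_image=True, has_text=True)
image_only_loss = multitask_loss(preds, label=1, has_image=True, has_text=False)
```

Under this assumption, a case with a missing report still contributes a training signal through the image-only head instead of being discarded.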

Results: The RMT model demonstrates superior performance over traditional diagnostic methods and baseline models in accuracy, precision, sensitivity, and specificity. In tests involving various scenarios, including single-modal and multimodal tasks, the model shows remarkable robustness in handling incomplete data. Its effectiveness is further validated through extensive comparative analysis and ablation studies.
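For readers less familiar with the four reported metrics, the sketch below computes them from binary predictions using their standard definitions. The labels are toy values invented for illustration, and treating "severe" as the positive class is an assumption.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity, and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # severe, predicted severe
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # mild, predicted mild
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # mild, predicted severe
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # severe, predicted mild

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),   # recall on severe cases
        "specificity": ratio(tn, tn + fp),   # recall on mild cases
    }

# Toy labels (1 = severe, 0 = mild), not data from the study
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
```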

Conclusion: The RMT model represents a substantial advancement in pediatric pneumonia severity assessment. It successfully harnesses multimodal data and advanced AI techniques to improve assessment precision. While the RMT model sets a new precedent in AI applications in medical diagnostics, the development of a comprehensive pediatric pneumonia dataset marks a pivotal contribution, providing a robust foundation for future research.

Keywords: Clinical data; chest X-ray image; deep learning; multimodal transformers; pediatric pneumonia.

Conflict of interest statement

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

**Figure 1.**
Data selection process for pediatric pneumonia cases. This flowchart illustrates the reduction of an initial set of 37,579 records to 5101 high-quality cases, balanced between 2537 mild and 2564 severe cases, after excluding non-pneumonia respiratory cases and those with unclear severity annotations.

**Figure 2.**
Composite representation of image processing stages for a chest X-ray. Panel (a) displays the original grayscale image. Panel (b) shows the image after histogram equalization, where the contrast adjustment brings out finer details for diagnostic purposes. Panel (c) demonstrates a series of data augmentation techniques applied to (b).
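The histogram equalization step in panel (b) can be sketched with NumPy as follows. This is the textbook global variant for 8-bit grayscale images. The caption does not say which variant or library the authors used, so the function name and the synthetic test image are assumptions.

```python
import numpy as np

def equalize_histogram(image):
    """Global histogram equalization for a 2-D uint8 grayscale image."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]   # CDF value at the darkest occupied gray level
    total = image.size
    if total == cdf_min:        # constant image: nothing to stretch
        return image.copy()
    # Remap gray levels so the cumulative distribution becomes roughly linear
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]

# Synthetic low-contrast image with gray levels confined to 100..139
rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 140, size=(64, 64), dtype=np.uint8)
enhanced = equalize_histogram(low_contrast)
```

After equalization, the occupied gray levels are spread across the full 0 to 255 range, which is what makes faint structures easier to see.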

**Figure 3.**
Age and gender distribution of pediatric patients with mild and severe pneumonia. (a) The age and gender distribution for mild pneumonia cases; (b) the age and gender distribution for severe pneumonia cases.

**Figure 4.**
Overview of the robust multimodal transformer (RMT) model architecture.

**Figure 5.**
Visualization of the masked attention technique in the RMT model: (a) Masked attention matrix for image-only task. The matrix displays selective activation where the image CLS token (“$I$”) receives attention from image vectors (“$i_{1}$,” “$i_{2}$,” and “$i_{3}$”) while attention from text vectors (“$t_{1}$,” “$t_{2}$,” and “$t_{3}$”) and multimodal vector “$I - T$” is masked, showcasing the model’s focused analysis on visual data; (b) masked attention matrix for text-only task. The matrix illustrates the attention focus on the text CLS token (“$T$”) with attention from image vectors (“$i_{1}$,” “$i_{2}$,” and “$i_{3}$”) and multimodal vector “$I - T$” being masked, highlighting the model’s capacity for concentrated textual information processing. RMT: robust multimodal transformer; CLS: classification token.
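The masking described in this caption can be illustrated with a small NumPy sketch of scaled dot-product attention in which hidden key tokens receive zero weight. The token order, the embedding size, and the choice to apply the same mask to every query row are simplifying assumptions for illustration. The caption itself only specifies what the CLS tokens attend to.

```python
import numpy as np

# Token layout assumed from the caption: image CLS, image vectors,
# text CLS, text vectors, and the multimodal vector
TOKENS = ["I", "i1", "i2", "i3", "T", "t1", "t2", "t3", "I-T"]
IMAGE = {"I", "i1", "i2", "i3"}
TEXT = {"T", "t1", "t2", "t3"}

def task_mask(task):
    """Boolean vector over key tokens: True means visible for this task."""
    if task == "image":
        visible = IMAGE
    elif task == "text":
        visible = TEXT
    else:                       # multimodal task: everything is visible
        visible = set(TOKENS)
    return np.array([tok in visible for tok in TOKENS])

def masked_attention(q, k, v, key_visible):
    """Scaled dot-product attention that ignores masked key tokens."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(key_visible[None, :], scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)                       # exp(-inf) == 0 for masked keys
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Random embeddings stand in for real image and text features
rng = np.random.default_rng(0)
x = rng.normal(size=(len(TOKENS), 8))
_, w_image = masked_attention(x, x, x, task_mask("image"))
_, w_text = masked_attention(x, x, x, task_mask("text"))
```

In `w_image`, the row for the image CLS token has non-zero weights only on the four image positions. In `w_text`, the row for the text CLS token has non-zero weights only on the four text positions.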

**Figure 6.**
Composite overview of the robust multimodal transformer (RMT) model’s performance. (a) Model accuracy over training epochs; (b) model loss over training epochs.

**Figure 7.**
Comparative analysis of the RMT and MMBT models in different modality scenarios: (a) performance under varying rates of text data availability; (b) performance under varying rates of image data availability. RMT: robust multimodal transformer; MMBT: supervised multimodal bi-transformer.

**Figure 8.**
Original X-ray alongside the gradient-weighted class activation mapping (Grad-CAM) heatmap. The red and yellow regions indicate where the model focuses its attention, providing an interpretable map of the regions that drive the diagnosis.

**Figure 9.**
Attention maps at three key stages of the model’s information aggregation process. From left to right: stage 1 represents self-attention focused on individual tokens, stage 2 shows a broadening of attention across neighboring tokens, and stage 3 highlights the final aggregation of global information where key tokens such as the textual modality are prioritized.


