Modeling long-range dependencies for weakly supervised disease classification and localization on chest X-ray

Fangyun Li et al. Quant Imaging Med Surg. 2022 Jun;12(6):3364-3378. doi: 10.21037/qims-21-1117.

Abstract

Background: Computer-aided diagnosis based on chest X-ray (CXR) is an exponentially growing field of research owing to the development of deep learning, especially convolutional neural networks (CNNs). However, due to the intrinsic locality of convolution operations, CNNs cannot model long-range dependencies. Although vision transformers (ViTs) have recently been proposed to alleviate this limitation, ViTs trained on patches cannot learn any dependencies for inter-patch pixels and are thus insufficient for medical image detection. To address this problem, in this paper, we propose a CXR detection method that integrates a CNN with a ViT to model patch-wise and inter-patch dependencies.

Methods: We experimented on the ChestX-ray14 dataset and followed the official training-test set split. Because the training data only had global annotations, the detection network was weakly supervised. A DenseNet with a feature pyramid structure was designed and integrated with an adaptive ViT to model inter-patch and patch-wise long-range dependencies and obtain fine-grained feature maps. We compared the performance of our method with that of other disease detection methods.

Results: For disease classification, our method achieved the best result among all the disease detection methods, with a mean area under the curve (AUC) of 0.829. For lesion localization, our method achieved significantly higher intersection over union (IoU) scores on the test images with bounding box annotations than did the other detection methods. The visualized results showed that our predictions were more accurate and detailed. Furthermore, evaluation of our method on an external validation dataset demonstrated its generalization ability.

Conclusions: Our proposed method achieves the new state of the art for thoracic disease classification and weakly supervised localization. It has potential to assist in clinical decision-making.

Keywords: Long-range dependencies; chest X-rays (CXRs); disease classification; localization; vision transformer (ViT).

Conflict of interest statement

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-21-1117/coif). FL has a pending patent (No. 202110640995.8). The other authors have no conflicts of interest to declare.

Figures

Figure 1
Visual localization results of a CXR from a case diagnosed with “effusion”. The bounding boxes in green represent the published ground truth, and the red represents the predicted results of our proposed method. (A) The lesion localized by DenseNet121 (a CNN-based method). (B) The lesion localized by our proposed method. CXR, chest X-ray; CNN, convolutional neural network.
Figure 2
An overview of our proposed method. Our model comprises two branches and a fusion module. The classification score was acquired from the LSE layer, and the localization was obtained from the saliency maps before the LSE layer. ViT, vision transformer; CNN, convolutional neural network; LSE, Log-Sum-Exp.
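The Log-Sum-Exp (LSE) pooling named in this caption aggregates the per-location scores of a saliency map into a single class score, interpolating between mean pooling and max pooling via a sharpness hyperparameter. As a minimal sketch (the value r=10.0 is an assumption, not taken from the paper), it can be written as:

```python
import math

def lse_pool(scores, r=10.0):
    """Log-Sum-Exp pooling over a flattened saliency map.

    Computes s = (1/r) * log(mean(exp(r * x))), which approaches
    max pooling as r grows and mean pooling as r -> 0.
    """
    n = len(scores)
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    return m + math.log(sum(math.exp(r * (x - m)) for x in scores) / n) / r
```

With a large r, a single strongly activated location dominates the pooled score, which is why LSE pooling is a common choice for weakly supervised localization: the classification loss still rewards sharp, localized responses.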
Figure 3
The architecture of the pyramid DenseNet. The network receives the input image and passes it through the main structure, which consists of four dense blocks interspersed with three transition blocks. The low-resolution feature maps generated by the last dense block are upsampled and fused with the outputs of the third dense block to obtain fine-resolution feature maps.
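The upsample-and-fuse step described above can be sketched on toy 2D feature maps. The caption does not specify the upsampling operator or the fusion rule, so this sketch assumes nearest-neighbour 2x upsampling and element-wise addition, as in a typical feature-pyramid lateral connection:

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out

def fuse(low_res, skip):
    """Upsample the deepest feature map and add the skip connection
    from an earlier block (element-wise sum assumed)."""
    up = upsample2x(low_res)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(up, skip)]
```

The fused output has the spatial resolution of the earlier block while carrying the semantics of the deeper one, which is what yields the "fine resolution feature maps" used for localization.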
Figure 4
The architecture of the adaptive ViT. Norm, normalization; MLP, multilayer perceptron; ViT, vision transformer.
Figure 5
Several visualized localization results of Wang et al. (8), Zhou et al. (32), and our proposed method. The bounding boxes in green represent the published ground truth, and the red represents the predicted results using different methods.
Figure 6
Examples of visual localization outcomes for eight diseases in the ChestX-ray14 dataset. The images on the left are the input CXRs and are labeled with two bounding boxes: the green shows the published ground truth, and the red is the predicted result obtained by a simple threshold method on the class-specific saliency maps, which are shown on the right. The IoU results calculated from the ground truth and predicted bounding boxes are shown on the left. CXR, chest X-ray; IoU, intersection over union.
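The "simple threshold method" and the IoU metric in this caption can be sketched as follows. The threshold value 0.5 and the (x1, y1, x2, y2) box convention are illustrative assumptions; the paper does not state its exact settings here:

```python
def bbox_from_saliency(saliency, thresh=0.5):
    """Bounding box (x1, y1, x2, y2) around pixels above a threshold,
    or None if no pixel exceeds it."""
    pts = [(x, y) for y, row in enumerate(saliency)
                  for x, v in enumerate(row) if v >= thresh]
    if not pts:
        return None
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

An identical pair of boxes yields IoU 1.0, disjoint boxes yield 0.0, and the scores reported in the figure fall in between.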
Figure 7
Several qualitative results of different feature encoder backbones in the ChestX-ray14 dataset: DenseNet121, ViT, and our network (DenseNet121+ViT). The bounding boxes in green represent the published ground truth, and the red represents the predicted results. ViT, vision transformer; DenseNet121+ViT, our proposed method which includes two network branches, DenseNet121 and ViT.

References

    1. Körner M, Weber CH, Wirth S, Pfeifer KJ, Reiser MF, Treitl M. Advances in digital radiography: physical principles and system overview. Radiographics 2007;27:675-86. doi: 10.1148/rg.273065075.
    2. Hui TCH, Khoo HW, Young BE, Haja Mohideen SM, Lee YS, Lim CJ, Leo YS, Kaw GJL, Lye DC, Tan CH. Clinical utility of chest radiography for severe COVID-19. Quant Imaging Med Surg 2020;10:1540-50. doi: 10.21037/qims-20-642.
    3. Yan C, Yao J, Li R, Xu Z, Huang J. Weakly supervised deep learning for thoracic disease classification and localization on chest X-rays. Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics; Washington, DC, USA: Association for Computing Machinery; 2018:103-4.
    4. Kim HG, Lee KM, Kim EJ, Lee JS. Improvement diagnostic accuracy of sinusitis recognition in paranasal sinus X-ray using multiple deep learning models. Quant Imaging Med Surg 2019;9:942-51. doi: 10.21037/qims.2019.05.15.
    5. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, van der Laak JAWM, van Ginneken B, Sánchez CI. A survey on deep learning in medical image analysis. Med Image Anal 2017;42:60-88. doi: 10.1016/j.media.2017.07.005.