BMJ Open. 2021 Jan 21;11(1):e041139.
doi: 10.1136/bmjopen-2020-041139.

Investigating the use of a two-stage attention-aware convolutional neural network for the automated diagnosis of otitis media from tympanic membrane images: a prediction model development and validation study

Yuexin Cai et al. BMJ Open. 2021.

Abstract

Objectives: This study investigated the usefulness and performance of a two-stage attention-aware convolutional neural network (CNN) for the automated diagnosis of otitis media from tympanic membrane (TM) images.

Design: A classification model development and validation study in ears with otitis media based on otoscopic TM images. Two commonly used CNNs were trained and evaluated on the dataset. On the basis of a Class Activation Map (CAM), a two-stage classification pipeline was developed to improve accuracy and reliability and to simulate an expert reading the TM images.
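To make the CAM idea concrete, here is a minimal sketch that assumes the standard CAM formulation: the heat map for a class is the sum of the last convolutional layer's feature maps, weighted by that class's weights in the layer following global average pooling. The abstract does not say which CAM variant the authors used, so this is an illustration only. The function name `class_activation_map`, the array shapes and the random toy data are all hypothetical.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Weighted sum of feature maps for one class, scaled to [0, 1].

    features:   (K, H, W) activations of the last convolutional layer
    fc_weights: (num_classes, K) weights of the layer after global average pooling
    """
    # Contract the channel axis: sum_k w[class_idx, k] * features[k]
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy data (hypothetical shapes): 8 channels, a 7x7 feature map, 4 classes
rng = np.random.default_rng(0)
features = rng.random((8, 7, 7))
fc_weights = rng.standard_normal((4, 8))
cam = class_activation_map(features, fc_weights, class_idx=1)
print(cam.shape)
```

In practice the low-resolution map would be upsampled to the input image size before being used to locate discriminative regions.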

Setting and participants: This is a retrospective study using otoendoscopic images obtained from the Department of Otorhinolaryngology in China. A dataset was generated with 6066 otoscopic images from 2022 participants comprising four kinds of TM images, that is, normal eardrum, otitis media with effusion (OME) and two stages of chronic suppurative otitis media (CSOM).

Results: The proposed method achieved an overall accuracy of 93.4% using ResNet50 as the backbone network in a threefold cross-validation. The F1 score was 94.3% for normal images and 96.8% for OME. There was a small difference between the active and inactive status of CSOM, achieving 91.7% and 82.4% F1 scores, respectively. The results demonstrate a classification performance equivalent to the diagnosis level of an associate professor in otolaryngology.
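For readers unfamiliar with the metric, the F1 score of a class is the harmonic mean of its precision and recall. The sketch below computes it per class from a confusion matrix. The counts are hypothetical and chosen for easy arithmetic; they are not the study's data. The helper name `per_class_f1` is also an assumption.

```python
def per_class_f1(confusion):
    """Per-class F1 from a square confusion matrix.

    confusion[i][j] = number of images predicted as class i whose true class is j.
    """
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        predicted = sum(confusion[c])                    # row total
        actual = sum(confusion[r][c] for r in range(n))  # column total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores

# Hypothetical counts for illustration only (not the study's data)
toy = [
    [90, 5, 0, 5],
    [5, 90, 5, 0],
    [0, 5, 80, 15],
    [5, 0, 15, 80],
]
scores = per_class_f1(toy)
print([round(s, 3) for s in scores])
```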

Conclusions: CNNs provide a useful and effective tool for the automated classification of TM images. In addition, a weakly supervised method such as CAM can help the network focus on discriminative parts of the image and improve performance with a relatively small database. This two-stage method can help junior otolaryngologists and physicians in other disciplines diagnose otitis media more accurately.

Keywords: adult otolaryngology; endoscopic surgery; otolaryngology; paediatric otolaryngology.

Conflict of interest statement

Competing interests: None declared.

Figures

Figure 1
Classification tree for the four diagnostic classes. A total of 6066 images are included and categorised into four conditions: 1040 images of normal tympanic membrane, 2613 images of otitis media with effusion, 1662 images of chronic suppurative otitis media in active phase and 751 images of chronic suppurative otitis media in inactive phase.
Figure 2
Every patch in the same image. The sliding-window strategy for patch selection scans a window across the whole image with a fixed step size (eg, 16 pixels in both the row and column directions). At each location, a patch is cropped from the image and assigned a score according to the binarised heat map: the score is the average intensity of the heat-map pixels within the corresponding window. Several different window sizes are predefined. The image patches cropped at various sizes and locations (eg, the red and green boxes) are then ordered according to their scores, and the top-scoring ones are selected (eg, the green box). CAM, Class Activation Map.
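The patch-selection procedure in this caption can be sketched as follows. The 16-pixel step comes from the caption. The window sizes, the `top_k` value, the function name `select_patches` and the toy heat map are assumptions made for illustration, and the input is taken to be already binarised.

```python
import numpy as np

def select_patches(heat_map, window_sizes=(64, 96), step=16, top_k=1):
    """Rank square windows by mean binarised-heat-map intensity.

    Returns the top_k windows as (score, row, col, size) tuples.
    """
    height, width = heat_map.shape
    candidates = []
    for size in window_sizes:
        for row in range(0, height - size + 1, step):
            for col in range(0, width - size + 1, step):
                window = heat_map[row:row + size, col:col + size]
                candidates.append((float(window.mean()), row, col, size))
    # Highest score first; the sort is stable, so ties keep scan order
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:top_k]

# Toy binarised heat map: a 64x64 hot region inside a 224x224 image
heat = np.zeros((224, 224))
heat[80:144, 96:160] = 1.0
best = select_patches(heat)[0]
print(best)  # (1.0, 80, 96, 64)
```

The selected `(row, col, size)` box would then be used to crop the original image, giving the patch that is passed to the second-stage classifier.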
Figure 3
The complete classification pipeline of the method. The main classifier provides a global result and the focal model works on patches containing discriminative, local features. The predictions of the two models are merged by averaging their classification output scores. CAMs, Class Activation Maps.
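The merging step can be illustrated with a short sketch, assuming each model outputs one score per class and that the two models are weighted equally, as the word "averaging" suggests. The scores below are hypothetical, and how scores from several patches would be aggregated is not stated in the caption, so a single focal score vector is assumed.

```python
def merge_predictions(main_scores, focal_scores):
    """Average two per-class score vectors; return (merged, predicted_index)."""
    if len(main_scores) != len(focal_scores):
        raise ValueError("score vectors must have the same length")
    merged = [(m + f) / 2 for m, f in zip(main_scores, focal_scores)]
    predicted = max(range(len(merged)), key=merged.__getitem__)
    return merged, predicted

CLASSES = ["normal", "OME", "CSOMa", "CSOMi"]

# Hypothetical scores for one image: the main model favours CSOMa,
# the focal model favours CSOMi
main = [0.10, 0.20, 0.40, 0.30]
focal = [0.05, 0.05, 0.30, 0.60]
merged, idx = merge_predictions(main, focal)
print(CLASSES[idx])  # CSOMi
```

In this toy case the focal model's stronger evidence changes the final label, which is the kind of correction the second stage is meant to provide.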
Figure 4
Confusion matrices for each stage of our method with ResNet50 and for three human experts. Rows indicate the prediction and columns the ground truth. Results on the test sets of the three folds are combined to report performance on the whole dataset. (A, B) show the results of the two major classifiers of our method, while (C) reports the result of combining them by averaging. In addition, the overall accuracies of the three experts are 79.07%, 86.57% and 91.02% (D, E, F). CSOMa, chronic suppurative otitis media in active phase; CSOMi, chronic suppurative otitis media in inactive phase; OME, otitis media with effusion.
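Combining the fold-level results amounts to summing the per-fold confusion matrices element by element and reading overall accuracy off the diagonal. The sketch below uses small hypothetical 2x2 matrices (rows are predictions, columns are ground truth) rather than the study's counts. The helper names `pool_folds` and `overall_accuracy` are assumptions.

```python
def pool_folds(fold_matrices):
    """Element-wise sum of equally sized confusion matrices."""
    size = len(fold_matrices[0])
    pooled = [[0] * size for _ in range(size)]
    for matrix in fold_matrices:
        for i in range(size):
            for j in range(size):
                pooled[i][j] += matrix[i][j]
    return pooled

def overall_accuracy(confusion):
    """Fraction of samples on the diagonal (prediction equals ground truth)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Hypothetical test-set matrices for three folds
folds = [
    [[8, 2], [1, 9]],
    [[7, 1], [3, 9]],
    [[9, 0], [2, 9]],
]
pooled = pool_folds(folds)
print(pooled, overall_accuracy(pooled))  # [[24, 3], [6, 27]] 0.85
```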
Figure 5
Receiver operating characteristic curves for classification of two challenging situations, comparing the method with ResNet50 against human experts. The red curve is the average performance over the three folds and the other curves show the result for each fold. Our method achieves a performance similar to that of the associate chief doctor. AUC, area under the curve; CSOMa, chronic suppurative otitis media in active phase; CSOMi, chronic suppurative otitis media in inactive phase; OME, otitis media with effusion.
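As a reminder of how such a curve and its AUC are obtained for one binary task, the sketch below sweeps the decision threshold over the classifier scores and integrates the resulting curve with the trapezoidal rule. The labels and scores are hypothetical, and the caption does not describe how the fold curves were averaged, so only a single curve is shown.

```python
def roc_points(labels, scores):
    """ROC curve as (false positive rate, true positive rate) pairs.

    labels: 1 for the positive class, 0 for the negative class
    scores: classifier score for the positive class
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold through each distinct score, highest first
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical labels and scores for six images
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
area = auc(points)
print(round(area, 4))  # 0.8889
```

A human expert gives a single label per image rather than a score, so an expert appears on such a plot as one operating point instead of a curve.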
Figure 6
Typical samples of each situation, including normal, OME, CSOMa and CSOMi. From left to right, the columns show the original image, the CAM for normal, the CAM for OME, the CAM for CSOMa, the CAM for CSOMi, the average of the CAMs, the selected box and the patch for the focal classifier. CAM, Class Activation Map; CSOMa, chronic suppurative otitis media in active phase; CSOMi, chronic suppurative otitis media in inactive phase; OME, otitis media with effusion.
