Nat Biomed Eng. 2022 Dec;6(12):1399-1406.
doi: 10.1038/s41551-022-00936-9. Epub 2022 Sep 15.

Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning


Ekin Tiu et al. Nat Biomed Eng. 2022 Dec.

Abstract

In tasks involving the interpretation of medical images, suitably trained machine-learning models often exceed the performance of medical experts. Yet such a high level of performance typically requires that the models be trained with relevant datasets that have been painstakingly annotated by experts. Here we show that a self-supervised model trained on chest X-ray images that lack explicit annotations performs pathology-classification tasks with accuracies comparable to those of radiologists. On an external validation dataset of chest X-rays, the self-supervised model outperformed a fully supervised model in the detection of three pathologies (out of eight), and the performance generalized to pathologies that were not explicitly annotated for model training, to multiple image-interpretation tasks and to datasets from multiple institutions.
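The abstract does not spell out the training objective; the following is a minimal sketch, assuming a CLIP-style contrastive image–report objective consistent with the "natural source of supervision" described in Fig. 1a. The encoder interfaces, embedding dimension and temperature are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of contrastive image-report pretraining (assumption: CLIP-style
# symmetric InfoNCE objective; dimensions and temperature are illustrative).
import torch
import torch.nn.functional as F

def contrastive_loss(image_embeddings: torch.Tensor,
                     report_embeddings: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss over a batch of paired chest X-rays and reports."""
    img = F.normalize(image_embeddings, dim=-1)   # (B, D)
    txt = F.normalize(report_embeddings, dim=-1)  # (B, D)
    logits = img @ txt.t() / temperature          # (B, B) pairwise similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Each image should match its own report, and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

The symmetric cross-entropy pulls each image embedding towards the embedding of its own report and away from the other reports in the batch, so no pathology labels are required.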

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. The self-supervised model classifies pathologies without training on any labelled samples.
a, Training pipeline. The model learns features from raw radiology reports, which act as a natural source of supervision. b, Prediction of pathologies in a chest X-ray image. For each pathology, we generated a positive and a negative prompt (such as ‘consolidation’ versus ‘no consolidation’). By comparing the model outputs for the positive and negative prompts, the self-supervised method computes a probability score for the pathology, which can then be used to classify its presence in the chest X-ray image.
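As a rough illustration of the prompt-contrast step in b, the sketch below scores one pathology by comparing a positive and a negative prompt; the prompt wording, encoder callable and variable names are assumptions for illustration, not the released interface.

```python
# Sketch of the zero-shot prediction step described in Fig. 1b (prompt wording and
# encoder interface are illustrative assumptions).
import torch
import torch.nn.functional as F

@torch.no_grad()
def pathology_probability(image_embedding: torch.Tensor,
                          encode_text,                 # callable: str -> (D,) embedding
                          pathology: str = "consolidation") -> float:
    """Compare a positive and a negative prompt to score one pathology."""
    pos = F.normalize(encode_text(pathology), dim=-1)          # e.g. 'consolidation'
    neg = F.normalize(encode_text(f"no {pathology}"), dim=-1)   # e.g. 'no consolidation'
    img = F.normalize(image_embedding, dim=-1)
    # Softmax over the two prompt similarities yields a probability for "present".
    sims = torch.stack([img @ pos, img @ neg])
    return F.softmax(sims, dim=0)[0].item()
```

A threshold on the returned probability then yields the binary presence call described in the caption.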
Fig. 2
Fig. 2. Comparisons of MCC and F1 scores and of ROC curves, for the self-supervised model and board-certified radiologists.
a, F1 scores of the self-supervised model compared with those of three board-certified radiologists on the CheXpert test dataset for the five CheXpert competition conditions. The model’s F1 score is significantly higher than that of the radiologists on pleural effusion, significantly lower on atelectasis, and not statistically significantly different on cardiomegaly, consolidation and oedema. b, Comparison of the MCC of the self-supervised model against the three board-certified radiologists on the CheXpert test dataset. The MCC of the model is not significantly different from that of the radiologists on any of the five pathologies. a,b, Green plots indicate the performance of the three board-certified radiologists, while blue plots indicate the performance of the self-supervised model. c, Comparison of the ROC curves of the self-supervised model with the radiologists’ operating points against the test-set ground truth. The model outperforms the radiologists when the ROC curve lies above the radiologists’ operating points. The dotted lines on the ROC curves represent the baseline performance of a classifier that is no better than random guessing.
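For reference, F1 and MCC for a single pathology can be computed as in the hedged sketch below; the 0.5 decision threshold and variable names are assumptions, and the paper's significance testing is not reproduced here.

```python
# Sketch of the per-pathology F1 and MCC computation underlying Fig. 2a,b
# (threshold and variable names are illustrative assumptions).
import numpy as np
from sklearn.metrics import f1_score, matthews_corrcoef

def f1_and_mcc(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5):
    """Binary F1 and MCC for one pathology, given ground truth and model probabilities."""
    y_pred = (y_prob >= threshold).astype(int)
    return f1_score(y_true, y_pred), matthews_corrcoef(y_true, y_pred)
```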
Fig. 3
Fig. 3. Performance on unseen radiographic findings in the PadChest dataset.
Mean AUC and 95% CI are shown for each radiographic finding (n > 50) labelled as being of high importance by an expert radiologist. We externally validated the model’s ability to generalize to different data distributions by evaluating its performance on the human-annotated subset of the PadChest dataset (n = 39,053 chest X-rays). No labelled samples were seen during training for any of the radiographic findings in this dataset. The self-supervised method achieves an AUC of at least 0.900 on 6 findings and at least 0.700 on 38 findings, out of the 57 radiographic findings with n > 50 in the PadChest test dataset.
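A per-finding AUC with a percentile-bootstrap 95% CI, as reported here, could be computed along the lines of the following sketch; the number of bootstrap resamples and the data layout are assumptions.

```python
# Sketch of a per-finding AUC with a percentile-bootstrap 95% CI
# (resample count and data layout are illustrative assumptions).
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_ci(y_true: np.ndarray, y_score: np.ndarray,
                n_boot: int = 1000, seed: int = 0):
    """Point-estimate AUC plus a percentile bootstrap 95% confidence interval."""
    rng = np.random.default_rng(seed)
    auc = roc_auc_score(y_true, y_score)
    n = len(y_true)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if y_true[idx].min() == y_true[idx].max():
            continue  # skip resamples that contain only one class
        boots.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return auc, (lo, hi)
```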
