Sci Data. 2020 Aug 28;7(1):283.
doi: 10.1038/s41597-020-00622-y.

HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy

Hanna Borgli et al. Sci Data. 2020.

Abstract

Artificial intelligence is currently a hot topic in medicine. However, medical data are often sparse and hard to obtain, both because of legal restrictions and because few medical personnel are available for the cumbersome and tedious process of manually labeling training data. These constraints make it difficult to develop systems for automatic analysis, such as detecting disease or other lesions. To address this, this article presents HyperKvasir, the largest image and video dataset of the gastrointestinal tract available today. The data were collected during real gastroscopy and colonoscopy examinations at Bærum Hospital in Norway and partly labeled by experienced gastrointestinal endoscopists. The dataset contains 110,079 images and 374 videos, and represents anatomical landmarks as well as pathological and normal findings. Together, the images and video frames number around 1 million. Initial experiments demonstrate the potential benefits of artificial intelligence-based computer-assisted diagnosis systems. The HyperKvasir dataset can play a valuable role in developing better algorithms and computer-assisted examination systems, not only for gastroscopy and colonoscopy but also for other fields in medicine.

Conflict of interest statement

Authors P.H.S., D.J., C.G., M.A.R., P.H. and T.d.L. all own shares in the Augere Medical AS company developing AI solutions for colonoscopies. The Augere video annotation system was used to label the videos. There is no commercial interest from Augere regarding this publication and dataset. Otherwise, the authors declare no competing interests.

Figures

Fig. 1. Image examples of the various labeled classes for images and/or videos.
Fig. 2. Resolution of the 110,079 images in HyperKvasir.
Fig. 3. Statistics of the 374 videos in HyperKvasir.
Fig. 4. The number of images in the various HyperKvasir labeled image classes according to the file folders.
Fig. 5. The various image classes structured under position and type, which is also the structure of the stored images.
Fig. 6. The number of videos in the various HyperKvasir labeled video classes according to the file folders.
Fig. 7. The various video classes structured under position and type, which is also the structure of the video folders.
Fig. 8. Confusion matrices for Averaged ResNet-152 + DenseNet-161 and Pre-Trained DenseNet-161 including both splits. These confusion matrices were selected based on their performance. Averaged ResNet-152 + DenseNet-161 achieved the best micro-averaged results while the Pre-Trained DenseNet-161 achieved the best macro-averaged result. The color codes represent the percentages of the total number of images within each class. The labeling of the classes is as follows: (A) Barrett’s; (B) bbps-0-1; (C) bbps-2-3; (D) dyed lifted polyps; (E) dyed resection margins; (F) hemorrhoids; (G) ileum; (H) impacted stool; (I) normal cecum; (J) normal pylorus; (K) normal Z-line; (L) oesophagitis-a; (M) oesophagitis-b-d; (N) polyp; (O) retroflex rectum; (P) retroflex stomach; (Q) short segment Barrett’s; (R) ulcerative colitis grade 0-1; (S) ulcerative colitis grade 1-2; (T) ulcerative colitis grade 2-3; (U) ulcerative colitis grade 1; (V) ulcerative colitis grade 2; (W) ulcerative colitis grade 3.
Fig. 9. Unlabeled image data predictions for Averaged ResNet-152 + DenseNet-161 and Pre-Trained DenseNet-161.
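
The caption of Fig. 8 distinguishes micro-averaged from macro-averaged results. As a rough illustration of why the two can rank models differently on an imbalanced dataset, the sketch below computes both from a confusion matrix. It assumes the F1 score as the metric, which this record does not specify, and the two-class matrix `TOY_CM` is invented for illustration and is not taken from HyperKvasir.

```python
from typing import Dict, List

Matrix = List[List[int]]


def per_class_counts(cm: Matrix) -> List[Dict[str, int]]:
    """Derive TP/FP/FN per class from a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    counts = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # rest of row k
        fp = sum(cm[i][k] for i in range(n)) - tp   # rest of column k
        counts.append({"tp": tp, "fp": fp, "fn": fn})
    return counts


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from raw counts; 0.0 when the class never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(cm: Matrix) -> float:
    """Pool counts over all classes first, then compute one F1."""
    counts = per_class_counts(cm)
    return f1(
        sum(c["tp"] for c in counts),
        sum(c["fp"] for c in counts),
        sum(c["fn"] for c in counts),
    )


def macro_f1(cm: Matrix) -> float:
    """Compute F1 per class, then take the unweighted mean."""
    counts = per_class_counts(cm)
    return sum(f1(**c) for c in counts) / len(counts)


# Hypothetical toy matrix: a large class (100 samples) and a small one (10).
TOY_CM: Matrix = [
    [90, 10],
    [5, 5],
]

if __name__ == "__main__":
    print(f"micro F1: {micro_f1(TOY_CM):.3f}")
    print(f"macro F1: {macro_f1(TOY_CM):.3f}")
```

Micro-averaging is dominated by the large classes, whereas macro-averaging weights every class equally, so a model that handles rare classes well can score higher on the macro average while scoring lower on the micro average.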

