BMC Med Inform Decis Mak. 2023 Feb 7;23(1):28.

doi: 10.1186/s12911-023-02121-7.

Deep learning approach to detection of colonoscopic information from unstructured reports

Donghyeong Seong¹, Yoon Ho Choi², Soo-Yong Shin²,³, Byoung-Kee Yi⁴

Affiliations

¹ Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Seoul, 06355, Republic of Korea.
² Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06355, Republic of Korea.
³ Research Institute for Future Medicine, Samsung Medical Center, Seoul, 06351, Republic of Korea.
⁴ Department of Artificial Intelligence Convergence, Kangwon National University, 1 Kangwondaehak-Gil, Chuncheon-si, Gangwon-do, 24341, Republic of Korea. byoungkeeyi@gmail.com.

PMID: 36750932
PMCID: PMC9903463
DOI: 10.1186/s12911-023-02121-7

Donghyeong Seong et al. BMC Med Inform Decis Mak. 2023.

Abstract

Background: Colorectal cancer is a leading cause of cancer death. Several screening tests, such as colonoscopy, can be used to find polyps or colorectal cancer. Colonoscopy reports are often written as unstructured narrative text. The information embedded in these reports can be used for various purposes, including colorectal cancer risk prediction, follow-up recommendations, and quality measurement. However, despite the large volume of accumulated reports, this unstructured text remains difficult to access and use. We aimed to develop and apply deep learning-based natural language processing (NLP) methods to detect colonoscopic information.

Methods: This study applied several deep learning-based NLP models to colonoscopy reports. A total of 280,668 colonoscopy reports were extracted from the clinical data warehouse of Samsung Medical Center. For 5,000 of these reports, procedural information and colonoscopic findings were manually annotated with 17 labels. We compared long short-term memory (LSTM)-based models and BioBERT to select the best-performing model for colonoscopy reports, which was the bidirectional LSTM with conditional random fields (BiLSTM-CRF). We then applied word embeddings pre-trained on the large unlabeled dataset (all 280,668 reports) to the selected model.
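In a BiLSTM-CRF tagger, the BiLSTM produces a score for every candidate label at every token, and the CRF layer selects the single highest-scoring label sequence using learned label-to-label transition scores. The sketch below illustrates only that CRF decoding step (Viterbi search) in plain Python; it is not the authors' implementation. The BIO tagging scheme, the label names (`O`, `B-SIZE`, `I-SIZE`, `B-LESION`), and all scores are hypothetical, and the start/end transition scores that many CRF implementations include are omitted for brevity.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (in a BiLSTM-CRF these would
        come from the BiLSTM output layer); assumes at least one token.
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(tags)
    # best[j] = best score of any tag path that ends in tag j at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Pick the previous tag i that maximises path score + transition i -> j
            scores = [best[i] + transitions[i][j] for i in range(n_tags)]
            i_best = max(range(n_tags), key=lambda i: scores[i])
            new_best.append(scores[i_best] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers to the first token
    path = [max(range(n_tags), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[j] for j in path]


# Hypothetical example: tokens "5", "mm", "polyp"
tags = ["O", "B-SIZE", "I-SIZE", "B-LESION"]
emissions = [
    [0.1, 2.0, 0.5, 0.0],  # "5"
    [0.2, 0.3, 1.5, 0.0],  # "mm"
    [0.1, 0.0, 0.2, 2.5],  # "polyp"
]
# Transitions are 0 except that entering I-SIZE from O or B-LESION is heavily penalised
NEG = -100.0
transitions = [
    [0.0, 0.0, NEG, 0.0],  # from O
    [0.0, 0.0, 0.0, 0.0],  # from B-SIZE
    [0.0, 0.0, 0.0, 0.0],  # from I-SIZE
    [0.0, 0.0, NEG, 0.0],  # from B-LESION
]

predicted = viterbi_decode(emissions, transitions, tags)
print(predicted)  # ['B-SIZE', 'I-SIZE', 'B-LESION']
```

With these scores the decoder returns a size span followed by a lesion tag. The large negative transition scores keep it from emitting `I-SIZE` directly after `O` or `B-LESION`, which is the kind of label-consistency constraint a CRF layer adds on top of independent per-token scores.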

Results: The NLP model with pre-trained word embeddings outperformed the model with one-hot encoding for most labels. The F1 scores for colonoscopic findings were 0.9564 for lesions, 0.9722 for locations, 0.9809 for shapes, 0.9720 for colors, 0.9862 for sizes, and 0.9717 for numbers.
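The per-label F1 score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch below computes it per label from sets of annotated spans, assuming a strict exact-match criterion (same report, same token boundaries, same label); the study's actual matching criterion is not stated in this abstract, and the span format, spans, and label names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical span format: (report_id, start_token, end_token, label)
Span = Tuple[int, int, int, str]


def per_label_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> Dict[str, float]:
    """Per-label F1 under exact span matching (an assumption, see text)."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    # A true positive is a predicted span identical to a gold span
    true_pos = Counter(span[3] for span in gold_set & pred_set)
    n_gold = Counter(span[3] for span in gold_set)
    n_pred = Counter(span[3] for span in pred_set)

    scores: Dict[str, float] = {}
    for label in n_gold.keys() | n_pred.keys():
        precision = true_pos[label] / n_pred[label] if n_pred[label] else 0.0
        recall = true_pos[label] / n_gold[label] if n_gold[label] else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


gold = [(0, 0, 2, "SIZE"), (0, 2, 3, "LESION"), (0, 4, 5, "LOCATION"), (1, 0, 1, "LESION")]
predicted = [(0, 0, 2, "SIZE"), (0, 2, 3, "LESION"), (0, 3, 5, "LOCATION"),
             (1, 0, 1, "LESION"), (1, 2, 3, "LESION")]

scores = per_label_f1(gold, predicted)
print({label: round(scores[label], 4) for label in sorted(scores)})
# {'LESION': 0.8, 'LOCATION': 0.0, 'SIZE': 1.0}
```

Under exact matching, a boundary error such as the `LOCATION` span above counts as both a false positive and a false negative; a lenient, overlap-based criterion would score it differently.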

Conclusions: This study applied deep learning-based clinical NLP models to extract meaningful information from colonoscopy reports. The method achieved promising results, indicating that it can be applied to various practical purposes.

Keywords: Colonoscopy; Data processing; Deep learning; Information extraction; Natural language processing.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
The architecture of bidirectional LSTM-CRF and BioBERT with pre-trained word embedding using unannotated data

**Fig. 2**
Three experiments performed in this study

**Fig. 3**
Performance of pre-trained word embedding

**Fig. 4**
Comparison by the amount of data

Grants and funding

HI19C1328/Ministry of Health and Welfare