Biomed Eng Syst Technol Int Jt Conf BIOSTEC Revis Sel Pap. 2022 Feb:2022:162-169.
doi: 10.5220/0010876100003123.

TAX-Corpus: Taxonomy based Annotations for Colonoscopy Evaluation

Shorabuddin Syed et al. Biomed Eng Syst Technol Int Jt Conf BIOSTEC Revis Sel Pap. 2022 Feb.

Abstract

Colonoscopy plays a critical role in screening for colorectal carcinomas (CC). Unfortunately, the data related to this procedure are stored in disparate documents: colonoscopy, pathology, and radiology reports. The lack of integrated, standardized documentation impedes accurate reporting of quality metrics as well as clinical and translational research. Natural language processing (NLP) has been used as an alternative to manual data abstraction. The performance of Machine Learning (ML) based NLP solutions depends heavily on the accuracy of annotated corpora. The availability of large-volume annotated corpora is limited by data privacy laws and by the cost and effort required to build them. In addition, the manual annotation process is error-prone, making the lack of quality annotated corpora the largest bottleneck in deploying ML solutions. The objective of this study is to identify clinical entities critical to colonoscopy quality and to build a high-quality annotated corpus using domain-specific taxonomies and standardized annotation guidelines. The annotated corpus can be used to train ML models for a variety of downstream tasks.

Keywords: Annotation; Clinical Corpus; Colonoscopy; Machine Learning; Natural Language Processing; Taxonomy.

Figures

Figure 1:
Annotation workflow to label colonoscopy-related documents. The process is divided into pre-annotation and interactive-annotation stages. In the pre-annotation stage, clinical entities were identified, taxonomies created, annotators recruited, and annotation guidelines and tools deployed. In the interactive-annotation stage, documents were double annotated by two teams and differences were adjudicated by a domain expert.
Figure 2:
Colonoscopy taxonomy depicting clinical entities and their classifications. Colonoscopy reports were annotated for entities mentioned in the taxonomy.
Figure 3:
Pathology taxonomy depicting clinical entities and their classifications. Pathology reports were annotated for entities mentioned in the taxonomy.
Figure 4:
Radiology imaging taxonomy depicting clinical entities and their classifications. Radiology reports were annotated for entities mentioned in the taxonomy.
Figure 5:
Workflow depicting development and refinement of annotation guidelines through a rigorous and iterative process.
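
The double-annotation and adjudication step described for Figure 1 could be sketched roughly as follows. This is a minimal illustration, not the authors' tooling: the `Span` type, the `split_for_adjudication` helper, and the entity labels (`Polyp`, `Location`, `Size`) are all hypothetical, and exact span-and-label matching is an assumed agreement criterion.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """One annotated entity: character offsets plus a taxonomy label (hypothetical schema)."""
    start: int
    end: int
    label: str


def split_for_adjudication(team_a, team_b):
    """Separate annotations both teams agree on from those needing expert review.

    Agreement here means identical offsets and label (an assumption);
    everything found by only one team is returned as disputed.
    """
    a, b = set(team_a), set(team_b)
    agreed = sorted(a & b)      # present in both teams' annotations
    disputed = sorted(a ^ b)    # present in exactly one team's annotations
    return agreed, disputed


# Hypothetical annotations of the same report by two teams.
team_a = [Span(0, 5, "Polyp"), Span(10, 16, "Location")]
team_b = [Span(0, 5, "Polyp"), Span(10, 16, "Size")]

agreed, disputed = split_for_adjudication(team_a, team_b)
print("agreed:", agreed)
print("for adjudication:", disputed)
```

In this sketch, the disputed list is what a domain expert would review; a real workflow would likely also handle partially overlapping spans rather than only exact matches.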

