BMC Med Inform Decis Mak. 2018 Jul 6;18(1):61.
doi: 10.1186/s12911-018-0645-3.

Using data-driven sublanguage pattern mining to induce knowledge models: application in medical image reports knowledge representation


Yiqing Zhao et al. BMC Med Inform Decis Mak.

Abstract

Background: The use of knowledge models facilitates information retrieval and knowledge base development, and therefore supports new knowledge discovery that ultimately enables decision support applications. Most existing work has employed machine learning techniques to construct knowledge bases; however, these approaches often suffer from low precision when extracting entities and relationships. In this paper, we describe a data-driven sublanguage pattern mining method that can be used to create a knowledge model. Our model generation pipeline combines natural language processing (NLP) and semantic network analysis.

Methods: As a use case of our pipeline, we used data from an open-source imaging case repository, Radiopaedia.org, to generate a knowledge model that represents the contents of medical imaging reports. We extracted entities and relationships using the Stanford part-of-speech parser and the "Subject:Relationship:Object" syntactic data schema. The identified noun phrases were tagged with Unified Medical Language System (UMLS) semantic types. The model was evaluated on a dataset of 83 image notes from four data sources.
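The "Subject:Relationship:Object" schema and the semantic tagging step can be illustrated with a minimal sketch. It assumes triples have already been extracted by a parser and replaces UMLS lookups with a small hypothetical lexicon; `Triple`, `TOY_SEMANTIC_TYPES`, `tag_phrase`, and `tag_triple` are illustrative names, not part of the authors' pipeline or any UMLS API.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon standing in for real UMLS lookups
# (a real pipeline would call a UMLS annotator instead).
TOY_SEMANTIC_TYPES = {
    "ivu films": "Diagnostic Procedure",
    "pubic bones": "Body Part, Organ, or Organ Component",
    "symphysis": "Body Part, Organ, or Organ Component",
    "cardiac pacemaker": "Medical Device",
}


@dataclass(frozen=True)
class Triple:
    """One Subject:Relationship:Object unit extracted from a sentence."""
    subject: str
    relationship: str
    obj: str


def tag_phrase(phrase: str) -> str:
    """Return the toy semantic type of a noun phrase, or 'Unknown'."""
    return TOY_SEMANTIC_TYPES.get(phrase.lower().strip(), "Unknown")


def tag_triple(triple: Triple) -> tuple[str, str, str]:
    """Replace subject and object with semantic types; keep the relationship."""
    return (tag_phrase(triple.subject), triple.relationship, tag_phrase(triple.obj))


if __name__ == "__main__":
    # Triple taken from the Fig. 5 example sentence.
    print(tag_triple(Triple("IVU films", "showing", "pubic bones")))
```

Abstracting phrases to semantic types is what lets patterns from thousands of distinct reports be pooled into a single type-level network.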

Results: A semantic type network was built from the co-occurrence of 135 UMLS semantic types in 23,410 medical image reports. By regrouping the semantic types and generalizing the semantic network, we created a knowledge model containing 14 semantic categories. The knowledge model covered 98% of the content in the evaluation corpus and revealed 97% of the relationships. Machine annotation achieved a precision of 87%, a recall of 79%, and an F-score of 82%.
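The precision, recall, and F-score figures follow the standard definitions, sketched below from true-positive, false-positive, and false-negative counts. The function name and the counts in the test are hypothetical; the abstract does not report the underlying counts.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision, recall, and F1 (harmonic mean of the two).

    Returns 0.0 for any ratio whose denominator is zero.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Plugging the rounded precision and recall from the abstract into this formula gives 2(0.87)(0.79)/(0.87 + 0.79) ≈ 0.83, slightly above the reported 82%. The abstract does not say whether the F-score was computed from unrounded counts or averaged per note, so the difference may come from either.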

Conclusion: The results indicate that our pipeline can produce a comprehensive, content-based knowledge model that represents context from various sources in the same domain.

Keywords: Big data analysis; Information extraction; Knowledge modeling; Medical imaging; Natural language processing; Semantic network; Sublanguage analysis; Text mining.


Conflict of interest statement

Ethics approval and consent to participate

The authors declare that this study does not involve human subjects or personally identifiable information; therefore, it did not require ethics approval or consent from individuals.

Consent for publication

The authors declare that the manuscript does not contain any individual person’s data; therefore, no consent to publish is required.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
System pipeline: (1) Corpus development (using Jsoup), (2) Syntactic processing (using Stanford Parser), (3) Semantic processing (using UMLS Annotator), (4) Knowledge model generation
Fig. 2
Co-occurrence network of the top 40 semantic types (subgraph). Edge thickness indicates weight (the number of co-occurrence incidences); a thicker edge means more co-occurrences for that relation. Node size indicates connectivity (the number of other nodes connected to it). The graph illustrates the complexity of semantic type co-occurrence patterns in imaging notes
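The edge weights and node connectivity described in the Fig. 2 caption can be sketched as follows. This is a minimal illustration that assumes co-occurrence is counted once per report (the paper may count at a different granularity, such as the sentence); `build_cooccurrence` and `connectivity` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(reports: list[set[str]]) -> Counter:
    """Edge weight = number of reports in which both semantic types occur.

    Pairs are sorted so that (a, b) and (b, a) map to the same edge.
    """
    weights: Counter = Counter()
    for types in reports:
        for a, b in combinations(sorted(types), 2):
            weights[(a, b)] += 1
    return weights


def connectivity(weights: Counter) -> dict[str, int]:
    """Node degree = number of distinct other types a type co-occurs with."""
    neighbours: dict[str, set[str]] = {}
    for a, b in weights:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {node: len(adj) for node, adj in neighbours.items()}
```

The inputs here are sets of UMLS semantic type abbreviations per report (for example `dsyn` for Disease or Syndrome and `bpoc` for Body Part, Organ, or Organ Component). A top-40 subgraph like the one in the figure could then be taken from the highest-degree nodes.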
Fig. 3
Summary of semantic types (top 22 among 289,782 noun phrases (NP) and adjective phrases (ADJP)). The top 22 UMLS semantic types (16.3% of all types) cover the majority (80.32%) of the radiology case corpus
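The two percentages in the Fig. 3 caption are ratios of a different kind: 80.32% is the share of tagged phrases covered by the most frequent types, while 16.3% is the share of distinct types those represent (22 / 135 ≈ 0.163). A minimal sketch, assuming per-type phrase counts are available; `top_k_coverage` is a hypothetical helper.

```python
from collections import Counter


def top_k_coverage(type_counts: Counter, k: int) -> tuple[float, float]:
    """Return (share of phrases covered by the k most frequent types,
    share of all distinct types that those k types represent)."""
    if not type_counts:
        raise ValueError("type_counts must not be empty")
    total_phrases = sum(type_counts.values())
    covered = sum(count for _, count in type_counts.most_common(k))
    k_used = min(k, len(type_counts))
    return covered / total_phrases, k_used / len(type_counts)
```

With the paper's figures, `type_counts` would hold 135 semantic types whose counts sum to the 289,782 NP and ADJP, and `k` would be 22.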
Fig. 4
Knowledge model. Dotted lines show significant relationships in the co-occurrence network. The dotted box encloses core semantic categories that are intrinsically closely related and significant in the knowledge model
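Regrouping semantic types into broader categories and generalizing the network can be sketched as summing type-level edge weights into category-level edges. The mapping below is a hypothetical toy loosely modeled on the UMLS semantic groups, not the paper's 14 categories. Dropping unmapped types and within-category edges is likewise an assumption, and `collapse_to_categories` is an illustrative name.

```python
from collections import Counter

# Hypothetical type-to-category mapping (illustration only).
TOY_CATEGORIES = {
    "dsyn": "Disorder",   # Disease or Syndrome
    "neop": "Disorder",   # Neoplastic Process
    "bpoc": "Anatomy",    # Body Part, Organ, or Organ Component
    "medd": "Device",     # Medical Device
}


def collapse_to_categories(weights: Counter, type_to_category: dict[str, str]) -> Counter:
    """Sum type-level co-occurrence weights into category-level edges."""
    category_weights: Counter = Counter()
    for (a, b), w in weights.items():
        ca, cb = type_to_category.get(a), type_to_category.get(b)
        # Skip unmapped types and edges inside a single category.
        if ca is None or cb is None or ca == cb:
            continue
        category_weights[tuple(sorted((ca, cb)))] += w
    return category_weights
```

Which category-level edges count as "significant" (the dotted lines in the figure) would require a further criterion that the caption does not specify.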
Fig. 5
Knowledge model example of two sentences: “Serial IVU films showing widely separated pubic bones with absent symphysis” and “Complex L-transposition of the great arteries with cardiac pacemaker”
