J Am Med Inform Assoc. 2015 May;22(3):507-18.
doi: 10.1136/amiajnl-2014-003151. Epub 2014 Oct 21.

Scalable quality assurance for large SNOMED CT hierarchies using subject-based subtaxonomies

Christopher Ochs et al. J Am Med Inform Assoc. 2015 May.

Abstract

Objective: Standards terminologies may be large and complex, making their quality assurance challenging. Some terminology quality assurance (TQA) methodologies are based on abstraction networks (AbNs), compact terminology summaries. We have tested AbNs and the performance of related TQA methodologies on small terminology hierarchies. However, some standards terminologies, for example, SNOMED, are composed of very large hierarchies. Scaling AbN TQA techniques to such hierarchies poses a significant challenge. We present a scalable subject-based approach for AbN TQA.

Methods: An innovative technique is presented for scaling TQA by creating a new kind of subject-based AbN called a subtaxonomy for large hierarchies. New hypotheses about concentrations of erroneous concepts within the AbN are introduced to guide scalable TQA.

Results: We test the TQA methodology for a subject-based subtaxonomy for the Bleeding subhierarchy in SNOMED's large Clinical finding hierarchy. To test the error concentration hypotheses, three domain experts reviewed a sample of 300 concepts. A consensus-based evaluation identified 87 erroneous concepts. The subtaxonomy-based TQA methodology was shown to uncover statistically significantly more erroneous concepts when compared to a control sample.

Discussion: The scalability of TQA methodologies is a challenge for large standards systems like SNOMED. We demonstrated innovative subject-based TQA techniques by identifying groups of concepts with a higher likelihood of having errors within the subtaxonomy. Scalability is achieved by reviewing a large hierarchy by subject.

Conclusions: An innovative methodology for scaling the derivation of AbNs and a TQA methodology were shown to perform successfully for the largest hierarchy of SNOMED.

Keywords: SNOMED CT; abstraction network; scalable quality assurance; standards quality assurance; subject-based terminology quality assurance; terminology quality assurance.

Figures

Figure 1
(A) An excerpt of 29 concepts from the Clinical finding hierarchy. IS-A relationships are shown as upward arrows between concepts. Concepts with the exact same set of outgoing attribute relationships are grouped into dashed bubbles that are labeled with the set of relationships. For example, the concepts Bleeding and Inflammatory disorder both have one relationship, Associated morphology. (B) The area taxonomy for the concepts in (A). Areas are displayed as colored boxes, named by the common relationship(s). Areas are organized into color-coded levels according to their numbers of relationships. The 13 concepts with the Finding site relationship are now represented by the box named Finding site on level 1 (green) of the area taxonomy. Child-of links appear as bold arrows, for example, {Associated morphology, Finding site} is child-of both {Associated morphology} and {Finding site}. (C) The partial-area taxonomy for the concepts in (A). The five concepts in the {Associated morphology, Finding site} area are now refined into two partial-areas, Genitourinary tract hemorrhage (3) and Hemorrhage of abdominal cavity structure (4). These partial-areas are child-of both Bleeding and Finding by site. Child-of links are shown as arrows between partial-areas.
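The grouping step described in (A) and (B) can be illustrated with a minimal Python sketch. It assumes each concept is already annotated with the set of attribute relationship types it uses. The concept-to-relationship mapping and the function names `derive_areas` and `group_by_level` are hypothetical, chosen for illustration, and are not taken from the paper's implementation.

```python
from collections import defaultdict

# Hypothetical toy input: concept name -> set of attribute relationship types.
concept_relationships = {
    "Bleeding": {"Associated morphology"},
    "Inflammatory disorder": {"Associated morphology"},
    "Finding by site": {"Finding site"},
    "Genitourinary tract hemorrhage": {"Associated morphology", "Finding site"},
    "Hemorrhage of abdominal cavity structure": {"Associated morphology", "Finding site"},
}

def derive_areas(concept_relationships):
    """Group concepts that share exactly the same set of relationship types."""
    areas = defaultdict(set)
    for concept, relationships in concept_relationships.items():
        areas[frozenset(relationships)].add(concept)
    return dict(areas)

def group_by_level(areas):
    """Organize areas into levels by their number of relationship types."""
    levels = defaultdict(list)
    for relationships in areas:
        levels[len(relationships)].append(relationships)
    return dict(levels)

areas = derive_areas(concept_relationships)
levels = group_by_level(areas)

for level in sorted(levels):
    for relationships in levels[level]:
        name = "{" + ", ".join(sorted(relationships)) + "}"
        print(level, name, sorted(areas[relationships]))
```

Using a `frozenset` as the dictionary key makes the area name independent of the order in which relationships are listed, which matches the "exact same set" criterion in the caption.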
Figure 2
(A) An example of a subhierarchy of 17 concepts in {Associated morphology, Finding site} grouped in partial-areas, which are enclosed by dashed colored bubbles. (B) The roots of the disjoint partial-areas are in color. Area roots are given a single color. Overlapping roots are multicolored according to the multiple area roots they are descendants of. (C) The disjoint partial-area taxonomy for (A). Disjoint partial-areas are color coded according to the colors of their root concept in (B). The nine disjoint partial-areas summarize the 17 concepts.
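The idea of separating overlapping concepts can be sketched as follows. This is a simplified illustration, not the paper's derivation: it assumes that concepts in one area are partitioned by the exact set of partial-area roots they descend from, and it uses a hypothetical toy IS-A hierarchy with placeholder concept names. The helper names `roots_above` and `disjoint_groups` are likewise assumptions.

```python
from collections import defaultdict

# Hypothetical IS-A hierarchy inside one area: concept -> list of parents.
# "A" and "B" stand in for two partial-area roots.
parents = {
    "A": [],
    "B": [],
    "a1": ["A"],
    "b1": ["B"],
    "ab1": ["A", "B"],  # descends from both roots, so it overlaps
    "ab2": ["ab1"],
}
roots = {"A", "B"}

def roots_above(concept, parents, roots):
    """Return the partial-area roots that are the concept itself or one of its ancestors."""
    found, seen, stack = set(), set(), [concept]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in roots:
            found.add(node)
        stack.extend(parents.get(node, []))
    return frozenset(found)

def disjoint_groups(parents, roots):
    """Partition concepts by the exact set of roots they descend from."""
    groups = defaultdict(set)
    for concept in parents:
        groups[roots_above(concept, parents, roots)].add(concept)
    return dict(groups)

groups = disjoint_groups(parents, roots)
for root_set, members in groups.items():
    print(sorted(root_set), sorted(members))
```

In this toy example every concept lands in exactly one group, so the groups are disjoint even though the original partial-areas rooted at "A" and "B" share the concepts "ab1" and "ab2".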
Figure 3
Top five (out of six) levels of the Bleeding subject-based subtaxonomy. Each level is color coded according to the number of relationships. Levels have been organized into multiple rows due to space limitations. Partial-areas in each area are listed in decreasing order, from left to right, according to their size. Child-of links are not shown for readability. A total of 932 bleeding-related concepts are summarized by 199 partial-areas in 42 areas. Over half (56% = 522/932) of the concepts summarized by this subtaxonomy are in {Associated morphology, Finding site}. The first row of larger partial-areas in this area indicates the major types of bleeding-related findings in SNOMED CT, such as Hemorrhage of abdominal cavity structure (186 concepts), Gastrointestinal hemorrhage (117), and Genitourinary tract hemorrhage (88), demonstrating the summary effect provided by the subject-based subtaxonomy.
Figure 4
An excerpt of 23 disjoint partial-areas from the disjoint partial-area subtaxonomy derived for the concepts in {Associated morphology, Finding site}. The disjoint partial-areas Mass of body structure and Injury of anatomical site, shown in a gray box, are not part of the Bleeding subject-based subtaxonomy, but many Bleeding concepts overlap with them. Partial-areas outside of the subtaxonomy, such as Mass of body structure, which overlap with partial-areas in the subtaxonomy, for example, Hemorrhage of abdominal cavity structure, are not part of the subtaxonomy and can be hidden, but are important for terminology quality assurance (TQA) to capture the complexity of the overlapping concepts. For example, the disjoint partial-area Pelvic hematoma (3) would not exist if such overlap were not considered.
Figure 5
The Cancer subject-based subtaxonomy, following the graphical convention of figure 3. Levels have been organized into multiple rows due to space limitations. Areas of the same levels are color coded according to their number of relationships. Child-of links are not shown for readability. The Cancer subject-based subtaxonomy summarizes 3531 concepts by 125 partial-areas in 19 areas. The 64 partial-areas that do not appear in the complete Clinical finding taxonomy are highlighted in yellow. The concepts inside of the yellow partial-areas are found in the Mass of body structure (7010 concepts) partial-area in the complete taxonomy.
