J Am Med Inform Assoc. 2013 Dec;20(e2):e341-8.
doi: 10.1136/amiajnl-2013-001939. Epub 2013 Nov 4.

Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium

Jyotishman Pathak et al. J Am Med Inform Assoc. 2013 Dec.

Abstract

Research objective: To develop scalable informatics infrastructure for normalization of both structured and unstructured electronic health record (EHR) data into a unified, concept-based model for high-throughput phenotype extraction.

Materials and methods: Software tools and applications were developed to extract information from EHRs. Representative and convenience samples of both structured and unstructured data from two EHR systems (Mayo Clinic and Intermountain Healthcare) were used for development and validation. Extracted information was standardized and normalized to meaningful use (MU) conformant terminology and value set standards using Clinical Element Models (CEMs). These resources were used to demonstrate semi-automatic execution of MU clinical-quality measures modeled using the Quality Data Model (QDM) and an open-source rules engine.

Results: Using CEMs and open-source natural language processing and terminology services engines, namely Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) and Common Terminology Services 2 (CTS2), we developed a data-normalization platform that ensures data security, end-to-end connectivity, and reliable data flow within and across institutions. We demonstrated the applicability of this platform by executing a QDM-based MU quality measure on a randomly selected cohort of 273 Mayo Clinic patients. The measure determines the percentage of patients between 18 and 75 years of age with diabetes whose most recent low-density lipoprotein cholesterol test result during the measurement year was <100 mg/dL. The platform identified 21 and 18 patients for the denominator and numerator of the quality measure, respectively. Validation results indicate that all identified patients meet the QDM-based criteria.
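
As an illustration only, the denominator and numerator logic of the quality measure described in the Results can be sketched in Python. This is not the QDM-to-rules-engine implementation used in the study: the `Patient` and `LdlResult` classes, the function names, and the choice to evaluate age at the start of the measurement year are all assumptions made for this sketch, and exclusion criteria are omitted.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class LdlResult:
    """One LDL cholesterol test result (hypothetical structure)."""
    taken_on: date
    mg_dl: float


@dataclass
class Patient:
    """Minimal patient record for this sketch (hypothetical structure)."""
    patient_id: str
    birth_date: date
    has_diabetes: bool
    ldl_results: List[LdlResult]


def age_on(birth_date: date, on: date) -> int:
    """Age in completed years on the given date."""
    years = on.year - birth_date.year
    if (on.month, on.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def in_denominator(patient: Patient, year: int) -> bool:
    # Assumption: age is evaluated on 1 January of the measurement year.
    age = age_on(patient.birth_date, date(year, 1, 1))
    return patient.has_diabetes and 18 <= age <= 75


def most_recent_ldl(patient: Patient, year: int) -> Optional[LdlResult]:
    """Latest LDL result taken during the measurement year, if any."""
    in_year = [r for r in patient.ldl_results if r.taken_on.year == year]
    return max(in_year, key=lambda r: r.taken_on, default=None)


def in_numerator(patient: Patient, year: int) -> bool:
    if not in_denominator(patient, year):
        return False
    latest = most_recent_ldl(patient, year)
    # Only the most recent result counts; earlier results are ignored.
    return latest is not None and latest.mg_dl < 100.0


def measure_counts(patients: List[Patient], year: int) -> Tuple[int, int]:
    """Return (numerator count, denominator count). Exclusions are not modeled."""
    denominator = [p for p in patients if in_denominator(p, year)]
    numerator = [p for p in denominator if in_numerator(p, year)]
    return len(numerator), len(denominator)
```

With the counts reported above (18 numerator patients out of 21 denominator patients), the measure rate would be 18/21, or roughly 85.7%.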

Conclusions: End-to-end automated systems for extracting clinical information from diverse EHR systems require extensive use of standardized vocabularies and terminologies, as well as robust information models for storing, discovering, and processing that information. This study demonstrates the application of modular and open-source resources for enabling secondary use of EHR data through normalization into a standards-based, comparable, and consistent format for high-throughput phenotyping to identify patient cohorts.

Keywords: Electronic health record; Meaningful Use; Natural Language Processing; Normalization; Phenotype Extraction.

Figures

Figure 1
SecondaryUsePatient Clinical Element Model.
Figure 2
SHARPn clinical data normalization pipeline. (1) Data to be normalized are read from the file system. These data can also be transmitted on NwHIN via TCP/IP from an external entity. (2) Mirth Connect invokes the normalization pipeline using one of its predefined channels and passes the data (eg, HL7, CCD, tabular data) to be normalized. (3) The normalization pipeline initializes its components (including loading resources from the file system or another predefined resource such as the Common Terminology Services 2 (CTS2)) and then performs syntactic parsing and semantic normalization to generate normalized data in the form of a Clinical Element Model (CEM). (4) Normalized data are handed back to Mirth Connect. (5) Mirth Connect uses one of the predefined channels to serialize the normalized CEM data to CouchDB or MySQL based on the configuration. cTAKES, clinical Text Analysis and Knowledge Extraction System; DB, database; NLP, natural language processing; UIMA, Unstructured Information Management Architecture.
Figure 3
Architecture for Quality Data Model (QDM) to Drools translator system. AE, annotation engine; CAS, common analysis system; CEM, Clinical Element Model; DB, database; UIMA, Unstructured Information Management Architecture; XSLT, Extensible Stylesheet Language Transformations.
Figure 4
Conceptual diagram of validation processes. (1) Use-case data are translated to HL7 and submitted to the Mirth Connect interface engine, and then (2) processed and stored as normalized data objects. (3) A Java application pulls data from the source database and the Clinical Element Model (CEM) database, compares them, and then (4) prints inconsistencies for manual review.
Figure 5
Denominator, numerator and exclusion criteria for NQF 0064: Diabetes: Low Density Lipoprotein (LDL) Management and Control.
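
The comparison step in the Figure 4 validation process (pull records from the source and CEM databases, compare them, and report inconsistencies for manual review) might look roughly like the sketch below. The study describes a Java application; this Python version is only an illustration, with in-memory dictionaries standing in for the two databases and with hypothetical function and field names.

```python
from typing import Any, Dict, List, Tuple

# A record is a flat mapping of field name to value (assumption for this sketch).
Record = Dict[str, Any]
# (record id, field name, source value, normalized value)
Inconsistency = Tuple[str, str, Any, Any]


def find_inconsistencies(
    source: Dict[str, Record],
    normalized: Dict[str, Record],
    fields: List[str],
) -> List[Inconsistency]:
    """Compare source records with their normalized counterparts field by field."""
    issues: List[Inconsistency] = []
    for record_id, src in sorted(source.items()):
        norm = normalized.get(record_id)
        if norm is None:
            # The record never reached the normalized store.
            issues.append((record_id, "<missing record>", src, None))
            continue
        for field in fields:
            if src.get(field) != norm.get(field):
                issues.append((record_id, field, src.get(field), norm.get(field)))
    return issues


def print_report(issues: List[Inconsistency]) -> None:
    """Print one line per inconsistency for manual review."""
    for record_id, field, src_value, norm_value in issues:
        print(f"{record_id}\t{field}\tsource={src_value!r}\tnormalized={norm_value!r}")
```

A real comparison would also need to map source codes to their normalized terminology before testing equality; that mapping is left out here.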

