J Biomed Inform. 2013 Jun;46(3):410-24.
doi: 10.1016/j.jbi.2013.01.005. Epub 2013 Feb 9.

The Analytic Information Warehouse (AIW): a platform for analytics using electronic health record data

Andrew R Post et al. J Biomed Inform. 2013 Jun.

Abstract

Objective: To create an analytics platform for specifying and detecting clinical phenotypes and other derived variables in electronic health record (EHR) data for quality improvement investigations.

Materials and methods: We have developed an architecture for an Analytic Information Warehouse (AIW). It supports transforming data represented in different physical schemas into a common data model, specifying derived variables in terms of the common model to enable their reuse, computing derived variables while enforcing invariants and ensuring correctness and consistency of data transformations, long-term curation of derived data, and export of derived data into standard analysis tools. It includes software that implements these features and a computing environment that enables secure high-performance access to and processing of large datasets extracted from EHRs.

Results: We have implemented and deployed the architecture in production locally. The software is available as open source. We have used it as part of hospital operations in a project to reduce rates of hospital readmission within 30 days. The project examined the association of over 100 derived variables representing disease and co-morbidity phenotypes with readmissions in 5 years of data from our institution's clinical data warehouse and the UHC Clinical Database (CDB). The CDB contains administrative data from over 200 hospitals that are in academic medical centers or affiliated with such centers.
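
To illustrate a readmission-style derived variable, the sketch below flags encounters that are followed by another admission for the same patient within 30 days of discharge. It is not taken from the AIW software: the Encounter record, the function name and the rule that the window is measured from discharge to the next admission are all assumptions made for this example.

```python
from datetime import date
from typing import List, NamedTuple


class Encounter(NamedTuple):
    """Hypothetical minimal encounter record (not the AIW data model)."""
    patient_id: str
    admit: date
    discharge: date


def flag_30_day_readmissions(encounters: List[Encounter],
                             window_days: int = 30) -> List[Encounter]:
    """Return index encounters followed by a readmission within window_days.

    Assumption: the window runs from the discharge date of one encounter
    to the admit date of the same patient's next encounter.
    """
    by_patient = {}
    for enc in encounters:
        by_patient.setdefault(enc.patient_id, []).append(enc)

    flagged = []
    for visits in by_patient.values():
        visits.sort(key=lambda e: e.admit)
        # Compare each encounter with the next one for the same patient.
        for index_enc, next_enc in zip(visits, visits[1:]):
            gap = (next_enc.admit - index_enc.discharge).days
            if 0 <= gap <= window_days:
                flagged.append(index_enc)
    return flagged
```

A production definition would likely also need exclusions (for example, planned readmissions or transfers), which this sketch ignores.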

Discussion and conclusion: A widely available platform for managing and detecting phenotypes in EHR data could accelerate the use of such data in quality improvement and comparative effectiveness studies.

Figures

Figure A.1
The percentage of hospital encounters in patients with exactly 1, 2, 3, etc. readmissions that were assigned to selected product lines (a) or MS-DRG codes (b) in the UHC CDB (2006–2011, all hospitals). These plots distinguish types of encounters that tend to occur in patients with few readmissions versus many readmissions.
Figure 1
AIW software architecture. The AIW software is a modular framework that extends the PROTEMPA temporal abstraction system for use in detecting clinical phenotypes in EHR data in quality improvement. The Job Executor controls data processing and is supported by the Data Source, Knowledge Source, and Algorithm Source services (blue boxes). Service provider implementations allow access to specific data or knowledge stores (red boxes). Arrows represent dependencies between components, not flow of information. A program (green box) calls the software through a defined API to retrieve data and detect phenotypes of interest.
Figure 2
A hypothetical Elev. (elevated) BP (blood pressure) in Hypertensive on Diuretic phenotype is specified as a temporal pattern computed from blood pressure (Elevated BP), diagnosis (Second Hypertension) and medication dispense (On Diuretic) intervals. It has the same endpoints as the contributing Elevated BP interval (gray dashed arrows). Gray dotted arrows denote temporal relationships defined between endpoints of intervals and are labeled with minimum and maximum time constraints. For example, (−6 mo, 0 mo) indicates that the first time point must occur between 0 and 6 months before the second time point.
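
The minimum and maximum time constraints described in this caption can be sketched as a simple range check on the signed offset between two time points. This is a minimal sketch and not the PROTEMPA or AIW implementation: the Interval type, the function names and the approximation of one month as 30 days are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import NamedTuple

# Assumption: one month is approximated as 30 days for this example.
MONTH = timedelta(days=30)


class Interval(NamedTuple):
    """Hypothetical named interval with start and finish time points."""
    name: str
    start: datetime
    finish: datetime


def satisfies(first: datetime, second: datetime,
              min_offset: timedelta, max_offset: timedelta) -> bool:
    """True if (first - second) lies within [min_offset, max_offset].

    With (-6 mo, 0 mo), the first time point must fall between
    0 and 6 months before the second time point.
    """
    return min_offset <= (first - second) <= max_offset


def derive_with_same_endpoints(name: str, source: Interval) -> Interval:
    """Create a derived interval that reuses the endpoints of source."""
    return Interval(name, source.start, source.finish)
```

For example, `satisfies(dx.start, bp.start, -6 * MONTH, 0 * MONTH)` would test whether a diagnosis interval `dx` starts no more than six months before a blood pressure interval `bp` starts; which endpoints are actually related in the figure is not stated in the caption.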
Figure 3
An example of the hypothetical Elev. (elevated) BP (blood pressure) in Hypertensive on Diuretic abstraction. It is computed from blood pressure (Elevated BP abstraction), diagnosis (Hypertension abstraction) and medication dispense (On Diuretic abstraction) intervals. See Methods for details. Each interval has a label with its name and the abstraction mechanism that computed it in parentheses.
Figure 4
UML class diagram of the readmissions virtual data model. There may be multiple PatientDetails records per Patient to represent changes in address, name, marital status and organ donor status over a lifetime. While birthdate, gender, race and ethnicity should not change, they may be recorded incorrectly or differently over time with no way to validate which is correct, thus we represent them in PatientDetails too. PatientDetails are associated with Encounters to represent the name, address and demographic information at that time. Encounters represent the admit and discharge date (start and finish attributes), the database’s unique identifier for encounters (encounterId attribute), the age of the patient in years at the time of the encounter (age attribute), the location of the encounter down to unit level (healthcareEntity, organization, hospitalPavilion and unit attributes), the National Uniform Billing Committee UB-04 discharge status code of the encounter (dischargeDisposition attribute), insurance type (insuranceType attribute) and APR-DRG (All Patient Refined Diagnosis Related Group) risk of mortality and severity values. The ICD9:Diagnoses class’ position attribute represents whether a diagnosis is primary or secondary. Laboratory test result values (LAB:LabTest) have attributes for reference range, units of measure and interpretation (high, normal, low, critical). The Nominal VDM data type is a string data type. VDM data types with a name ending in Value are value sets. The AbsoluteTime VDM data type represents a timestamp with its granularity in parentheses.
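
The Patient, PatientDetails and Encounter relationships described in this caption might be sketched with Python dataclasses as follows. The Encounter attribute names start, finish, encounterId, age, healthcareEntity, organization, hospitalPavilion, unit, dischargeDisposition and insuranceType come from the caption; every other attribute name, and all of the Python types, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Encounter:
    encounterId: str             # database's unique identifier
    start: date                  # admit date
    finish: date                 # discharge date
    age: int                     # patient age in years at the encounter
    healthcareEntity: str
    organization: str
    hospitalPavilion: str
    unit: str
    dischargeDisposition: str    # UB-04 discharge status code
    insuranceType: str
    # Hypothetical names: the caption does not name the APR-DRG attributes.
    aprdrgRiskOfMortality: Optional[str] = None
    aprdrgSeverity: Optional[str] = None


@dataclass
class PatientDetails:
    # Hypothetical attribute names for the demographics listed in the caption.
    name: str
    address: str
    maritalStatus: str
    organDonor: bool
    birthdate: date
    gender: str
    race: str
    ethnicity: str
    # Encounters recorded while these details were in effect.
    encounters: List[Encounter] = field(default_factory=list)


@dataclass
class Patient:
    patientId: str               # hypothetical identifier name
    # Multiple records capture changes (or inconsistencies) over time.
    details: List[PatientDetails] = field(default_factory=list)
```

Keeping demographics on PatientDetails rather than Patient mirrors the caption's point that values such as birthdate may be recorded differently over time with no way to tell which is correct.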
Figure 5
UML class diagram showing how the UHC Clinical Database (CDB) schema maps into the readmissions virtual data model (VDM). The subset of the CDB that was mapped is shown on the left in green, and the relevant subset of the readmissions VDM is shown on the right in blue. Purple arrows show how primary keys in the CDB map to VDM unique identifier attributes (UID). Red arrows show how other attributes in the CDB map to VDM attributes. The Nominal VDM data type is a string data type. VDM data types with a name ending in Value are value sets. The AbsoluteTime VDM data type represents a timestamp with its granularity in parentheses.
Figure 6
Flow chart showing processing steps during execution of an AIW job. A user specifies the rows and columns of the delimited file output as described in the text. After starting the job, the AIW determines from this specification what data to retrieve from the source database and what abstractions to compute. It generates and executes SQL queries, and transforms their result sets into the form of the virtual data model (VDM). It computes abstractions as specified in the temporal abstraction ontology (Abstraction definitions in figure), and it generates output. CDW=clinical data warehouse.
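
The processing steps in this caption can be approximated in a few lines using an in-memory SQLite database. This is a loose sketch under stated assumptions, not the AIW Job Executor: the `encounter` table, its `encounter_id` column, the `run_job` function and the representation of an abstraction as a (source column, function) pair are all hypothetical.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, List, Tuple

# Hypothetical: an abstraction is a source column plus a function of its value.
Abstraction = Tuple[str, Callable[[object], object]]


def run_job(conn: sqlite3.Connection, columns: List[str],
            abstractions: Dict[str, Abstraction]) -> str:
    """Produce tab-delimited output for the requested columns."""
    # 1. From the output specification, determine what data to retrieve.
    raw = [c for c in columns if c not in abstractions]
    needed = sorted(set(raw) | {src for src, _ in abstractions.values()})

    # 2. Generate and execute SQL. Column names are trusted in this sketch.
    select_list = ", ".join(["encounter_id", *needed])
    cursor = conn.execute(
        f"SELECT {select_list} FROM encounter ORDER BY encounter_id")
    names = [d[0] for d in cursor.description]

    # 3. Transform result rows into a common record form.
    records = [dict(zip(names, row)) for row in cursor]

    # 4. Compute the abstractions that the output asks for.
    for rec in records:
        for name, (src, fn) in abstractions.items():
            rec[name] = fn(rec[src])

    # 5. Generate the delimited output.
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["encounter_id", *columns])
    for rec in records:
        writer.writerow([rec["encounter_id"], *(rec[c] for c in columns)])
    return out.getvalue()
```

A call such as `run_job(conn, ["systolic", "elevated_bp"], {"elevated_bp": ("systolic", lambda v: v >= 140)})` would emit one row per encounter; the 140 threshold here is purely illustrative.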
Figure 7
Screenshot of the temporal abstraction ontology in Protégé, showing the Chemotherapy 180 days before surgery temporal pattern abstraction. The AbstractedFrom slot shows that the temporal pattern is composed of the V58.1 ICD-9 code group and the SurgicalProcedure category of ICD-9 codes (contains all surgical procedure codes). The WithRelations slot contains an instance of the Relation class specifying the temporal constraint between the procedure and chemotherapy encounter codes. The TemporalOffset slot contains an instance of the TemporalOffsets class that, together with the MaxGap, Concatenable and Solid slots, specifies that intervals created from this abstraction should have the same temporal extent as the chemotherapy encounter code from which they are derived. The InDataSource slot is unchecked to indicate that this data element should not be searched for in the source database (because it is computed).
