Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 1998 Nov-Dec;5(6):511-27.
doi: 10.1136/jamia.1998.0050511.

Data extraction and ad hoc query of an entity-attribute-value database

Affiliations

Data extraction and ad hoc query of an entity-attribute-value database

P M Nadkarni et al. J Am Med Inform Assoc. 1998 Nov-Dec.

Abstract

Entity--attribute--value (EAV) tables form the major component of several mainstream electronic patient record systems (EPRSs). Such systems have been optimized for real-time retrieval of individual patient data. Data warehousing, on the other hand, involves cross-patient data retrieval based on values of patient attributes, with a focus on ad hoc query. Attribute-centric query is inherently more difficult when data are stored in EAV form than when they are stored conventionally. The authors illustrate their approach to the attribute-centric query problem with ACT/DB, a database for managing clinical trials data. This approach is based on metadata supporting a query front end that essentially hides the EAV/non-EAV nature of individual attributes from the user. The authors' work does not close the query problem, and they identify several complex subproblems that are still to be solved.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Specifying selection criteria in the Ad Hoc Query form. This and the next figure deal with patients in a prospective breast cancer study. The selection criteria are patients who had menarche at age 16 years or later, no pregnancies, irregular menses, and a history of either estrogen replacement therapy or oral contraceptive use. This is specified by the compound selection criterion “1 & 2 & 5 & (3 | 4)” in the lower middle of the figure, where the numbers refer to individual selection criteria. The rightmost column in the figure, “Enumerated/Ordinal Values,” is used to display (and manipulate, if necessary) the internal integers corresponding to symbolic phrases for such data types. Thus, the number 2 corresponds to the symbolic phrase “irregular” for the attribute “menses.”
Figure 2
Figure 2
Specifying output attributes for patients identified by the criteria of ▶. These are the medians of the histologic tumor grade, nuclear grade, and pathology grade (all of which are ordinal data types), and the average and standard deviation of both the estrogen receptor and progesterone receptor histologic scores. Tumors are studied at multiple times during the course of follow-up and may be collected from multiple sites (e.g., primary, recurrent primary, or metastatic). Since all the output attributes are aggregates (one value per patient), they will be placed in a single output table (given the name “Breast_Ca_Path”). The field names in this output table have been specified by typing in and are slightly more mnemonic than the default field names, which are created by concatenating the aggregate name and the attribute name.
Figure 3
Figure 3
Template for computing the median values of an attribute (for each patient). The SQL code inserts sorted Patient_ID and attribute values into a table, TempMed1, that automatically creates sequentially numbered records (with the sequence number the RecID field). The maximum and minimum RecIDs for each patient are then determined, and the “middle” RecIDs computed. For patients with an odd number of values, there is a single “middle” value, while for patients with an even number of values, there are two “middle” values. The values corresponding to these RecIDs are then extracted into another table, TempMed2, and the median for each patient is determined by taking the average of the two values (for even numbers) or using the single value (for odd numbers).
Figure 4
Figure 4
Template for identifying patients based on the concept of “normal” laboratory values. The template code joins three tables or views—the view containing the laboratory values, the patient (demographics) table, and the Lab _test_ranges table that contains normal values for each test, by laboratory ID and the age and sex of the subject. The join would return multiple rows per patient, so the output is restricted by use of the age and sex for each patient. The num-ber of rows from the patient demographics table may be further limited if the user has specified that only patients of a particular age range or sex are to be considered.

References

    1. The 3M Clinical Data Repository. Murray, Utah: 3M Health Information Systems, 1998.
    1. Huff SM, Haug DJ, Stevens LE, Dupont CC, Pryor TA. HELP the next generation: a new client-server architecture. Proc 18th Symp Comput Appl Med Care. 1994: 271-5. - PMC - PubMed
    1. Huff SM, Berthelsen CL, Pryor TA, Dudley AS. Evaluation of a SQL model of the HELP patient database. Proc 15th Symp Comput Appl Med Care. 1991: 386-90. - PMC - PubMed
    1. Friedman C, Hripcsak G, Johnson S, Cimino J, Clayton P. A generalized relational schema for an integrated clinical patient database. Proc 14th Symp Comput Appl Med Care. 1990: 335-9.
    1. Johnson S, Cimino J, Friedman C, Hripcsak G, Clayton P. Using metadata to integrate medical knowledge in a clinical information system. Proc 14th Symp Comput Appl Med Care. 1990: 340-4.

Publication types

MeSH terms