JMIR Med Inform. 2019 Dec 20;7(4):e13917.
doi: 10.2196/13917.

Building a Semantic Health Data Warehouse in the Context of Clinical Trials: Development and Usability Study

Romain Lelong et al. JMIR Med Inform.

Abstract

Background: The huge amount of clinical, administrative, and demographic data recorded and maintained by hospitals can be consistently aggregated into health data warehouses with a uniform data model. In 2017, Rouen University Hospital (RUH) initiated the design of a semantic health data warehouse enabling both semantic description and retrieval of health information.

Objective: This study aimed to present a proof of concept of this semantic health data warehouse, based on the data of 250,000 patients from RUH, and to assess its ability to assist health professionals in prescreening eligible patients in a clinical trials context.

Methods: The semantic health data warehouse relies on 3 distinct semantic layers: (1) a terminology and ontology portal, (2) a semantic annotator, and (3) a semantic search engine and NoSQL (not only structured query language) layer to enhance data access performance. The system adopts an entity-centered vision that provides generic search capabilities, so that data requirements can be expressed in terms of the whole set of interconnected conceptual entities that compose health information.
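
As a rough illustration of the entity-centered idea, the sketch below evaluates atomic constraints against different kinds of entities and then combines them with Boolean operators. It is not the system's implementation: the `Entity` class, the `match` helper, the attribute names, and the toy records are all hypothetical, and the output entity is fixed to the patient for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Entity:
    """A hypothetical health-data entity linked to the patient it belongs to."""
    kind: str                      # e.g. "diagnosis" or "lab_test"
    id: str
    attrs: Dict[str, object] = field(default_factory=dict)
    patient_id: str = ""


def match(entities: List[Entity], kind: str,
          predicate: Callable[[Dict[str, object]], bool]) -> Set[str]:
    """Evaluate one atomic constraint and return the ids of the patients it selects."""
    return {e.patient_id for e in entities
            if e.kind == kind and predicate(e.attrs)}


# Toy records (invented for this example, not data from the study).
entities = [
    Entity("diagnosis", "d1", {"code": "E11"}, "p1"),
    Entity("lab_test", "l1", {"name": "HbA1c", "value": 8.2}, "p1"),
    Entity("diagnosis", "d2", {"code": "E11"}, "p2"),
    Entity("lab_test", "l2", {"name": "HbA1c", "value": 6.1}, "p2"),
    Entity("diagnosis", "d3", {"code": "I10"}, "p3"),
]

# Atomic constraints, each expressed on the entity that carries the information.
diabetes = match(entities, "diagnosis", lambda a: a["code"] == "E11")
high_hba1c = match(entities, "lab_test",
                   lambda a: a["name"] == "HbA1c" and a["value"] >= 7.0)
hypertension = match(entities, "diagnosis", lambda a: a["code"] == "I10")

# Boolean composition: inclusion criteria are intersected, exclusions subtracted.
eligible = (diabetes & high_hba1c) - hypertension
print(sorted(eligible))
```

Because each constraint is resolved to a set of patient identifiers, AND, OR, and NOT map directly onto set intersection, union, and difference. A system that lets the user choose the output entity would project onto stays, documents, or tests instead of patients.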

Results: We assessed the ability of the system to assist the search for 95 inclusion and exclusion criteria originating from 5 randomly chosen clinical trials from RUH. The system succeeded in fully automating 39% (29/74) of the criteria and was efficiently used as a prescreening tool for 73% (54/74) of them. Furthermore, the targeted sources of information and the search engine-related or data-related limitations that could explain the results for each criterion were also observed.

Conclusions: The entity-centered vision contrasts with the usual patient-centered vision adopted by existing systems. It makes the information retrieval process more generic and allows the semantic description of health information to be fully exploited. Although clinical narratives were semantically annotated, searching within them remained the major challenge of the system. A finer annotation of the clinical texts and the addition of specific functionalities would significantly improve the results. The semantic aspect of the system, combined with its generic entity-centered vision, enables the processing of a large range of clinical questions. However, an important part of health information remains in clinical narratives, and we are currently investigating novel approaches (deep learning) to enhance the semantic annotation of these unstructured data.

Keywords: clinical trial; data warehousing; patient selection; search engine; semantics.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1
Functional coverage of the semantic health data warehouse in terms of data according to each domain. SHDW: semantic health data warehouse; CPOE: computerized physician order entry; DCC: French cancer communication file; PMR: personal medical record; DRG: diagnosis-related group; EHR: electronic health record.
Figure 2
Functional architecture of the semantic health data warehouse, which provides semantic information retrieval (IR) functionalities from clinical data. The 2 data repositories, knowledge data and health data, respectively maintain the reference knowledge organization systems and the health data pertaining to the semantic health data warehouse. These data are accessed through a not only structured query language (NoSQL) layer by 3 distinct components: the cross-terminological health terminology and ontology portal (HeTOP), the semantic annotator extracting concepts from multiple terminologies (ECMT), and the semantic search engine (SSE), each operating on a different range of data. CN: clinical narrative; T&O: terminology and ontology.
Figure 3
Partial conceptual model of the semantic health data warehouse, represented as a directed and attributed graph. Entities corresponding to elements from terminologies and ontologies are represented with dashed outlines. DRG: diagnosis-related group; PIN: personal identification number.
Figure 4
The interface of the semantic access to health information (ASIS) Web application and its 4 steps: (1) definition of constraints, (2) composition of a Boolean query from the atomic constraints defined in step 1, (3) selection of the desired output entity according to its clinically coherent level, and (4) visualization of the results.
Figure 5
The central gray band gives the percentage of criteria of each support level excluding not applicable criteria. The upper bars show, for each support level, the percentages of involvement of each source of information in the search of criteria. The lower bars show the distribution (in percentage) of the different obstacle categories identified as lowering the effectiveness of the search of criteria. CN: clinical narrative; DRG: diagnosis-related group.
