J Am Med Inform Assoc. 2017 May 1;24(3):607-613.
doi: 10.1093/jamia/ocw144.

Improving a full-text search engine: the importance of negation detection and family history context to identify cases in a biomedical data warehouse



Nicolas Garcelon et al. J Am Med Inform Assoc.

Abstract

Objective: The repurposing of electronic health records (EHRs) can improve clinical and genetic research for rare diseases. However, significant information in rare disease EHRs is embedded in the narrative reports, which contain many negated clinical signs and family medical history. This paper presents a method to detect family history and negation in narrative reports and evaluates its impact on selecting populations from a clinical data warehouse (CDW).

Materials and methods: We developed a pipeline to process 1.6 million reports from multiple sources. This pipeline is part of the load process of the Necker Hospital CDW.
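The abstract does not spell out how the pipeline detects negation and family history. As a minimal sketch, assuming a NegEx-style rule approach (a swapped-in technique, not necessarily the authors' method), each report can be split into sentences, a sentence flagged as family context when it mentions a relative, and a concept mention flagged as negated when a negation cue precedes it within a few tokens. The trigger lists, the `WINDOW` value, and the names `annotate` and `Mention` are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical, illustrative English trigger lists; a production pipeline
# would rely on curated, language-specific lexicons.
NEGATION_TRIGGERS = ("no", "not", "without", "denies", "absence of", "negative for")
FAMILY_TRIGGERS = ("mother", "father", "brother", "sister",
                   "grandmother", "grandfather", "family history")
WINDOW = 5  # hypothetical: max tokens between a negation cue and the concept


@dataclass
class Mention:
    concept: str
    sentence: str
    negated: bool
    family: bool


def _tokens(text: str) -> list[str]:
    """Lowercased word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def _find(seq: list[str], sub: list[str]) -> list[int]:
    """Start indices where the token list `sub` occurs in `seq`."""
    n = len(sub)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == sub]


def annotate(report: str, concepts: list[str]) -> list[Mention]:
    """Tag every concept mention with negation and family-history flags."""
    mentions = []
    # Naive sentence split; abbreviations such as "Dr." will over-split.
    for sentence in re.split(r"[.;!?\n]+", report):
        toks = _tokens(sentence)
        if not toks:
            continue
        # The whole sentence is family context if any relative is named.
        family = any(_find(toks, _tokens(t)) for t in FAMILY_TRIGGERS)
        # Token index just after each negation cue in this sentence.
        neg_ends = [i + len(_tokens(t))
                    for t in NEGATION_TRIGGERS for i in _find(toks, _tokens(t))]
        for concept in concepts:
            for start in _find(toks, _tokens(concept)):
                # Negated only if a cue ends shortly before the concept.
                negated = any(0 <= start - end < WINDOW for end in neg_ends)
                mentions.append(Mention(concept, sentence.strip(), negated, family))
    return mentions


report = "Patient presents with lupus. No diarrhea. His mother has Crohn's disease."
for m in annotate(report, ["lupus", "diarrhea", "Crohn's"]):
    print(m.concept, m.negated, m.family)
# lupus False False
# diarrhea True False
# Crohn's False True
```

Scoping negation to a short window inside one sentence keeps a cue such as "no" from cancelling findings stated elsewhere in the report; real systems also handle cues that follow the concept and terminators such as "but".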

Results: We identified patients with "Lupus and diarrhea," "Crohn's and diabetes," and "NPHP1" from the CDW. The overall precision, recall, specificity, and F-measure were 0.85, 0.98, 0.93, and 0.91, respectively.
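The four reported measures follow the standard confusion-matrix definitions. The sketch below (function names are illustrative) computes them from true/false positive and negative counts. Because the F-measure is the harmonic mean of precision and recall, the reported precision of 0.85 and recall of 0.98 give 2 × 0.85 × 0.98 / (0.85 + 0.98) ≈ 0.91, consistent with the reported F-measure.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Precision, recall, specificity, and F-measure from confusion-matrix counts."""
    precision = tp / (tp + fp)    # share of retrieved patients who are true cases
    recall = tp / (tp + fn)       # share of true cases that were retrieved
    specificity = tn / (tn + fp)  # share of non-cases correctly left out
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure(precision, recall),
    }


# Consistency check against the reported overall precision and recall.
print(round(f_measure(0.85, 0.98), 2))  # 0.91
```

The abstract reports only the aggregate ratios, not the underlying counts, so `confusion_metrics` is shown for completeness rather than to reproduce the study's figures.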

Conclusion: The proposed method identifies cases with high accuracy from a CDW of rare disease EHRs.

Keywords: data warehouse; electronic health records; natural language processing; rare diseases; search engine.


Figures

Figure 1.
Illustration of the objective of our method and its context.
Figure 2.
Overview of the pipeline.
Figure 3.
Number of patients retrieved and accuracy (precision, recall, specificity, and F-measure) without filtering, with family history detection, and with both family history and negation detection.
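Figure 3 compares three retrieval settings: no filtering, family history filtering, and combined family history and negation filtering. A minimal sketch of how such settings could narrow a multi-concept query such as "Lupus and diarrhea", assuming mentions have already been annotated per patient (the `PatientMention` record, the `retrieve` function, and the mode names are hypothetical, not the CDW's actual interface):

```python
from typing import NamedTuple

MODES = ("none", "family", "family+negation")  # hypothetical setting names


class PatientMention(NamedTuple):
    patient_id: str
    concept: str
    negated: bool
    family: bool


def retrieve(mentions: list[PatientMention], concepts: list[str],
             mode: str = "none") -> set[str]:
    """Return patients with at least one kept mention of every query concept."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    found: dict[str, set[str]] = {}
    for m in mentions:
        if mode != "none" and m.family:
            continue  # mention describes a relative, not the patient
        if mode == "family+negation" and m.negated:
            continue  # clinical sign explicitly ruled out
        found.setdefault(m.patient_id, set()).add(m.concept)
    required = set(concepts)
    return {pid for pid, seen in found.items() if required <= seen}


mentions = [
    PatientMention("p1", "lupus", False, False),
    PatientMention("p1", "diarrhea", False, False),
    PatientMention("p2", "lupus", False, False),
    PatientMention("p2", "diarrhea", True, False),  # e.g. "no diarrhea"
    PatientMention("p3", "lupus", False, True),     # e.g. "mother has lupus"
    PatientMention("p3", "diarrhea", False, False),
]
for mode in MODES:
    print(mode, sorted(retrieve(mentions, ["lupus", "diarrhea"], mode)))
# none ['p1', 'p2', 'p3']
# family ['p1', 'p2']
# family+negation ['p1']
```

Each added filter can only shrink the retrieved set, which is why stricter settings trade a smaller patient count for fewer false positives.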
