Improving a full-text search engine: the importance of negation detection and family history context to identify cases in a biomedical data warehouse

Nicolas Garcelon¹,², Antoine Neuraz¹,², Vincent Benoit¹, Rémi Salomon¹,³, Anita Burgun²,⁴

Affiliations

¹ Institut Imagine, Paris Descartes Université Paris Descartes-Sorbonne Paris Cité, Paris, France.
² INSERM, Centre de Recherche des Cordeliers, UMR 1138 Equipe 22, Université Paris Descartes, Sorbonne Paris Cité, Paris, France.
³ Service de Néphrologie Pédiatrique, Hôpital Necker-Enfants Malades, Assistance Publique-Hôpitaux de Paris (AP-HP), Université Paris Descartes, Sorbonne Paris Cité, France.
⁴ Hôpital Européen Georges Pompidou, Assistance Publique-Hôpitaux de Paris (AP-HP), Université Paris Descartes, Sorbonne Paris Cité, France.

PMID: 28339516
PMCID: PMC7651926
DOI: 10.1093/jamia/ocw144

Nicolas Garcelon et al. J Am Med Inform Assoc. 2017 May 1;24(3):607-613. doi: 10.1093/jamia/ocw144.

Abstract

Objective: The repurposing of electronic health records (EHRs) can improve clinical and genetic research for rare diseases. However, significant information in rare disease EHRs is embedded in the narrative reports, which contain many negated clinical signs and family medical history. This paper presents a method to detect family history and negation in narrative reports and evaluates its impact on selecting populations from a clinical data warehouse (CDW).

Materials and methods: We developed a pipeline to process 1.6 million reports from multiple sources. This pipeline is part of the load process of the Necker Hospital CDW.

Results: We identified patients with "Lupus and diarrhea," "Crohn's and diabetes," and "NPHP1" from the CDW. The overall precision, recall, specificity, and F-measure were 0.85, 0.98, 0.93, and 0.91, respectively.
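
As a quick consistency check on the figures above, the following minimal sketch recomputes the F-measure from the reported overall precision and recall. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` and the `beta` parameter are illustrative, not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score (assumed here, not stated in the abstract).
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is retrieved correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Overall precision and recall as reported in the abstract.
f1 = f_measure(0.85, 0.98)
print(round(f1, 2))  # prints 0.91
```

Under that assumption, the recomputed value agrees with the reported F-measure of 0.91.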

Conclusion: The proposed method generates a highly accurate identification of cases from a CDW of rare disease EHRs.

Keywords: data warehouse; electronic health records; natural language processing; rare diseases; search engine.

Figures

**Figure 1.**
Illustration of the objective of our method and its context.

**Figure 3.**
The number of patients retrieved and the accuracy (precision, recall, specificity, and F-measure) are shown without filtering, with family history detection, and with both family history and negation detection.
