Multicenter Study
BMC Med Inform Decis Mak. 2017 Aug 22;17(1):127.
doi: 10.1186/s12911-017-0519-0.

Empirical advances with text mining of electronic health records

T Delespierre et al.

Abstract

Background: Korian is a private group specializing in medical accommodation for elderly and dependent people. A professional data warehouse (DWH) established in 2010 hosts all of the residents' data. Within this information system (IS), clinical narratives (CNs) were used only by medical staff, as a tool for linking residents' care. The objective of this study was to show that, through qualitative and quantitative textual analysis of a relatively small, well-defined sample of physiotherapy CNs, it was possible to build a physiotherapy corpus and, through this process, generate new knowledge by adding relevant information describing the residents' care and lives.

Methods: Meaningful words were extracted with Structured Query Language (SQL) queries, using the LIKE operator and wildcards to perform pattern matching, followed by text mining and a word cloud built with R® packages. A further step applied principal component analysis (PCA), multiple correspondence analysis (MCA) and clustering to the same residents' sample, as well as to other health data described by a health model that measures the residents' care-level needs.
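The abstract does not give the actual queries. As a hedged illustration of LIKE-based pattern matching only, the sketch below uses Python's built-in sqlite3 module with an in-memory table. The table name `clinical_narratives`, its columns, the helper `find_notes` and the sample notes are all hypothetical, and the warehouse's own SQL dialect may treat case sensitivity differently from SQLite.

```python
import sqlite3

def find_notes(conn, keyword):
    """Return (resident_id, note) rows whose note contains `keyword`.

    `%` matches any run of characters, so '%physio%' matches both
    'Physiotherapy' and 'Physio'. SQLite's LIKE ignores case for ASCII letters.
    """
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT resident_id, note FROM clinical_narratives "
        "WHERE note LIKE ? ORDER BY resident_id",
        (pattern,),
    )
    return rows.fetchall()

# Hypothetical table and made-up notes, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clinical_narratives (resident_id INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO clinical_narratives VALUES (?, ?)",
    [
        (1, "Physiotherapy session: walking with a frame"),
        (2, "Fall in the bathroom, no injury"),
        (3, "Physio: balance exercises after the fall"),
        (4, "Dietary review"),
    ],
)

for resident_id, note in find_notes(conn, "physio"):
    print(resident_id, note)
```

Because `%` and `_` are themselves wildcards, a keyword containing either character would need an `ESCAPE` clause; this sketch does not handle that case.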

Results: By combining these techniques, physiotherapy treatments could be characterized by a list of constructed keywords, and the residents' health characteristics were profiled. Feeding defects or health outlier groups could be detected, the physiotherapy residents' data were matched with their health data, and differences in health status were reflected in qualitative and quantitative differences in the physiotherapy narratives.
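The keyword lists rest on term frequencies filtered by a minimum count (the figure captions mention thresholds of 30 and 10 occurrences). The study used R packages; the Python sketch below only illustrates the counting-and-threshold step. The stop-word list, the helper name `term_frequencies` and the sample notes are hypothetical, and a real French-language corpus would also need stemming or lemmatization.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "after", "and", "in", "of", "the", "to", "with"}

def term_frequencies(notes, min_count=1):
    """Count words across notes, keeping those seen at least min_count times."""
    counts = Counter()
    for note in notes:
        # [^\W\d_]+ matches runs of Unicode letters, so accented words survive.
        tokens = re.findall(r"[^\W\d_]+", note.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    # most_common() sorts by descending count, the order a bar plot would use.
    return {word: n for word, n in counts.most_common() if n >= min_count}

notes = [
    "Walking exercises with a frame",
    "Balance exercises and walking",
    "Walking in the corridor after the fall",
]
print(term_frequencies(notes, min_count=2))  # {'walking': 3, 'exercises': 2}
```

Raising `min_count` trims the vocabulary to the most frequent terms, which is what keeps a bar plot or word cloud readable.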

Conclusions: This two-stage textual experiment showed that text mining and data mining techniques provide convenient tools to improve residents' health and quality of care by adding new, simple, usable data to the electronic health record (EHR). When used with a normalized physiotherapy problem list, text mining through information extraction (IE) and named entity recognition (NER), together with data mining (DM), can provide a real advantage in describing health care, adding new medical material and helping to integrate the EHR system into the health staff's work environment.
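One simple way to combine NER with a normalized problem list, shown here as an assumption rather than the authors' method, is dictionary (gazetteer) matching: each surface form in the list maps to a normalized label, and longer terms are matched first so that overlapping shorter matches are skipped. `PROBLEM_LIST`, its labels and `tag_entities` are hypothetical.

```python
import re

# Hypothetical normalized problem list: surface form -> normalized label.
PROBLEM_LIST = {
    "fall": "FALL",
    "falls": "FALL",
    "walking frame": "WALKING_AID",
    "walker": "WALKING_AID",
    "balance": "BALANCE_DISORDER",
}

def tag_entities(note, lexicon=PROBLEM_LIST):
    """Return sorted (start, end, surface, label) tuples for lexicon matches.

    Longer terms are tried first; a match overlapping an earlier one is dropped.
    """
    found = []
    taken = set()  # character offsets already claimed by a match
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, note, flags=re.IGNORECASE):
            span = set(range(m.start(), m.end()))
            if taken.isdisjoint(span):
                taken |= span
                found.append((m.start(), m.end(), m.group(0), lexicon[term]))
    return sorted(found)

note = "Second fall this month; balance exercises with a walking frame."
for start, end, surface, label in tag_entities(note):
    print(f"{surface!r} -> {label}")
```

Mapping several surface forms to one label is what turns free-text mentions into structured, countable EHR data.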

Keywords: Data mining; Hierarchical clustering; Information extraction; Multiple correspondence analysis; Named entity recognition; Nursing homes; Principal component analysis; SQL query; Text mining; Word cloud.


Conflict of interest statement

Ethics approval and consent to participate

The use of this database in the context of epidemiological studies has been authorized by the French National Commission for Data Protection and Liberties (CNIL). The Institut du Bien Vieillir filed a declaration of conformity to a baseline methodology, which received agreement number 2.041.050 in March 2017, in accordance with Act n°78–17 of 6 January 1978 on Data Processing, Data Files and Individual Liberties. On entering the nursing home (NH), all residents are informed about their EHR and their right to object to its use. While the primary purpose of this medical research was to generate new knowledge, this goal did not take precedence over the rights and interests of the NH residents. All newly generated information was extracted from existing data and was de-identified and anonymized when necessary to protect the residents' health and rights. No images or identifying details of individuals are reported in this manuscript.

Consent for publication

Not applicable.

Competing interests

The authors do not have any financial or non-financial competing interests to report. Funding to support TD’s work is reported above.


Figures

Fig. 1
The experiment design with supervised (textual SQL and classification) and unsupervised (PCA, MCA, HC and text mining) techniques
Fig. 2
Frequencies of the residents' number of falls and of their ages in the physiotherapy sample
Fig. 3
Bar plot of words appearing at least 30 times in the physiotherapy sample corpus (stage 2)
Fig. 4
Word cloud with words appearing at least 10 times in the physiotherapy sample corpus (stage 2)
Fig. 5
The PCA of medical histories, pathologies and number of falls, treated as continuous variables
Fig. 6
HC + MCA with regions, departments, NH names, gender, medical histories and pathologies as categorical variables
Fig. 7
The six clusters in 3D and 2D, drawn with the HCPC plot function
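Fig. 5 reports a PCA on medical histories, pathologies and number of falls treated as continuous variables. The study used R; the NumPy sketch below shows a standardized PCA computed through the singular value decomposition, under the assumption that variables are z-scored first. The helper `standardized_pca` and the data rows are made up for illustration, and the clustering step (HC/HCPC) is not reproduced here.

```python
import numpy as np

def standardized_pca(X, n_components=2):
    """PCA on z-scored columns via SVD.

    Returns (scores, explained_variance_ratio) for the first n_components.
    A column with zero variance would cause a division by zero; drop it first.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # each variable: mean 0, std 1
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # same as Z @ Vt[:n_components].T
    ratio = s**2 / np.sum(s**2)                      # share of total variance per component
    return scores, ratio[:n_components]

# Made-up rows, one resident each: medical histories, pathologies, falls.
X = [
    [3, 5, 0],
    [6, 8, 2],
    [2, 3, 0],
    [7, 9, 4],
    [4, 6, 1],
]
scores, ratio = standardized_pca(X)
print(scores.round(2))
print(ratio.round(3))
```

Standardizing first keeps a variable with a larger numeric range (here, pathologies) from dominating the components simply because of its scale.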
