JAMIA Open. 2021 Sep 29;4(3):ooab084.
doi: 10.1093/jamiaopen/ooab084. eCollection 2021 Jul.

A natural language processing pipeline to synthesize patient-generated notes toward improving remote care and chronic disease management: a cystic fibrosis case study

Syed-Amad Hussain et al. JAMIA Open.

Abstract

Objectives: Patient-generated health data (PGHD) are important for tracking and monitoring out-of-clinic health events and supporting shared clinical decisions. Unstructured text as PGHD (eg, medical diary notes and transcriptions) may encapsulate rich information through narratives that can be critical to better understanding a patient's condition. We propose a natural language processing (NLP) supported data synthesis pipeline for unstructured PGHD, focusing on children with special healthcare needs (CSHCN), and demonstrate it with a case study on cystic fibrosis (CF).

Materials and methods: The proposed unstructured data synthesis and information extraction pipeline extracts a broad range of health information by combining rule-based approaches with pretrained deep-learning models. In particular, we build upon the scispaCy biomedical model suite, leveraging its named entity recognition capabilities to identify clinically relevant entities and link them to established ontologies such as the Systematized Nomenclature of Medicine (SNOMED) and RxNorm. We then use scispaCy's syntax (grammar) parsing tools to retrieve phrases associated with entities in the medication, dose, therapy, symptom, bowel movement, and nutrition ontological categories. The pipeline is illustrated and tested with simulated CF patient notes.
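The hybrid idea described above (a pretrained model combined with customizable rules) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: `stub_model_entities` is a hypothetical stand-in for a pretrained NER model such as those in scispaCy, `CUSTOM_LEXICON` is an invented per-patient term list, and ontology linking is omitted entirely.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Entity:
    text: str       # surface form found in the note
    category: str   # e.g. "medication", "therapy", "symptom"
    source: str     # "rule" or "model"


# Hypothetical lexicon a clinician might tailor to one patient or cohort.
CUSTOM_LEXICON: Dict[str, str] = {
    "vest therapy": "therapy",
    "creon": "medication",
    "cough": "symptom",
}


def rule_based_entities(note: str, lexicon: Dict[str, str]) -> List[Entity]:
    """Find lexicon terms in the note with case-insensitive whole-word matching."""
    found = []
    for term, category in lexicon.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, note, flags=re.IGNORECASE):
            found.append(Entity(match.group(0), category, "rule"))
    return found


def stub_model_entities(note: str) -> List[Entity]:
    """Assumption: placeholder for a pretrained NER model; knows a single drug."""
    known = {"tylenol": "medication"}
    words = re.findall(r"[A-Za-z]+", note)
    return [Entity(w, known[w.lower()], "model") for w in words if w.lower() in known]


def extract(
    note: str,
    lexicon: Dict[str, str],
    model: Callable[[str], List[Entity]] = stub_model_entities,
) -> List[Entity]:
    """Merge model and rule entities; a rule hit replaces a model hit on the same key."""
    merged = {}
    for ent in model(note) + rule_based_entities(note, lexicon):
        merged[(ent.text.lower(), ent.category)] = ent
    return sorted(merged.values(), key=lambda e: (e.category, e.text.lower()))


if __name__ == "__main__":
    note = "Gave Tylenol and Creon after vest therapy, still has a cough."
    for entity in extract(note, CUSTOM_LEXICON):
        print(entity.category, entity.text, entity.source)
```

Passing the model in as a callable keeps the rule layer independent of any particular NER library, which is one plausible way to support the per-patient customization the abstract mentions.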

Results: The proposed hybrid deep-learning rule-based approach can operate over a variety of natural language note types and allow customization for a given patient or cohort. Viable information was successfully extracted from simulated CF notes. This hybrid pipeline is robust to misspellings and varied word representations and can be tailored to accommodate the needs of a specific patient, cohort, or clinician.

Discussion: The NLP pipeline can extract predefined or ontology-based entities from free-text PGHD, aiming to facilitate remote care and improve chronic disease management. Our implementation uses open source models, allowing this solution to be easily replicated and integrated in different health systems. Outside of the clinic, the use of the NLP pipeline may increase the amount of clinical data recorded by families of CSHCN and ease the process of identifying health events from the notes. Similarly, care coordinators, nurses, and clinicians would be able to track adherence to medications, identify symptoms, and effectively intervene to improve clinical care. Furthermore, visualization tools can be applied to digest the structured data produced by the pipeline in support of the decision-making process for a patient, caregiver, or provider.

Conclusion: Our study demonstrated that an NLP pipeline can be used to create an automated analysis and reporting mechanism for unstructured PGHD. Further studies are suggested with real-world data to assess pipeline performance and further implications.

Keywords: artificial intelligence; chronic disease; cystic fibrosis; natural language processing; patient notes.

Figures

Figure 1.
Process flow of note processing and information extraction.
Figure 2.
An example of drug dosage extraction using dependency trees. The NER model identifies the drug within the sentence, here Tylenol (green). Once it is identified, we move up the dependency tree until we find either a NUM or a NOUN. If we find a NOUN (blue), we check whether it has a NUM as a child. This NUM child (yellow) is assumed to be the quantity, and the NOUN (blue) the unit, of the drug's dosage. If no NOUN or NUM occurs in the same clause of the sentence, or the same subsection of the dependency tree, then no dosage information is extracted. NER: named entity recognition.
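The traversal described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the `Token` class is a hypothetical stand-in for a parsed token (a real parser such as spaCy exposes comparable head, children, and part-of-speech attributes), and three behaviors are assumptions the caption does not specify: a NUM ancestor is returned as a quantity without a unit, a NOUN ancestor with no NUM child is skipped, and a VERB ancestor is treated as the clause boundary.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Token:
    """Hypothetical minimal token: text, coarse POS tag, and tree links."""
    text: str
    pos: str
    head: Optional["Token"] = field(default=None, repr=False)
    children: List["Token"] = field(default_factory=list, repr=False)

    def attach(self, child: "Token") -> "Token":
        """Make `child` a dependent of this token and return the child."""
        child.head = self
        self.children.append(child)
        return child


def find_dosage(drug: Token) -> Optional[Tuple[str, Optional[str]]]:
    """Climb from the drug token and return (quantity, unit), or None."""
    node = drug.head
    while node is not None:
        if node.pos == "NUM":
            # Assumption: a bare number ancestor is a quantity with unknown unit.
            return (node.text, None)
        if node.pos == "NOUN":
            for child in node.children:
                if child.pos == "NUM":
                    return (child.text, node.text)
            # Assumption: a NOUN without a NUM child is skipped; keep climbing.
        if node.pos == "VERB":
            # Assumption: a verb marks the edge of the drug's clause.
            return None
        node = node.head
    return None


if __name__ == "__main__":
    # Hand-built tree for "Gave 2 tablets of Tylenol".
    gave = Token("Gave", "VERB")
    tablets = gave.attach(Token("tablets", "NOUN"))
    tablets.attach(Token("2", "NUM"))
    of = tablets.attach(Token("of", "ADP"))
    tylenol = of.attach(Token("Tylenol", "PROPN"))
    print(find_dosage(tylenol))
```

In this hand-built tree the climb goes from Tylenol to "of" to "tablets", where the NUM child "2" supplies the quantity and "tablets" the unit.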
