Natural history of rare diseases using natural language processing of narrative unstructured electronic health records: The example of Dravet syndrome
- PMID: 38065926
- DOI: 10.1111/epi.17855
Abstract
Objective: The increasing implementation of electronic health records allows the use of advanced text-mining methods for establishing new patient phenotypes and stratification, and for revealing outcome correlations. In this study, we aimed to explore the electronic narrative clinical reports of a cohort of patients with Dravet syndrome (DS) followed longitudinally at our center, to assess whether this methodology can retrace the natural history of DS during the early years.
Methods: We used a document-based clinical data warehouse employing natural language processing to recognize the phenotype concepts in the narrative medical reports. We included patients with DS who had a medical report produced before the age of 2 years and a follow-up after the age of 3 years ("DS cohort," 56 individuals). We selected two control populations, a "general control cohort" (275 individuals) and a "neurological control cohort" (281 individuals), with similar characteristics in terms of gender, number of reports, and age at last report. To find concepts specifically associated with DS, we performed a phenome-wide association study using Cox regression, comparing the reports of the three cohorts. We then performed a qualitative analysis of the surviving concepts based on their median age at first appearance.
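As an illustration only, the data preparation behind such an analysis could be sketched as follows. The abstract does not describe the study's actual pipeline, so everything here is an assumption: the toy `reports` records, the concept names, and the helper functions `time_to_first_mention` and `median_age_at_first_mention` are hypothetical. The sketch assumes that each patient contributes one time-to-event row per concept, with the age at first mention as the event time and the age at last report as the censoring time, which is the shape a Cox model would typically expect. It does not fit the Cox regression itself.

```python
from statistics import median

# Hypothetical toy data (not from the study):
# (patient_id, age_in_years_at_report, concepts_recognized_in_report)
reports = [
    ("p1", 0.6, {"febrile seizure"}),
    ("p1", 1.4, {"febrile seizure", "status epilepticus"}),
    ("p1", 3.5, {"ataxia"}),
    ("p2", 0.8, {"febrile seizure"}),
    ("p2", 4.0, {"ataxia", "status epilepticus"}),
    ("p3", 1.0, {"febrile seizure"}),
    ("p3", 3.2, set()),
]


def time_to_first_mention(reports, concept):
    """Build one (duration, event) pair per patient for a given concept.

    Assumed convention: event=1 and duration=age at first mention if the
    concept ever appears; otherwise event=0 and duration=age at last report
    (the patient is censored for that concept).
    """
    first_mention = {}
    last_report = {}
    for patient_id, age, concepts in reports:
        last_report[patient_id] = max(age, last_report.get(patient_id, age))
        if concept in concepts:
            first_mention[patient_id] = min(age, first_mention.get(patient_id, age))
    return {
        pid: (first_mention[pid], 1) if pid in first_mention else (last_report[pid], 0)
        for pid in last_report
    }


def median_age_at_first_mention(reports, concept):
    """Median age at first appearance among patients in whom the concept appears."""
    ages = [
        duration
        for duration, event in time_to_first_mention(reports, concept).values()
        if event == 1
    ]
    return median(ages) if ages else None


if __name__ == "__main__":
    for concept in ("febrile seizure", "status epilepticus", "ataxia"):
        print(concept, median_age_at_first_mention(reports, concept))
```

Under these assumptions, the per-concept rows could be passed to any survival-analysis library to compare cohorts, and the medians could be used to order the retained concepts along the age axis.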
Results: A total of 76 concepts were prevalent in the reports of children with DS. Concepts appearing during the first 2 years were mostly related to the epilepsy features at the onset of DS (convulsive and prolonged seizures triggered by fever, often requiring in-hospital care). Subsequently, concepts related to new types of seizures and to drug resistance appeared. A series of non-seizure-related concepts emerged after the age of 2-3 years, referring to the non-seizure comorbidities classically associated with DS.
Significance: Extracting clinical terms from the narrative reports of children with DS makes it possible to outline the known natural history of this rare disease in early childhood. This original model of "longitudinal phenotyping" could be applied to other rare and very rare conditions whose natural history is poorly described.
Keywords: clinical informatics; developmental and epileptic encephalopathies; digitalization in medicine; longitudinal phenotyping; outcome in epilepsy.
© 2023 The Authors. Epilepsia published by Wiley Periodicals LLC on behalf of International League Against Epilepsy.
