Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2021 Jul 13;16(1):309.
doi: 10.1186/s13023-021-01936-9.

Improving early diagnosis of rare diseases using Natural Language Processing in unstructured medical records: an illustration from Dravet syndrome

Affiliations

Improving early diagnosis of rare diseases using Natural Language Processing in unstructured medical records: an illustration from Dravet syndrome

Tommaso Lo Barco et al. Orphanet J Rare Dis. .

Abstract

Background: The growing use of Electronic Health Records (EHRs) is promoting the application of data mining in health-care. A promising use of big data in this field is to develop models to support early diagnosis and to establish natural history. Dravet Syndrome (DS) is a rare developmental and epileptic encephalopathy that commonly initiates in the first year of life with febrile seizures (FS). Age at diagnosis is often delayed after 2 years, as it is difficult to differentiate DS at onset from FS. We aimed to explore if some clinical terms (concepts) are significantly more used in the electronic narrative medical reports of individuals with DS before the age of 2 years compared to those of individuals with FS. These concepts would allow an earlier detection of patients with DS resulting in an earlier orientation toward expert centers that can provide early diagnosis and care.

Methods: Data were collected from the Necker Enfants Malades Hospital using a document-based data warehouse, Dr Warehouse, which employs Natural Language Processing, a computer technology consisting in processing written information. Using Unified Medical Language System Meta-thesaurus, phenotype concepts can be recognized in medical reports. We selected individuals with DS (DS Cohort) and individuals with FS (FS Cohort) with confirmed diagnosis after the age of 4 years. A phenome-wide analysis was performed evaluating the statistical associations between the phenotypes of DS and FS, based on concepts found in the reports produced before 2 years and using a series of logistic regressions.

Results: We found significative higher representation of concepts related to seizures' phenotypes distinguishing DS from FS in the first phases, namely the major recurrence of complex febrile convulsions (long-lasting and/or with focal signs) and other seizure-types. Some typical early onset non-seizure concepts also emerged, in relation to neurodevelopment and gait disorders.

Conclusions: Narrative medical reports of individuals younger than 2 years with FS contain specific concepts linked to DS diagnosis, which can be automatically detected by software exploiting NLP. This approach could represent an innovative and sustainable methodology to decrease time of diagnosis of DS and could be transposed to other rare diseases.

Keywords: Data mining; Dravet syndrome; Early diagnosis; Natural Language Processing; Rare Diseases.

PubMed Disclaimer

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Fig. 1
Flowchart of the selection procedures and constitution of the cohorts

References

    1. Gunter TD, Terry NP. The emergence of national electronic health record architectures in the United States and Australia: models, costs, and questions. J Med Internet Res. 2005;7(1):e3. doi: 10.2196/jmir.7.1.e3. - DOI - PMC - PubMed
    1. Landi I, Glicksberg BS, Lee HC, Cherng S, Landi G, Danieletto M, et al. Deep representation learning of electronic health records to unlock patient stratification at scale. npj Digit Med. 2020;3:1–11. doi: 10.1038/s41746-020-0301-z. - DOI - PMC - PubMed
    1. Olivera P, Danese S, Jay N, Natoli G, Peyrin-Biroulet L. Big data in IBD: a look into the future. Nat Rev Gastroenterol Hepatol. 2019;16(5):312–321. doi: 10.1038/s41575-019-0102-5. - DOI - PubMed
    1. Bates DW, Saria S, Ohno-Machado L, Shah A, Escobar G. Big data in health care: Using analytics to identify and manage high-risk and high-cost patients. Health Aff. 2014;33(7):1123–1131. doi: 10.1377/hlthaff.2014.0041. - DOI - PubMed
    1. Shen F, Liu S, Wang Y, Wen A, Wang L, Liu H. Utilization of electronic medical records and biomedical literature to support the diagnosis of rare diseases using data fusion and collaborative filtering approaches. J Med Internet Res. 2018;20(10):e11301. - PMC - PubMed

Publication types

LinkOut - more resources