J Am Med Inform Assoc. 2013 Dec;20(e2):e334-40.
doi: 10.1136/amiajnl-2013-001999. Epub 2013 Oct 22.

Automated extraction of clinical traits of multiple sclerosis in electronic medical records

Mary F Davis et al. J Am Med Inform Assoc. 2013 Dec.

Abstract

Objectives: The clinical course of multiple sclerosis (MS) is highly variable, and research data collection is costly and time-consuming. We evaluated natural language processing techniques applied to electronic medical records (EMR) to identify MS patients and the key clinical traits of their disease course.

Materials and methods: We used four algorithms based on ICD-9 codes, text keywords, and medications to identify individuals with MS from a de-identified, research version of the EMR at Vanderbilt University. Using a training dataset of the records of 899 individuals, algorithms were constructed to identify and extract detailed information regarding the clinical course of MS from the text of the medical records, including clinical subtype, presence of oligoclonal bands, year of diagnosis, year and origin of first symptom, Expanded Disability Status Scale (EDSS) scores, timed 25-foot walk scores, and MS medications. Algorithms were evaluated on a test set validated by two independent reviewers.
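To illustrate the kind of text extraction described above, here is a minimal Python sketch that pulls EDSS scores out of free-text notes with a regular expression. It is not the authors' algorithm: the pattern, the 20-character gap, and the function name `extract_edss` are assumptions made for illustration. The only domain fact relied on is that EDSS scores run from 0 to 10 in half-point steps.

```python
import re

# Hypothetical pattern (an assumption, not taken from the paper):
# the keyword "EDSS", up to 20 non-digit characters, then a score
# between 0 and 10 in 0.5 increments.
EDSS_PATTERN = re.compile(
    r"\bEDSS\b[^0-9\n]{0,20}?(10(?:\.0)?|[0-9](?:\.[05])?)(?!\d|\.\d)",
    re.IGNORECASE,
)


def extract_edss(note: str) -> list[float]:
    """Return every EDSS-like score mentioned in a clinical note."""
    return [float(m.group(1)) for m in EDSS_PATTERN.finditer(note)]


if __name__ == "__main__":
    print(extract_edss("EDSS 2.0 in 2010; EDSS: 3.5 now"))
```

A pattern this simple will misread dates or other numbers that directly follow the keyword, which is one reason extracted values need checking against a manually reviewed test set, as the study did.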

Results: We identified 5789 individuals with MS. For all clinical traits extracted, precision was at least 87% and specificity was greater than 80%. Recall values for clinical subtype, EDSS scores, and timed 25-foot walk scores were greater than 80%.
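For readers unfamiliar with the metrics reported above, the following Python sketch shows how precision, recall, and specificity are computed from counts of true and false positives and negatives. The class name `Confusion` and the counts in the usage example are invented for illustration and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing algorithm output with reviewer labels."""
    tp: int  # trait extracted and confirmed by review
    fp: int  # trait extracted but not confirmed
    fn: int  # trait present in the record but missed
    tn: int  # trait absent and correctly not extracted

    def precision(self) -> float:
        # Share of extracted values that are correct.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Share of true values that were extracted.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of true negatives that were correctly left out.
        return self.tn / (self.tn + self.fp)


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    c = Confusion(tp=45, fp=5, fn=10, tn=40)
    print(f"precision={c.precision():.2f}")
    print(f"recall={c.recall():.2f}")
    print(f"specificity={c.specificity():.2f}")
```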

Discussion and conclusion: This collection of clinical data represents one of the largest databases of detailed clinical traits available for research on MS. This work demonstrates that detailed clinical information is recorded in the EMR and can be extracted for research purposes with high reliability.

Keywords: Multiple sclerosis; electronic health records.

Figures

Figure 1. Schematic of how the algorithm determines the origin of the first neurological symptom.
Figure 2. Distributions of timed 25-foot walk scores as found in the structured fields and extracted from the text of the clinical records.
