Automated Extraction of Diagnostic Criteria From Electronic Health Records for Autism Spectrum Disorders: Development, Evaluation, and Application
- PMID: 30404767
- PMCID: PMC6249505
- DOI: 10.2196/10497
Abstract
Background: Electronic health records (EHRs) bring many opportunities for information utilization. One such use is the surveillance conducted by the Centers for Disease Control and Prevention to track cases of autism spectrum disorder (ASD). This process currently comprises manual collection and review of EHRs of 4- and 8-year-old children in 11 US states for the presence of ASD criteria. The work is time-consuming and expensive.
Objective: Our objective was to automatically extract from EHRs the descriptions of behaviors that clinicians noted as evidence of the diagnostic criteria in the Diagnostic and Statistical Manual of Mental Disorders (DSM). Previously, we reported on the classification of entire EHRs as ASD or not. In this work, we focus on the extraction of individual expressions of the different ASD criteria in the text. We intend to facilitate large-scale surveillance efforts for ASD, support analysis of changes over time, and enable integration with other relevant data.
Methods: We developed a natural language processing (NLP) parser to extract expressions of 12 DSM criteria using 104 patterns and 92 lexicons (1787 terms). The parser is rule-based to enable precise extraction of the entities from the text. The entities themselves appear in the EHRs as very diverse expressions of the diagnostic criteria, written by different people at different times (clinicians, speech pathologists, among others). Due to the sparsity of the data, a rule-based approach is best suited until larger datasets can be generated for machine learning algorithms.
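To make the idea of combining patterns with lexicons concrete, the following is a minimal sketch and not the parser described above. It assumes that a pattern is an ordered sequence of lexicon slots and that up to two filler words may sit between slots. The lexicon names, their terms, the gap size, and the mapping of patterns to the labels A1 and A3 are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical lexicons: illustrative terms only, not the study's 92 lexicons.
LEXICONS = {
    "NEGATION": ["does not", "did not", "fails to", "unable to"],
    "SOCIAL_BEHAVIOR": ["make eye contact", "respond to his name", "share toys"],
    "REPETITIVE": ["hand flapping", "rocking", "spinning"],
}

# Hypothetical patterns: each pattern is an ordered list of lexicon names.
# The criterion labels are assigned here for illustration only.
PATTERNS = {
    "A1": [["NEGATION", "SOCIAL_BEHAVIOR"]],
    "A3": [["REPETITIVE"]],
}

# Assumed gap rule: whitespace or punctuation plus at most two filler words.
GAP = r"\W+(?:\w+\W+){0,2}?"


def lexicon_regex(name):
    """Turn one lexicon into a regex alternation, longest terms first."""
    terms = sorted(LEXICONS[name], key=len, reverse=True)
    return "(?:" + "|".join(re.escape(t) for t in terms) + ")"


def compile_pattern(slots):
    """Join the lexicon alternations of a pattern with the gap rule."""
    body = GAP.join(lexicon_regex(slot) for slot in slots)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


COMPILED = {
    criterion: [compile_pattern(slots) for slots in patterns]
    for criterion, patterns in PATTERNS.items()
}


def extract(sentence):
    """Return (criterion, matched text) pairs found in one sentence."""
    hits = []
    for criterion, regexes in COMPILED.items():
        for regex in regexes:
            for match in regex.finditer(sentence):
                hits.append((criterion, match.group(0)))
    return hits


if __name__ == "__main__":
    for text in [
        "He does not make eye contact with peers.",
        "Frequent hand flapping was observed when excited.",
        "She smiled at the examiner.",
    ]:
        print(text, "->", extract(text))
```

A rule set of this kind only fires when both the wording and the word order are anticipated, which is one plausible reason a rule-based extractor can reach high precision while missing many differently phrased expressions.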
Results: We evaluated our rule-based parser and compared it with a machine learning baseline (decision tree). Using a test set of 6636 sentences (50 EHRs), we found that our parser achieved 76% precision, 43% recall (ie, sensitivity), and >99% specificity for criterion extraction. The performance was better for the rule-based approach than for the machine learning baseline (60% precision and 30% recall). For some individual criteria, precision was as high as 97% and recall 57%. Since precision was very high, we were assured that criteria were rarely assigned incorrectly, and our numbers presented a lower bound of their presence in EHRs. We then conducted a case study and parsed 4480 new EHRs covering 10 years of surveillance records from the Arizona Developmental Disabilities Surveillance Program. The social criteria (A1 criteria) showed the biggest change over the years. The communication criteria (A2 criteria) did not distinguish the ASD from the non-ASD records. Among behaviors and interests criteria (A3 criteria), 1 (A3b) was present with much greater frequency in the ASD than in the non-ASD EHRs.
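The three evaluation measures reported above follow from sentence-level confusion counts. The sketch below shows the standard definitions; the function name and the counts passed to it are toy values invented for illustration and are not the study's data.

```python
def extraction_metrics(tp, fp, fn, tn):
    """Compute precision, recall (sensitivity), and specificity.

    tp: sentences correctly labeled with a criterion
    fp: sentences labeled with a criterion they do not express
    fn: sentences expressing a criterion that were not labeled
    tn: sentences correctly left unlabeled
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
    }


if __name__ == "__main__":
    # Toy counts for 100 sentences; NOT taken from the evaluation above.
    metrics = extraction_metrics(tp=3, fp=1, fn=2, tn=94)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

When most sentences express no criterion at all, true negatives dominate the denominator of specificity, so specificity can stay near 100% even when recall is modest.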
Conclusions: Our results demonstrate that NLP can support large-scale analysis useful for ASD surveillance and research. In the future, we intend to facilitate detailed analysis and integration of national datasets.
Keywords: Autism Spectrum Disorder; DSM; complex entity extraction; decision tree; electronic health records; machine learning; natural language processing; parser.
©Gondy Leroy, Yang Gu, Sydney Pettygrove, Maureen K Galindo, Ananyaa Arora, Margaret Kurzius-Spencer. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 07.11.2018.
Conflict of interest statement
Conflicts of Interest: None declared.