J Neurodev Disord. 2022 May 23;14(1):32.
doi: 10.1186/s11689-022-09442-0.

Development of a phenotype ontology for autism spectrum disorder by natural language processing on electronic health records

Mengge Zhao et al. J Neurodev Disord. 2022.

Abstract

Background: Autism spectrum disorder (ASD) is a complex neurodevelopmental condition characterized by restricted, repetitive behavior and impaired social communication and interaction. However, significant challenges remain in diagnosing and subtyping ASD, due in part to the lack of a validated, standardized vocabulary for characterizing the clinical phenotypic presentation of ASD. Although the Human Phenotype Ontology (HPO) plays an important role in delineating nuanced phenotypes for rare genetic diseases, it does not adequately capture the characteristics of behavioral and psychiatric phenotypes in individuals with ASD. There is a clear need, therefore, for a well-established phenotype terminology set that can assist in characterizing ASD phenotypes from patients' clinical narratives.

Methods: To address this challenge, we used natural language processing (NLP) techniques to identify and curate ASD phenotypic terms from high-quality unstructured clinical notes in the electronic health records (EHRs) of 8499 individuals with ASD, 8177 individuals with non-ASD psychiatric disorders, and 8482 individuals without a documented psychiatric disorder. We further performed dimensionality-reduction clustering analysis to subgroup individuals with ASD, using nonnegative matrix factorization.
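
As an illustration of the subgrouping step, the following is a minimal sketch of nonnegative matrix factorization on a patient-by-term count matrix, using the standard multiplicative update rules. It is not the authors' pipeline: the toy matrix, the rank of 2, the iteration count, and the function names `nmf` and `assign_clusters` are all assumptions made for this example, and each patient is simply assigned to the factor with the largest weight.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, rank, iters=500, seed=0, eps=1e-9):
    """Approximate nonnegative V (patients x terms) as W @ H.

    Uses multiplicative updates for the squared-error objective.
    W is patients x rank, H is rank x terms; both stay nonnegative.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

def assign_clusters(w):
    """Label each patient with the index of its largest factor weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]

# Hypothetical toy data: rows are patients, columns are term counts.
# Patients 0-1 mention only the first two terms, patients 2-3 only the last two.
V = [
    [2, 1, 0, 0],
    [4, 2, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 2, 6],
]

W, H = nmf(V, rank=2)
labels = assign_clusters(W)
print(labels)
```

The numbering of the subgroups is arbitrary and depends on the random initialization; only the grouping of patients is meaningful.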

Results: Through a note-processing pipeline that includes several state-of-the-art NLP steps, we identified 3336 ASD terms linked to 1943 unique medical concepts, which represents one of the largest ASD terminology sets to date. The extracted ASD terms were further organized in a formal ontology structure similar to that of the HPO. Clustering analysis showed that these terms could be used in a diagnostic pipeline to differentiate individuals with ASD from individuals with other psychiatric disorders.
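
To make the idea of an HPO-like structure concrete, here is a small sketch of an "is-a" hierarchy in which each phenotype term points to a broader parent term. The term labels and the helper names `path_to_root` and `common_ancestor` are hypothetical and chosen only for illustration; they are not taken from the published ontology.

```python
from typing import Dict, List

# Hypothetical child -> parent ("is-a") links; labels are illustrative only.
PARENT: Dict[str, str] = {
    "poor eye contact": "impaired nonverbal communication",
    "impaired nonverbal communication": "impaired social communication",
    "hand flapping": "repetitive motor movements",
    "repetitive motor movements": "restricted, repetitive behavior",
    "impaired social communication": "ASD phenotype",
    "restricted, repetitive behavior": "ASD phenotype",
}

def path_to_root(term: str) -> List[str]:
    """Return the term followed by each successively broader ancestor."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def common_ancestor(a: str, b: str) -> str:
    """Return the most specific term that is an ancestor of both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("terms do not share a root")

print(path_to_root("poor eye contact"))
print(common_ancestor("poor eye contact", "hand flapping"))
```

Storing terms this way lets a specific finding in a clinical note be rolled up to a broader category, which is the kind of grouping a formal ontology supports.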

Conclusion: Our ASD phenotype ontology can assist clinicians and researchers in characterizing individuals with ASD, facilitating automated diagnosis, and subtyping individuals with ASD to facilitate personalized therapeutic decision-making.

Keywords: Autism; Autism spectrum disorder; Electronic health record; Natural language processing; Phenotype ontology; Terminology set.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Workflow of ASD phenotype ontology development
Fig. 2
Gender and age distribution in the ASD patient cohort. Some patients were diagnosed at a very early age, which may be an artifact of retrospective assignment of ICD codes in EHRs
Fig. 3
Comparison of t-SNE clustering analysis for the top 2000 ASD patients and 2000 psychiatric (non-ASD) patients using our terminology set (a) and using Lingren’s terminology set (b). Because not all patients' records contain terms from the ASD vocabulary developed by Lingren et al., we analyzed only patients whose records contain these terms. Our terminology set separates ASD patients from general psychiatric (non-ASD) patients much better than Lingren’s list. The t-SNE plot shows that ASD patients can be further divided into 4 subgroups; however, one group of ASD patients (cluster 4) is mixed with non-ASD psychiatric patients
Fig. 4
Mapping subgroup of ASD patients to DSM-5 guideline. a The percentage of subgroups of ASD patients in each cluster that maps to DSM-5 individual criteria. b As an illustrative example, we quantified individual patient’s ASD characteristics to DSM-5 guideline for patients in cluster 1 and cluster 4
Fig. 5
Five levels of ASD phenotype ontology developed in our study. A Example of ASD phenotype ontology. B Examples of our ASD phenotype ontology displayed in the Protégé software for ontology analysis
