Review
Hum Mutat. 2022 Nov;43(11):1642-1658.
doi: 10.1002/humu.24389. Epub 2022 May 22.

Computational analysis of neurodevelopmental phenotypes: Harmonization empowers clinical discovery

David Lewis-Smith et al. Hum Mutat. 2022 Nov.

Abstract

Making a specific diagnosis in neurodevelopmental disorders has traditionally relied on recognizing the clinical features of a distinct syndrome, which then guides testing of its possible genetic etiologies. Scalable frameworks for genomic diagnostics, however, have struggled to integrate meaningful measurements of clinical phenotypic features. While standardization has enabled the generation and interpretation of genomic data for clinical diagnostics at unprecedented scale, making the equivalent breakthrough for clinical data has proven challenging. Increasingly, however, clinical features are being recorded using controlled dictionaries in machine-readable formats, such as the Human Phenotype Ontology (HPO), which greatly facilitates their use in diagnostics. Improving the tractability of large-scale clinical information will create new opportunities to inform genomic research and diagnostics from a clinical perspective. Here, we describe novel approaches to computational phenotyping that harmonize clinical features, improve data translation by revising domain-specific dictionaries, quantify phenotypic features, and determine clinical relatedness. We demonstrate how these concepts can be applied to longitudinal phenotypic information, a critical element of developmental disorders and pediatric conditions. Finally, we extend our discussion to clinical data derived from electronic medical records, a largely untapped resource of deep clinical information with distinct strengths and weaknesses.

Keywords: Human Phenotype Ontology; big data; electronic health records; electronic medical records; epilepsy; genetics; genomics.

Conflict of interest statement

I.H. serves on the Scientific Advisory Board of Biogen. R.H.T. has received honoraria and meeting support from Arvelle, Bial, Eisai, GW Pharma, LivaNova, Novartis, Sanofi, UCB Pharma, UNEEG, and Zogenix. The other authors declare no conflicts of interest.

Figures

Figure 1.
A visual representation of the complexity of Human Phenotype Ontology version 1.7.13 released 2021-10-10, focusing on seizure and related neurological phenotypes. This version contains 16,290 terms and 20,529 is_a relationships.
Figure 2.
Interpretation using the is_a relationships of the HPO. (A) A simplified example of the HPO, comparing three individuals, each annotated with a single phenotypic term. (B) Individuals can be compared according to their sets of HPO annotations. Here, node color indicates the individual to whom each term has been annotated, with green nodes representing phenotypic descriptors applicable to both individuals. (C) Translation of raw clinical data typically yields precise phenotypic annotations, which should be propagated along is_a relationships to infer the presence of less specific phenotypic concepts; otherwise, the frequency of these broader concepts will be underestimated.
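The propagation described in panel (C) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the toy is_a hierarchy is hypothetical (its labels resemble HPO terms, but its edges are simplified and not taken from any HPO release), and the helper names `ancestors`, `propagate` and `term_frequency` are our own.

```python
# Toy is_a hierarchy (child -> list of parents). HYPOTHETICAL and simplified:
# labels resemble HPO terms but the edges are not taken from an HPO release.
IS_A = {
    "Phenotypic abnormality": [],
    "Seizure": ["Phenotypic abnormality"],
    "Focal-onset seizure": ["Seizure"],
    "Generalized-onset seizure": ["Seizure"],
    "Focal impaired awareness seizure": ["Focal-onset seizure"],
    "Generalized tonic-clonic seizure": ["Generalized-onset seizure"],
}


def ancestors(term, is_a=IS_A):
    """Return all ancestors of `term` (excluding the term itself)."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def propagate(annotations, is_a=IS_A):
    """Add every ancestor implied by a set of precise annotations."""
    expanded = set(annotations)
    for term in annotations:
        expanded |= ancestors(term, is_a)
    return expanded


def term_frequency(cohort, term, is_a=IS_A):
    """Number of individuals annotated with `term` after propagation."""
    return sum(term in propagate(anns, is_a) for anns in cohort)


cohort = [
    {"Focal impaired awareness seizure"},
    {"Generalized tonic-clonic seizure"},
    {"Seizure"},
]
print(sum("Seizure" in anns for anns in cohort))  # 1 (raw annotations only)
print(term_frequency(cohort, "Seizure"))          # 3 (after propagation)
```

Counting the broad term on raw annotations finds only the individual annotated with it directly; after propagation, every individual whose precise annotation implies it is counted, which is the underestimation the caption warns about.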
Figure 3.
The number of HPO annotations relating to seizure descriptors that were assigned to 791 individuals from the merger of three different research cohorts before (HPO release date 2017-12-12) and after (HPO release date 2020-12-12) expert revision. Reproduced and adapted under CC-BY from D. Lewis-Smith, Galer, et al. (2021).
Figure 4.
The proportion of individuals with SCN2A-related disorders who have particular phenotypes, using data from Crawford et al. (2021). (A) For a selection of the 204 phenotypes coded as both present and absent in this cohort after propagation, the percentage of individuals coded as having (blue) or not having (red) each phenotype, together with the phenotyping gap: the percentage of individuals in whom neither the presence nor the absence of the phenotype can be annotated. The six HPO concepts with the smallest phenotyping gap and a representative selection of the remainder are shown. (B) The distribution of the phenotyping gap for all 204 phenotypes that were coded as present in at least one and absent in at least one member of the cohort, ranked by phenotyping gap, with only a selection labelled for clarity.
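The phenotyping gap is straightforward to compute once presence and absence annotations have been propagated (presence to ancestors, absence to descendants). The sketch below assumes each individual is a dict holding already-propagated `present` and `absent` term sets; this data structure, the function names and the toy cohort are hypothetical, chosen only to illustrate the calculation and the "present in at least one, absent in at least one" filter from panel (B).

```python
from dataclasses import dataclass


@dataclass
class TermSummary:
    present_pct: float  # coded as present
    absent_pct: float   # coded as absent
    gap_pct: float      # phenotyping gap: neither presence nor absence coded


def summarize(cohort, term):
    """Percentages of the cohort coded present, absent, or not coded for `term`."""
    n = len(cohort)
    present = sum(term in ind["present"] for ind in cohort)
    absent = sum(term in ind["absent"] for ind in cohort)
    gap = n - present - absent  # assumes no individual has the term in both sets
    return TermSummary(100 * present / n, 100 * absent / n, 100 * gap / n)


def rank_by_gap(cohort):
    """Terms coded present in >= 1 and absent in >= 1 individual, smallest gap first."""
    ever_present = set().union(*(ind["present"] for ind in cohort))
    ever_absent = set().union(*(ind["absent"] for ind in cohort))
    terms = ever_present & ever_absent
    return sorted(((t, summarize(cohort, t)) for t in terms),
                  key=lambda item: (item[1].gap_pct, item[0]))


# Hypothetical toy cohort.
cohort = [
    {"present": {"Seizure", "Hypotonia"}, "absent": {"Microcephaly"}},
    {"present": {"Seizure"}, "absent": {"Hypotonia"}},
    {"present": {"Seizure", "Microcephaly"}, "absent": set()},
    {"present": set(), "absent": {"Seizure"}},
]
for term, s in rank_by_gap(cohort):
    print(f"{term}: present {s.present_pct:.0f}%, "
          f"absent {s.absent_pct:.0f}%, gap {s.gap_pct:.0f}%")
```

Ties in the gap are broken alphabetically here only so that the ordering is deterministic.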
Figure 5.
The longitudinal interrogation of HPO annotations from the EMR of patients with genetic epilepsies. (A–F) Stacked bar charts showing how the numbers of patients with and without the given phenotype recorded vary with age. For example, Febrile seizures [HP:0002373] is coded most frequently in children aged 1–7 years, but only in a minority of individuals with clinical encounters over this age range. (G) Status epilepticus is particularly common in children under the age of 5 years with diagnostic SCN1A variants compared with those without. Reproduced and adapted under CC-BY from Ganesan et al. (2020).
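The per-age counts behind stacked bar charts like these can be approximated as follows. This is a sketch under stated assumptions rather than the method used for the figure: each EMR encounter is a hypothetical `(patient_id, age_in_years, coded_terms)` tuple, ages are grouped into one-year bins, and a patient contributes to a bin only if they had at least one encounter at that age.

```python
import math
from collections import defaultdict


def counts_by_age(encounters, term):
    """For each one-year age bin, return (patients with `term` coded, patients without).

    Only patients with at least one encounter in a bin are counted in that bin.
    """
    seen = defaultdict(set)   # age bin -> patients with any encounter
    coded = defaultdict(set)  # age bin -> patients with `term` coded
    for patient_id, age, terms in encounters:
        age_bin = math.floor(age)
        seen[age_bin].add(patient_id)
        if term in terms:
            coded[age_bin].add(patient_id)
    return {b: (len(coded[b]), len(seen[b] - coded[b])) for b in sorted(seen)}


FEBRILE_SEIZURES = "HP:0002373"
# Hypothetical encounters, for illustration only.
encounters = [
    ("p1", 0.5, set()),
    ("p1", 1.2, {FEBRILE_SEIZURES}),
    ("p2", 1.8, set()),
    ("p2", 3.0, {FEBRILE_SEIZURES}),
    ("p3", 3.4, set()),
]
print(counts_by_age(encounters, FEBRILE_SEIZURES))
# {0: (0, 1), 1: (1, 1), 3: (1, 1)}
```

Restricting each bin's denominator to patients seen at that age avoids counting patients as "without" a phenotype at ages for which the record holds no information.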
Figure 6.
The distribution of seizure frequencies and the prescription of various antiseizure treatments with age, as well as the odds ratios of achieving a reduction in seizure frequency or maintaining seizure freedom for a selection of medications and the ketogenic diet, in patients at our center with (A) SCN8A-related and (B) STXBP1-related disorders. Among people with SCN8A-related disorders, seizures tend to become more frequent over the first year of life, and those taking oxcarbazepine (a sodium channel blocker) are the most likely to experience an improvement in seizure frequency. In STXBP1-related disorders, seizures tend to become less frequent over the first year of life and typically respond well to the ketogenic diet and clobazam. However, those requiring antiseizure medication into adulthood commonly take levetiracetam rather than alternative treatments. Panel B was reproduced and adapted under CC-BY from Xian et al. (2021).
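The caption does not state how the odds ratios were estimated. One common approach, shown here only as an illustrative sketch, is a 2x2 table with a Wald confidence interval on the log scale. The table layout, the unit of analysis (for example, treatment periods), and the 0.5 correction for empty cells are our assumptions, not the authors' method.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% Wald confidence interval.

    Assumed table layout:
                       improved   not improved
        on treatment       a            b
        off treatment      c            d
    If any cell is zero, 0.5 is added to every cell (Haldane-Anscombe correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lower, upper = odds_ratio_wald(12, 8, 10, 20)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An odds ratio above 1 means the favorable outcome was more likely on the treatment than off it; with small per-gene cohorts the intervals will often be wide.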
Figure 7.
An example of phenotypic similarity analysis using the simmax algorithm. (A) The annotations of two individuals, P1 and P2, are compared. For each pairwise comparison of a phenotype from P1 with a phenotype from P2, the most informative common ancestor (MICA) is found: the common ancestor of the two terms with the highest information content (IC). The similarity of the two terms is defined as the IC of their MICA. Once all pairwise comparisons have been made, the overall similarity of P1 and P2 is calculated by summing the highest similarity achieved by each of P1’s annotations and by each of P2’s annotations, and dividing the total by 2. The denominator of 2 helps make similarity scores from this algorithm comparable to those obtained using other similarity algorithms. (B) The median similarity of individuals grouped according to a genetic feature, such as de novo variants in AP2M1, is compared with a null distribution of median similarity scores generated by Monte Carlo simulation for groups of the same size (here n = 2), yielding an empirical p-value that estimates the probability of observing a similarity this great by chance in this cohort. Panel B was created using data from Helbig et al. (2019).
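The two steps in this figure can be sketched as follows. The term similarity (IC of the MICA) and the simmax score (best-match sums in both directions, divided by 2) follow the caption; everything else is an assumption for illustration: IC is estimated as the negative natural log of the fraction of the cohort annotated with a term after propagation, the hierarchy and cohort are toy data, null groups are drawn uniformly from the whole cohort, and the p-value uses the common (hits + 1) / (simulations + 1) estimator.

```python
import math
import random
import statistics

# Toy is_a hierarchy (child -> parents); HYPOTHETICAL, for illustration only.
IS_A = {
    "Phenotypic abnormality": [],
    "Seizure": ["Phenotypic abnormality"],
    "Focal-onset seizure": ["Seizure"],
    "Generalized-onset seizure": ["Seizure"],
    "Hypotonia": ["Phenotypic abnormality"],
}


def with_ancestors(term):
    """The term itself plus all of its ancestors."""
    out, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in out:
            out.add(t)
            stack.extend(IS_A[t])
    return out


def information_content(cohort):
    """Assumed IC: -ln(fraction of individuals annotated with the term after propagation)."""
    counts = {}
    for annotations in cohort:
        for t in set().union(*(with_ancestors(a) for a in annotations)):
            counts[t] = counts.get(t, 0) + 1
    return {t: -math.log(c / len(cohort)) for t, c in counts.items()}


def term_similarity(t1, t2, ic):
    """IC of the most informative common ancestor (MICA) of two terms."""
    common = with_ancestors(t1) & with_ancestors(t2)
    return max(ic.get(t, 0.0) for t in common)


def simmax(p1, p2, ic):
    """Sum of best matches from P1 to P2 and from P2 to P1, divided by 2."""
    best_1 = sum(max(term_similarity(a, b, ic) for b in p2) for a in p1)
    best_2 = sum(max(term_similarity(a, b, ic) for a in p1) for b in p2)
    return (best_1 + best_2) / 2


def group_median(indices, cohort, ic):
    """Median pairwise simmax within a group of at least two individuals."""
    return statistics.median(
        simmax(cohort[i], cohort[j], ic)
        for k, i in enumerate(indices) for j in indices[k + 1:]
    )


def empirical_p(group, cohort, ic, n_sim=1000, seed=0):
    """Monte Carlo p-value: how often a random same-size group is at least as similar."""
    rng = random.Random(seed)
    observed = group_median(group, cohort, ic)
    hits = sum(
        group_median(rng.sample(range(len(cohort)), len(group)), cohort, ic) >= observed
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)


# Hypothetical toy cohort: one list of precise annotations per individual.
cohort = [
    ["Focal-onset seizure"],
    ["Focal-onset seizure", "Hypotonia"],
    ["Generalized-onset seizure"],
    ["Hypotonia"],
]
ic = information_content(cohort)
print(round(simmax(cohort[0], cohort[1], ic), 3))  # 0.693, i.e. ln 2
print(empirical_p([0, 1], cohort, ic))
```

With a real ontology and cohort, IC values would typically be computed once over the full annotation set and pairwise scores cached, because the Monte Carlo step re-evaluates many of the same pairs.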
Figure 8.
HPO-based visualizations demonstrate the clinical features associated with de novo variants in SCN1A in published cohorts with developmental and epileptic encephalopathies. (A) The frequency of annotation of HPO terms in carriers of SCN1A de novo variants versus non-carriers regardless of age. (B) The same data presented to demonstrate the conceptual relationships between associated features within the structure of the HPO. p-values were calculated using Fisher’s exact test. Reproduced and adapted under CC-BY from D. Lewis-Smith, Galer, et al. (2021) using data from Galer et al. (2020).
