Review
Hum Mutat. 2022 Aug;43(8):1071-1081.
doi: 10.1002/humu.24380. Epub 2022 Apr 27.

Phenotype-driven approaches to enhance variant prioritization and diagnosis of rare disease

Julius O B Jacobsen et al. Hum Mutat. 2022 Aug.

Abstract

Rare disease diagnostics and disease gene discovery have been revolutionized by whole-exome and genome sequencing but identifying the causative variant(s) from the millions in each individual remains challenging. The use of deep phenotyping of patients and reference genotype-phenotype knowledge, alongside variant data such as allele frequency, segregation, and predicted pathogenicity, has proved an effective strategy to tackle this issue. Here we review the numerous tools that have been developed to automate this approach and demonstrate the power of such an approach on several thousand diagnosed cases from the 100,000 Genomes Project. Finally, we discuss the challenges that need to be overcome if we are going to improve detection rates and help the majority of patients that still remain without a molecular diagnosis after state-of-the-art genomic interpretation.

Keywords: diagnostics; phenotypes; rare disease; variant prioritization.

Conflict of interest statement

Julius Jacobsen and Damian Smedley declare they previously acted as part‐time consultants for Congenica Ltd. The other authors declare no other potential conflicts of interest.

Figures

Figure 1
Recall of 4877 known molecular diagnoses from the 100,000 Genomes Project. Exomiser and LIRICAL were run using their standard settings and the percentage of molecular diagnoses detected as the top hit, in the top 3, 5, or 10 hits, or outside the top 10 are shown in the stacked bars. Performance for Exomiser was further broken down into whether the cases are singletons, duos (one parent sequenced), trios (both parents sequenced), or even larger family structures, for example, siblings sequenced as well.
Figure 2
Representation of Exomiser's phenotype‐based prioritization strategy. Exomiser takes as input a set of clinical phenotypes encoded as HPO terms and a patient VCF file produced by WES/WGS followed by variant calling. Optionally, the VCF file can be multisample, representing the sequences of other affected and unaffected family members, with further data on the pedigree also supplied. Under default settings, Exomiser then removes any variants that are not protein‐coding, that exceed minor allele frequency thresholds of 0.1% for dominant and 2% for recessive modes of inheritance, or that do not segregate with the disease (except well‐supported pathogenic/likely pathogenic ClinVar variants, which are retained regardless of location or frequency). The remaining variants for each possible mode of inheritance are then scored based on the rarity of the variant, its predicted consequence, and the output of in silico pathogenicity prediction algorithms such as REVEL, MVP, SIFT, PolyPhen‐2, and MutationTaster. In parallel, existing phenotypic data for each gene associated with these candidate variants are compared to the patient phenotypes, and a phenotype score is calculated and combined with the variant score to produce a final Exomiser score that can be used to rank the candidate variants. This phenotypic evidence comes from known disease associations (OMIM, Orphanet) and model organism databases (MGI, IMPC, ZFIN), as well as from nearby gene neighbors in the StringDB protein−protein association network.
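
The default filtering and ranking steps described in the Figure 2 caption can be sketched roughly as follows. This is an illustrative sketch only, not Exomiser's code or API: the `Variant` class, its field names, and the `rank` function are hypothetical, and the variant and phenotype scores are assumed to be precomputed values between 0 and 1. The caption does not say how the two scores are combined, so the simple mean used as the default `combine` function is a placeholder assumption. The caption also states only that whitelisted ClinVar variants are kept regardless of location or frequency, so the sketch still applies the segregation check to them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Allele frequency ceilings from the caption, as fractions (0.1% and 2%).
MAX_AF = {"dominant": 0.001, "recessive": 0.02}


@dataclass
class Variant:  # hypothetical record, not an Exomiser type
    gene: str
    is_coding: bool
    allele_freq: float         # population allele frequency as a fraction
    segregates: bool           # compatible with the pedigree for the tested mode
    clinvar_whitelisted: bool  # well-supported pathogenic/likely pathogenic
    variant_score: float       # assumed precomputed, 0..1


def passes_filters(v: Variant, mode: str) -> bool:
    """Apply the caption's default filters for one mode of inheritance."""
    if not v.segregates:
        return False
    if v.clinvar_whitelisted:
        # Retained regardless of location or frequency.
        return True
    return v.is_coding and v.allele_freq <= MAX_AF[mode]


def rank(
    variants: List[Variant],
    phenotype_scores: Dict[str, float],
    mode: str,
    # Placeholder combiner (assumption): the caption does not give the formula.
    combine: Callable[[float, float], float] = lambda v, p: (v + p) / 2,
) -> List[Tuple[float, Variant]]:
    """Filter variants, combine scores, and sort from best to worst."""
    kept = [v for v in variants if passes_filters(v, mode)]
    scored = [
        (combine(v.variant_score, phenotype_scores.get(v.gene, 0.0)), v)
        for v in kept
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Passing a different `combine` function makes it easy to compare ranking by variant score alone, phenotype score alone, or a combined score.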
Figure 3
Exomiser performance on 4877 known molecular diagnoses from the 100,000 Genomes Project. (a) Recall and precision at different score thresholds for the variant, phenotype, or combined Exomiser score. (b) Percentage of diagnoses detected as the top hit or in the top 3 or 5 hits when ranking by variant, phenotype, or combined Exomiser score. (c) Recall and precision when ranking by Exomiser combined score as well as additional score thresholds.
Figure 4
Exomiser recall of 184 known SV diagnoses described in the literature. Previously described phenopackets representing known SV diagnoses curated from the literature were used as input to Exomiser along with short‐ or long‐read‐based SV VCF files. Exomiser was run using standard settings and the percentage of diagnoses detected as the top hit, in the top 3, 5, or 10 hits, or outside the top 10 are shown in the stacked bars.
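
The stacked-bar breakdown used in Figures 1 and 4 (top hit, top 3, top 5, top 10, outside the top 10) can be computed from the rank at which each known diagnosis was returned. The following is a minimal sketch under stated assumptions: the function names are hypothetical, a rank of `None` stands for a diagnosis that was not returned at all, and the bins are treated as disjoint (rank 1, ranks 2 to 3, ranks 4 to 5, ranks 6 to 10) so that the segments of one bar sum to 100%.

```python
from collections import Counter
from typing import Dict, Optional, Sequence

# Upper rank limit and label for each disjoint bin, checked in order.
BINS = [(1, "top"), (3, "top 3"), (5, "top 5"), (10, "top 10")]
OUTSIDE = "outside top 10"


def rank_bin(rank: Optional[int]) -> str:
    """Map a 1-based rank (or None if the diagnosis was missed) to a bin label."""
    if rank is not None:
        for limit, label in BINS:
            if rank <= limit:
                return label
    return OUTSIDE


def recall_breakdown(ranks: Sequence[Optional[int]]) -> Dict[str, float]:
    """Return the percentage of cases falling in each bin."""
    if not ranks:
        raise ValueError("at least one case is required")
    counts = Counter(rank_bin(r) for r in ranks)
    labels = [label for _, label in BINS] + [OUTSIDE]
    return {label: 100.0 * counts[label] / len(ranks) for label in labels}
```

Cumulative recall, such as the percentage of diagnoses found anywhere in the top 5, is the sum of the "top", "top 3", and "top 5" segments.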
