An optimized variant prioritization process for rare disease diagnostics: recommendations for Exomiser and Genomiser

Isabelle B Cooperstein¹, Shruti Marwaha²,³, Alistair Ward¹,⁴, Shilpa N Kobren⁵, Jennefer N Carter²; Undiagnosed Diseases Network; Matthew T Wheeler²,³, Gabor T Marth⁶

PMID: 41121346
PMCID: PMC12539062
DOI: 10.1186/s13073-025-01546-1

Genome Med. 2025 Oct 21;17(1):127.

Abstract

Background: Exome sequencing (ES) and genome sequencing (GS) are increasingly used as standard genetic tests to identify diagnostic variants in rare disease cases. However, prioritizing these variants to reduce the time and burden of manual interpretation by clinical teams remains a significant challenge. The Exomiser/Genomiser software suite is the most widely adopted open-source software for prioritizing coding and noncoding variants. Despite its ubiquitous use, limited data-driven guidelines currently exist to optimize its performance for diagnostic variant prioritization. Based on detailed analyses of Undiagnosed Diseases Network (UDN) probands, this study presents optimized parameters and practical recommendations for deploying the Exomiser and Genomiser tools. We also highlight scenarios where diagnostic variants may be missed and propose alternative workflows to improve diagnostic success in such complex cases.

Methods: We analyzed 386 diagnosed probands from the UDN, including cases with coding and noncoding diagnostic variants. We systematically evaluated how tool performance was affected by key parameters, including gene:phenotype association data, variant pathogenicity predictors, phenotype term quality and quantity, and the inclusion and accuracy of family variant data.

Results: Parameter optimization significantly improved Exomiser's performance over default parameters. For GS data, the percentage of coding diagnostic variants ranked within the top 10 candidates increased from 49.7% to 85.5%, and for ES, from 67.3% to 88.2%. For noncoding variants prioritized with Genomiser, the top 10 rankings improved from 15.0% to 40.0%. We also explored refinement strategies for Exomiser outputs, including using p-value thresholds and flagging genes that are frequently ranked in the top 30 candidates but rarely associated with diagnoses.

Conclusion: This study provides an evidence-based framework for variant prioritization in ES and GS data using Exomiser and Genomiser. These recommendations have been implemented in the Mosaic platform to support the ongoing analysis of undiagnosed UDN participants and provide efficient, scalable reanalysis to improve diagnostic yield. Our work also highlights the importance of tracking solved cases and diagnostic variants that can be used to benchmark bioinformatics tools. Exomiser and Genomiser are available at https://github.com/exomiser/Exomiser/ .

Keywords: Diagnosis; Exome sequencing; Exomiser; Genome sequencing; Genomiser; HPO; Parameter optimization; Phenotype; Rare disease; Variant prioritization.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: All work included in this study was performed in accordance with all ethical guidelines outlined in the NIH IRB no. 15HG0130 and the UDN Manual of Operations. All de-identified patient data included in this study was provided with informed consent by all participants to be used freely for research purposes across the network. The study proposal and this manuscript were approved by the UDN Publications and Research Committee. All research has been conducted in accordance with the Declaration of Helsinki. Consent for publication: Not applicable. Competing interests: A.W. and G.T.M. are co-founders and CEO and CSO, respectively, of Frameshift Labs, the developer of the Mosaic platform. The remaining authors declare no competing interests.

Figures

**Fig. 1**
Evaluating VCF filtering criteria on 474 variants in combined ES and GS cohorts. A Minimum genotype quality (GQ) versus percent of diagnostic variants removed due to filtering criteria under varying required variant allele frequency (VAF) ranges for heterozygous variants represented by colored lines. Light blue line (15%–85%) overlaps dark blue (10%–90%). B Minimum GQ versus mean rank of diagnostic variants in Exomiser or Genomiser outputs under default parameters with varying required VAF ranges for heterozygous variants represented by colored lines
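
The kind of filter evaluated in Fig. 1 can be illustrated with a minimal sketch. It assumes each heterozygous genotype call has already been reduced to a genotype quality and reference/alternate read depths. The `Call` record, the function name, and the default thresholds (GQ ≥ 20, VAF 15%–85%) are hypothetical placeholders chosen for illustration, not the values recommended by the study.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One heterozygous genotype call (hypothetical, simplified record)."""
    gq: int         # genotype quality
    ref_depth: int  # reads supporting the reference allele
    alt_depth: int  # reads supporting the alternate allele


def passes_filter(call: Call, min_gq: int = 20,
                  vaf_range: tuple[float, float] = (0.15, 0.85)) -> bool:
    """Return True if a heterozygous call meets both the GQ and VAF criteria."""
    total = call.ref_depth + call.alt_depth
    if total == 0:
        return False  # no supporting reads, so the VAF is undefined
    vaf = call.alt_depth / total
    low, high = vaf_range
    return call.gq >= min_gq and low <= vaf <= high


# Illustrative calls only (not study data).
calls = [
    Call(gq=45, ref_depth=18, alt_depth=16),  # balanced, high quality
    Call(gq=12, ref_depth=20, alt_depth=19),  # balanced, low quality
    Call(gq=60, ref_depth=38, alt_depth=2),   # high quality, skewed VAF
]
kept = [c for c in calls if passes_filter(c)]
```

Sweeping `min_gq` and `vaf_range` over a grid and counting how many known diagnostic variants are dropped at each setting would reproduce the style of comparison shown in panel A.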

**Fig. 2**
Evaluation of phenotype prioritization algorithms in the GS Exomiser cohort. Cumulative percentage of diagnostic variants (n = 296) ranked at or above each rank threshold (x-axis). Each curve corresponds to a specific phenotype prioritization algorithm parameter setting represented by color. All runs use filtered VCF input and REVEL + MVP as the variant pathogenicity sources (default). Default phenotype prioritization algorithms (PHIVE, PhenIX, hiPHIVE) are compared, as well as variations of hiPHIVE using specific combinations of model organism gene-phenotype databases. hiPHIVE default (dark blue) and hiPHIVE human, PPI, and mouse (pink) curves overlap. The x-axis is limited to rank thresholds 1–30, consistent with our benchmarking strategy that defines successful prioritization as rank ≤ 30 (“Methods”)
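
The cumulative curves described in Fig. 2 can be computed along the following lines. This is a sketch under stated assumptions: the input is one rank per diagnostic variant, with `None` standing for a variant that was not prioritized at all, and the function name is hypothetical.

```python
def cumulative_percent_by_rank(ranks, max_rank=30):
    """Percentage of diagnostic variants ranked at or above each threshold.

    ranks: one entry per diagnostic variant; an int rank (1 = best) or
           None if the variant was not prioritized.
    Returns a list whose element k-1 is the percentage with rank <= k.
    """
    total = len(ranks)
    if total == 0:
        raise ValueError("at least one diagnostic variant is required")
    return [
        100.0 * sum(1 for r in ranks if r is not None and r <= k) / total
        for k in range(1, max_rank + 1)
    ]


# Illustrative ranks only (not study data).
curve = cumulative_percent_by_rank([1, 3, 12, None, 7])
top1, top10, top30 = curve[0], curve[9], curve[29]
```

Unprioritized variants stay in the denominator, so the curve plateaus below 100% whenever some diagnostic variants are missed entirely.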

**Fig. 3**
Stepwise optimization process for Exomiser and Genomiser across three UDN cohorts. A GS Exomiser cohort (n = 296 variants). B ES Exomiser cohort (n = 153 variants). C GS Genomiser cohort (n = 60 variants). Red lines: Exomiser or Genomiser performance under default settings (hiPHIVE all models; REVEL + MVP (+ ReMM for Genomiser)) using raw, unfiltered VCFs. All other runs use the filtered VCFs to remove potential false positive variants (“Methods”). Blue lines: Exomiser or Genomiser performance under default settings (hiPHIVE all models; REVEL + MVP (+ ReMM for Genomiser)) using filtered VCFs. Orange lines: Exomiser or Genomiser performance using hiPHIVE human-only associations and REVEL + MVP (+ ReMM for Genomiser) pathogenicity prediction sources. Green lines: Exomiser or Genomiser performance using hiPHIVE human-only associations and REVEL + MVP + AlphaMissense + SpliceAI (+ ReMM for Genomiser) pathogenicity prediction sources. These are our optimized parameters

**Fig. 4**
Evaluation of variant pathogenicity prediction score sources in the GS Exomiser cohort. A Cumulative percentage of diagnostic variants (n = 296) ranked at or above each rank threshold (x-axis) under different combinations of variant pathogenicity prediction sources. Each colored curve corresponds to a specific combination of sources, as indicated in the boxed legend. B, C Breakdown of maximum pathogenicity score sources for prioritized *diagnostic* variants (B) or *nondiagnostic* variants (C) under each source combination. X-axis indicates the combination of sources used, and the y-axis represents the number of prioritized variants. Bar color reflects which source provided the maximum pathogenicity score, as indicated in the top legend. Dashed lines mark the total number of prioritized variants under each source combination. Variants not covered by any pathogenicity sources in that combination are assigned a class-based score (Additional file 1: Table S3) and, therefore, have no maximum score source, which is represented by the uncolored whitespace in each bar. D, E Distribution of maximum pathogenicity scores for all prioritized *diagnostic* variants (D) or *nondiagnostic* variants (E). X-axis represents the pathogenicity score (ranging from 0 to 1), and color indicates the combination of scoring sources used in each run, as indicated in the boxed legend. These distributions highlight differences in score separation between diagnostic and nondiagnostic variants. All data represent Exomiser runs on filtered VCFs (“Methods”) using hiPHIVE human-only gene:phenotype associations
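
The "maximum pathogenicity score source" bookkeeping in panels B and C can be sketched as follows. This is a simplified illustration, not Exomiser's implementation: the source labels are plain strings, the function name is hypothetical, and the class-based fallback value is a placeholder (the actual values are in Additional file 1: Table S3).

```python
def max_pathogenicity(scores, enabled_sources, class_based_score):
    """Return (score, source) for one variant.

    scores: dict mapping a source label to its score in [0, 1], or None
            when that source does not cover the variant.
    enabled_sources: set of source labels used in this run.
    class_based_score: fallback used when no enabled source covers the
                       variant; in that case the source is None.
    """
    available = {
        src: val for src, val in scores.items()
        if src in enabled_sources and val is not None
    }
    if not available:
        return class_based_score, None
    best = max(available, key=available.get)
    return available[best], best


# Illustrative scores only (not study data).
variant = {"REVEL": 0.42, "AlphaMissense": 0.91, "SpliceAI": None}
default_run = max_pathogenicity(variant, {"REVEL", "MVP"}, 0.6)
expanded_run = max_pathogenicity(
    variant, {"REVEL", "MVP", "AlphaMissense", "SpliceAI"}, 0.6)
uncovered = max_pathogenicity({}, {"REVEL", "MVP"}, 0.6)
```

Tallying the returned source across all prioritized variants gives the colored bar segments, and the `None` cases correspond to the uncolored portion of each bar.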

**Fig. 5**
Frequently prioritized genes in the GS Exomiser cohort. Eighty-six genes that ranked in the top 30 candidates with p ≤ 0.3 for at least 5% of probands in the GS Exomiser cohort. List of genes can be found in Additional file 5. Color represents the binned average rank at which the gene was prioritized across all probands in the cohort. Bold font indicates OMIM genes with a confirmed causal relationship to a disease (OMIM 3). Shape reflects if the gene is diagnostic in the GS Exomiser cohort (square), not diagnostic in the GS Exomiser cohort but diagnostic for at least one proband in the UDN consortium as a whole (diamond), or not diagnostic in the UDN consortium (circle)
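
The selection rule in Fig. 5 (genes in the top 30 with p ≤ 0.3 for at least 5% of probands) can be expressed as a short sketch. The input layout, the function name, and the gene labels are hypothetical. The three thresholds default to the values stated in the caption.

```python
from collections import Counter


def frequently_prioritized_genes(results, max_rank=30, max_p=0.3,
                                 min_fraction=0.05):
    """Return genes prioritized in at least `min_fraction` of probands.

    results: dict mapping proband ID to a list of (gene, rank, p_value)
             tuples taken from that proband's prioritization output.
    """
    n_probands = len(results)
    if n_probands == 0:
        return set()
    counts = Counter()
    for candidates in results.values():
        # Count each gene at most once per proband.
        hits = {gene for gene, rank, p in candidates
                if rank <= max_rank and p <= max_p}
        counts.update(hits)
    return {gene for gene, c in counts.items()
            if c / n_probands >= min_fraction}


# Illustrative input only (not study data); a higher fraction is used
# because the toy cohort has just three probands.
toy = {
    "P1": [("GENE_A", 2, 0.01), ("GENE_B", 40, 0.20)],
    "P2": [("GENE_A", 5, 0.10), ("GENE_C", 3, 0.50)],
    "P3": [("GENE_C", 1, 0.02)],
}
flagged = frequently_prioritized_genes(toy, min_fraction=0.5)
```

Genes returned by such a rule are candidates for flagging during manual review rather than automatic exclusion, since some of them are diagnostic for individual probands.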

**Fig. 6**
Generalizability of optimized Exomiser parameters in newly diagnosed UDN probands. A Stepwise optimization process for Exomiser on a small cohort of newly diagnosed UDN probands encompassing 23 diagnostic variants. B Change in rank for each diagnostic variant (n = 23) in the newly diagnosed UDN proband cohort using optimized parameters (green) compared to default parameters (blue). *Denotes compound heterozygous diagnosis (two variants in labeled genes). Red, Exomiser under default settings (hiPHIVE all models; REVEL + MVP) using raw, *unfiltered* VCFs. All other runs use the filtered VCFs to remove potential false positive variants (“Methods”). Blue, Exomiser performance under default settings (hiPHIVE all models; REVEL + MVP) using filtered VCFs. Orange, Exomiser performance using hiPHIVE human-only associations and REVEL + MVP pathogenicity prediction sources. Green, Exomiser or Genomiser performance using hiPHIVE human-only associations and REVEL + MVP + AlphaMissense + SpliceAI (+ ReMM for Genomiser) pathogenicity prediction sources. These are our optimized parameters

**Fig. 7**
Parameter optimization shifts diagnostic variants into the top 10 candidates in the GS Exomiser cohort. Seventy (23.6%) variants in the GS Exomiser cohort are shifted into the top 10 candidates using optimized parameters (green) in comparison to default parameters (blue). Optimized parameters refer to running Exomiser on the filtered family VCF, hiPHIVE human-only gene-phenotype associations, and REVEL, MVP, AlphaMissense, and SpliceAI variant pathogenicity score sources. Default parameters refer to running Exomiser on the filtered family VCF using hiPHIVE human, mouse, zebrafish, PPI gene-phenotype associations, and REVEL and MVP variant pathogenicity score sources. *Denotes compound heterozygous diagnosis (two variants in labeled genes)

**Fig. 8**
Recommended workflow for using Exomiser and Genomiser in rare disease diagnostics. Numbers indicate the count (percentage) of diagnostic variants ranked (green circles) or not ranked (red circles) *within the top 30 candidates* by Exomiser or Genomiser after applying each preceding step in the flowchart. Percentages are calculated from the total in the preceding step, beginning at n = 380 diagnostic variants (296 from the GS Exomiser cohort, 24 with inconsistent pedigrees, and 60 from the GS Genomiser cohort). Begin analysis with a family VCF (when available), filtered to remove potentially false-positive variants (“Methods”). Use the REVEL, MVP, AlphaMissense, SpliceAI variant pathogenicity sources, and human-only hiPHIVE gene-phenotype associations. Run Exomiser using all available family variant data, pedigree information, inheritance filters, and the ClinVar whitelist enabled. Manually review the top 30 *contributing* variants, with frequently prioritized genes flagged. If no compelling candidates are identified, verify pedigree accuracy (considering that some family members may be misphenotyped) and consider running Exomiser on the proband-only variant data with inheritance filters enabled. If no strong candidates are found in GS data, run Genomiser to assess noncoding variants and compound heterozygous candidates with one noncoding variant and one coding variant

Grants and funding

RO1HG012286/NHGRI AGMR
