Brief Bioinform. 2022 Mar 10;23(2):bbac019.
doi: 10.1093/bib/bbac019.

Evaluation of phenotype-driven gene prioritization methods for Mendelian diseases

Xiao Yuan et al. Brief Bioinform. 2022.

Abstract

Identifying disease-causing genes from the next-generation sequencing (NGS) data of patients with Mendelian disorders is challenging. To address this, researchers have developed many phenotype-driven gene prioritization methods that take a patient's genotype and phenotype information, or phenotype information alone, as input to rank candidate pathogenic genes. Evaluations of these ranking methods help practitioners choose an appropriate tool for their workflows, but retrospective benchmarks are underpowered to provide statistically significant results when differentiating between methods. In this study, the performance of ten recognized causal-gene prioritization methods was benchmarked using 305 cases from the Deciphering Developmental Disorders (DDD) project and 209 in-house cases via a relatively unbiased methodology. The results show that methods using Human Phenotype Ontology (HPO) terms and Variant Call Format (VCF) files as input achieved better overall performance than those using phenotypic data alone. In addition, LIRICAL and AMELIE, two of the best methods in our benchmark experiments, complement each other in the cases for which they rank the causal gene highly, suggesting that an integrative approach could further enhance diagnostic efficiency. Our benchmarking provides a valuable reference for computer-assisted rapid diagnosis of Mendelian diseases and sheds light on potential directions for improving disease-causing gene prioritization methods.

Keywords: HPO; Mendelian diseases; benchmarking; gene prioritization.

Figures

Figure 1
Illustration of study workflow. Flowchart of data collection and method implementation in this work. The DDD patient cohort includes 305 cases with developmental disorders (shown in light blue), while the in-house KMCGD patient cohort comprises 209 cases with a wide range of syndromes (shown in various colors). Curated HPO terms and a VCF file for each case in both cohorts are imported into six ‘HPO + VCF’ prioritization methods. Additionally, the curated HPO terms of each case are imported into five ‘HPO-only’ prioritization methods. In particular, AMELIE is run in both ‘HPO + VCF’ mode and ‘HPO-only’ mode (AMELIE_HPO). Finally, for each case, the rank of the known causal gene in the gene list output by each method is recorded, and the performance of each method is evaluated from these ranks.
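The final step of the workflow, recording where the known causal gene falls in a method's ranked output, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `causal_gene_rank`, the placeholder gene symbols, and the case-insensitive matching are all assumptions made for the example.

```python
from typing import Optional, Sequence


def causal_gene_rank(ranked_genes: Sequence[str], causal_gene: str) -> Optional[int]:
    """Return the 1-based rank of causal_gene in ranked_genes, or None if absent.

    Matching is case-insensitive here; that is an assumption for this sketch,
    not something stated in the source.
    """
    target = causal_gene.upper()
    for position, gene in enumerate(ranked_genes, start=1):
        if gene.upper() == target:
            return position
    return None


# Hypothetical ranked output of one prioritization method for one case.
example_output = ["GENE_A", "GENE_B", "GENE_C"]
rank_found = causal_gene_rank(example_output, "gene_b")
rank_missing = causal_gene_rank(example_output, "GENE_X")
```

Returning `None` for a gene that is missing from the output keeps such cases distinguishable from low-ranked ones when the ranks are aggregated later.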
Figure 2
Distribution plots of performance evaluation results. Distribution plots of performance evaluation results of 10 phenotype-driven gene prioritization methods on the DDD (A) and KMCGD (B) datasets. The distribution plots illustrate the percentage of cases in which each method ranked the causal gene top-1 and within the top-5, -10, -20, -30, -40 and -50. Each method is represented by a different color.
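The top-k percentages described in this caption can be computed from per-case ranks along the following lines. This is a hedged sketch with made-up ranks: the function name `top_k_percentages` and the choice to count cases where the causal gene is absent from the output (`None`) in the denominator are assumptions, not details confirmed by the source.

```python
from typing import Dict, Optional, Sequence


def top_k_percentages(
    ranks: Sequence[Optional[int]],
    thresholds: Sequence[int] = (1, 5, 10, 20, 30, 40, 50),
) -> Dict[int, float]:
    """Percentage of cases whose causal gene is ranked within each top-k threshold.

    A rank of None means the causal gene was not reported at all; such cases
    stay in the denominator (an assumption made for this sketch).
    """
    total = len(ranks)
    if total == 0:
        raise ValueError("at least one case is required")
    return {
        k: 100.0 * sum(1 for r in ranks if r is not None and r <= k) / total
        for k in thresholds
    }


# Made-up ranks for eight cases from one hypothetical method.
toy_ranks = [1, 3, 12, None, 45, 7, 1, 60]
toy_result = top_k_percentages(toy_ranks)
```

Evaluating the same function at every integer k from 1 to 50 gives the points of a cumulative curve over rank.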
Figure 3
CDF and bar plots of performance evaluation results. CDF plots (A) and bar plots (B) of performance evaluation results of 10 phenotype-driven gene prioritization methods on the DDD (left) and KMCGD (right) datasets. The CDF plots illustrate the percentage of cases with causal genes ranked within the top k by each method, where k can be any integer between 1 and 50 (inclusive). Each method is represented by a different color. The bar plots illustrate the relative proportion of cases in each group, where a group comprises the cases whose causal genes are ranked within a designated range. Each group is represented by a different color. (C) The overlapping sets of cases with causal genes ranked top-1 and within the top-5, -10, -20, -30, -40 and -50 by LIRICAL and AMELIE in the DDD (left) and KMCGD (right) datasets.
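The overlap analysis in panel (C) amounts to comparing, for a given k, the sets of cases that each of two methods places within the top k. A minimal sketch, assuming per-case ranks are stored in dictionaries keyed by case ID; the function name `top_k_overlap`, the case IDs, and the ranks are all hypothetical.

```python
from typing import Dict, Optional, Set


def top_k_overlap(
    ranks_a: Dict[str, Optional[int]],
    ranks_b: Dict[str, Optional[int]],
    k: int,
) -> Dict[str, Set[str]]:
    """Split cases by which of two methods ranks the causal gene within the top k."""
    hits_a = {case for case, r in ranks_a.items() if r is not None and r <= k}
    hits_b = {case for case, r in ranks_b.items() if r is not None and r <= k}
    return {
        "both": hits_a & hits_b,      # solved by both methods at this k
        "only_a": hits_a - hits_b,    # solved only by the first method
        "only_b": hits_b - hits_a,    # solved only by the second method
        "either": hits_a | hits_b,    # upper bound for a combined approach
    }


# Made-up causal-gene ranks for four cases from two hypothetical methods.
method_a = {"case1": 1, "case2": 8, "case3": None, "case4": 2}
method_b = {"case1": 3, "case2": 1, "case3": 4, "case4": 30}
overlap = top_k_overlap(method_a, method_b, k=5)
```

When the `only_a` and `only_b` sets are both non-empty, the two methods complement each other at that threshold, and the size of `either` indicates how many cases a combined approach could place within the top k.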
Figure 4
Performance evaluation across different disease subgroups. (A) Frequency distribution of the HPO parent classes for the KMCGD dataset. The HPO terms of each case in the KMCGD dataset are assigned to HPO parent classes according to the official HPO hierarchy; some cases involve more than one HPO parent class. (B) Disease subgroup composition of the KMCGD dataset. The number and proportion of cases are labeled for each subgroup. (C) CDF plots of performance evaluation results of 10 phenotype-driven gene prioritization methods on each subgroup of the KMCGD dataset.
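Assigning a case's HPO terms to parent classes, as in panel (A), can be illustrated by walking an is-a hierarchy upward from each term. The sketch below uses a tiny toy hierarchy with placeholder term names rather than real HPO identifiers, and it assumes that "parent classes" means a fixed set of top-level terms; neither detail comes from the source. In practice the parent map would be loaded from the official HPO release.

```python
from typing import Dict, Iterable, List, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect all ancestors of term by following is-a links upward."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def parent_classes(
    case_terms: Iterable[str],
    parents: Dict[str, List[str]],
    top_level: Set[str],
) -> Set[str]:
    """Return the top-level classes reached by any term of one case."""
    classes: Set[str] = set()
    for term in case_terms:
        lineage = ancestors(term, parents) | {term}
        classes |= lineage & top_level
    return classes


# Toy hierarchy with placeholder names (NOT real HPO IDs); child -> parents.
toy_parents = {
    "nervous": ["root"],
    "cardio": ["root"],
    "seizure": ["nervous"],
    "heart_defect": ["cardio"],
    "syncope": ["nervous", "cardio"],  # a term with two parents
}
toy_top_level = {"nervous", "cardio"}

single = parent_classes(["seizure"], toy_parents, toy_top_level)
multiple = parent_classes(["seizure", "heart_defect"], toy_parents, toy_top_level)
```

Because a term can have several parents and a case can carry several terms, a single case may map to more than one parent class, which matches the caption's note about multi-class cases.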
