Genome Med. 2023 Sep 7;15(1):68.
doi: 10.1186/s13073-023-01214-2.

ClinPrior: an algorithm for diagnosis and novel gene discovery by network-based prioritization

Agatha Schlüter et al. Genome Med. 2023.

Abstract

Background: Whole-exome sequencing (WES) and whole-genome sequencing (WGS) have become indispensable tools for solving rare Mendelian genetic conditions. Nevertheless, there is still an urgent need for sensitive, fast algorithms to maximise WES/WGS diagnostic yield in rare disease patients. Most tools devoted to this aim use patient phenotype information to prioritize genomic data, although they are often limited by incomplete gene-phenotype knowledge stored in biomedical databases and by a lack of proper benchmarking on real-world patient cohorts.

Methods: We developed ClinPrior, a novel method for the analysis of WES/WGS data that ranks candidate causal variants based on the patient's standardized phenotypic features (in Human Phenotype Ontology (HPO) terms). The algorithm propagates the data through an interactome network-based prioritization approach. This algorithm was thoroughly benchmarked using a synthetic patient cohort and was subsequently tested on a heterogeneous prospective, real-world series of 135 families affected by hereditary spastic paraplegia (HSP) and/or cerebellar ataxia (CA).
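The propagation step described above can be illustrated with a minimal sketch. The abstract does not state which propagation rule ClinPrior uses, so this example substitutes a random walk with restart, a common choice for network-based prioritization. The gene names, the toy network, the seed scores and the `restart` and `n_iter` parameters are all hypothetical.

```python
def propagate(adjacency, seed_scores, restart=0.5, n_iter=50):
    """Random walk with restart over an undirected gene network.

    adjacency:   dict mapping each gene to the set of its neighbours
    seed_scores: dict mapping genes to an initial phenotypic score
    Returns a dict of propagated scores that sums to 1.
    """
    genes = sorted(adjacency)
    total = sum(seed_scores.get(g, 0.0) for g in genes)
    if total == 0:
        raise ValueError("at least one seed gene needs a positive score")
    # Normalised starting distribution (assumption: scores act as weights).
    p0 = {g: seed_scores.get(g, 0.0) / total for g in genes}
    p = dict(p0)
    for _ in range(n_iter):
        # Part of the mass always restarts at the seed genes.
        nxt = {g: restart * p0[g] for g in genes}
        for g in genes:
            neighbours = adjacency[g]
            if not neighbours:
                # Isolated node: its walking mass stays in place.
                nxt[g] += (1 - restart) * p[g]
                continue
            share = (1 - restart) * p[g] / len(neighbours)
            for n in neighbours:
                nxt[n] += share
        p = nxt
    return p


# Hypothetical toy network: two known disease genes share one neighbour.
adjacency = {
    "KNOWN1": {"CANDIDATE"},
    "KNOWN2": {"CANDIDATE", "BRIDGE"},
    "CANDIDATE": {"KNOWN1", "KNOWN2"},
    "BRIDGE": {"KNOWN2", "DISTANT"},
    "DISTANT": {"BRIDGE"},
}
seeds = {"KNOWN1": 1.0, "KNOWN2": 1.0}  # genes matching the patient's HPO terms
scores = propagate(adjacency, seeds)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy setting the gene with no phenotype annotation of its own (`CANDIDATE`) still receives a score because it neighbours both seed genes, which is the intuition behind identifying candidate genes through neighbourhood connections.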

Results: ClinPrior successfully identified causative variants, achieving a final positive diagnostic yield of 70% in our real-world cohort. This includes 10 novel candidate genes not previously associated with disease, 7 of which were functionally validated within this project. We used the knowledge generated by ClinPrior to create a specific interactome for HSP/CA disorders, thus enabling future diagnoses as well as the discovery of novel disease genes.

Conclusions: ClinPrior is an algorithm that uses standardized phenotype information and interactome data to improve clinical genomic diagnosis. It helps to identify atypical cases and efficiently predicts novel disease-causing genes. This increases diagnostic yield, shortens diagnostic odysseys and advances our understanding of human illnesses.

Keywords: Algorithm; Candidate gene; Cerebellar ataxia; HPOs; Hereditary spastic paraplegia; Interactome; Variant prioritization; WES/WGS.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
ClinPrior pipeline. First, the algorithm calculates the phenotypic association metric for each gene in the phenotypic layer based on the patient’s phenotype and known HPO-gene associations. The multilayer network is built from different data resources (see “Methods”). The phenotypic layer reports HPO-gene associations, the physical layer reports physical protein‒protein interactions (PPIs) and the functional layer provides coexpression, signalling or metabolic pathway, and protein domain associations. The method propagates the phenotypic metric to adjacent nodes of the network so that higher scores indicate a better phenotypic fit with the patient. Variants resulting from patient genomic sequencing are filtered by frequency, variant impact and mode of inheritance. With this method, new candidate genes not previously associated with disease can also be identified thanks to the propagation of the phenotypic metric through neighbourhood connections.
Fig. 2
ClinPrior performance benchmarking in a synthetic cohort. A Variant prioritization performance, measured by the area under the receiver operating characteristic curve (AUROC), in the identification of known disease genes and candidate disease genes. ROC curves were computed using the patient HPO terms, random HPO terms and a random final ClinPrior prioritization rank in the 66,800 synthetic WES analysed. B The method identifies the gene that best matches the patient’s phenotype based on known HPO-gene associations and the propagation of the phenotypic metrics in the multilayer interactome. When the identified gene is a novel candidate gene not previously linked to disease, there are no HPO-gene associations in the phenotypic layer. For benchmarking, we simulated a candidate gene by removing the HPO-gene associations from each candidate.
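As a reminder of what the AUROC in this benchmark measures, the sketch below computes it as the probability that a causal variant scores higher than a non-causal one (the Mann-Whitney formulation). The scores and labels are made-up illustration values, not data from the study, and the function name `auroc` is hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: prioritization scores, higher means more likely causal
    labels: 1 for the causal variant or gene, 0 otherwise
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only: two causal and two non-causal variants.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 1, 0]
mixed = auroc(example_scores, example_labels)         # 3 of 4 pairs ordered correctly
perfect = auroc([0.9, 0.7, 0.2, 0.1], [1, 1, 0, 0])   # every causal variant ranked first
print(mixed, perfect)
```

An AUROC of 0.5 corresponds to random ranking, which is why the caption compares the patient HPO terms against random HPO terms and a random final rank.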
Fig. 3
Diagnostic process diagram and diagnostic yield in a real-world patient cohort. A Word cloud showing the most representative phenotypes in the 135 patients. B Number of cases included in the study and the diagnostic process. C Diagnostic yield overall and in the WES, WGS (including CNVs) and RFC1 analyses.
Fig. 4
ClinPrior performance yield. ClinPrior performance yield in prioritizing 66,800 pathogenic variants and in a real-world patient cohort including 79 variants in known disease genes or candidates, shown as bar plots (A) and cumulative distribution function (CDF) plots (B).
Fig. 5
HSP/CA expanded interactome. A The HSP/CA seeds + expanded network was generated by the network prioritization tool, resulting in 2187 proteins. The seed genes known to be mutated in HSP/CA are shown in yellow circles, disease genes not previously associated with HSP/CA are shown in green, and new HSP/CA candidate genes are shown in blue. Comparison of the statistical connectivity strength of the HSP/CA expanded network with 1000 permutations of randomly selected proteins from the global human network. Red dots denote the value of the metric on the HSP/CA expanded network constituted by 2187 proteins. Box and whisker plots denote matched null distributions (i.e. 1000 permutations). (Left) Within-group edge count (i.e. number of edges between members of the query set). (Right) Mean distance is the average path length in the network obtained by calculating the shortest paths between all pairs of proteins. B–F Zoom-in on the network for specific putative candidates as illustrative examples of the potential of the HSP/CA expanded network: B serine hydroxymethyltransferase 2 (SHMT2); C ubiquitin-associated protein 1 (UBAP1); D phosphate cytidylyltransferase 2, ethanolamine (PCYT2); E 2,4-dienoyl-CoA reductase 1 (DECR1); and F eukaryotic translation initiation factor 2 subunit alpha (EIF2S1). * Indicates recently associated with HSP/CA.
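The within-group edge count comparison in this caption can be sketched as a simple permutation test. This is an illustrative assumption about the general technique, not the authors' code: the toy network, the node names, the empirical p-value formula and the function names are hypothetical, and the caption's "matched" null distributions may involve additional matching that is not modelled here.

```python
import random
from itertools import combinations


def within_group_edges(edges, group):
    """Count edges whose two endpoints both belong to the group."""
    members = set(group)
    return sum(1 for a, b in edges if a in members and b in members)


def permutation_test(edges, nodes, group, n_perm=1000, seed=0):
    """Compare the observed edge count with random node sets of equal size.

    Returns the observed count and an empirical one-sided p-value,
    (hits + 1) / (n_perm + 1), where hits counts random sets that are
    at least as densely connected as the query group.
    """
    rng = random.Random(seed)
    observed = within_group_edges(edges, group)
    size = len(set(group))
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(nodes, size)
        if within_group_edges(edges, sample) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy network: a 4-protein clique attached to a sparse chain.
nodes = [f"P{i}" for i in range(20)]
group = nodes[:4]
edges = list(combinations(group, 2))                      # 6 clique edges
edges += [(f"P{i}", f"P{i + 1}") for i in range(3, 19)]   # chain P3..P19
observed, p_value = permutation_test(edges, nodes, group)
print(observed, p_value)
```

A small p-value here means that random protein sets of the same size are rarely as densely interconnected as the query set, which is the reading suggested by the red dots lying outside the box plots.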
