Bioinformatics. 2024 Mar 29;40(4):btae184.
doi: 10.1093/bioinformatics/btae184.

Prioritization of oligogenic variant combinations in whole exomes

Barbara Gravel et al. Bioinformatics.

Abstract

Motivation: Whole exome sequencing (WES) has emerged as a powerful tool for genetic research, enabling the collection of a tremendous amount of data about human genetic variation. However, identifying which variants cause a genetic disease remains an important challenge, often because of the number of variants that need to be screened. Expanding the screening to combinations of variants in two or more genes, as required under the oligogenic inheritance model, makes this problem far worse, because the number of candidate combinations grows combinatorially.

Results: We present here the High-throughput oligogenic prioritizer (Hop), a novel prioritization method that uses direct oligogenic information at the variant, gene and gene pair level to detect digenic variant combinations in WES data. This method leverages information from a knowledge graph, together with specialized pathogenicity predictions, to rank variant combinations based on how likely they are to explain the patient's phenotype. The performance of Hop is evaluated in cross-validation on 36 120 synthetic exomes for training and 14 280 additional synthetic exomes for independent testing. The known pathogenic variant combinations are found in the top 20 in approximately 60% of the cross-validation exomes and in 71% of the independent test exomes. These results provide a significant improvement over alternative approaches that depend simply on a monogenic assessment of pathogenicity, including early attempts at digenic ranking using monogenic pathogenicity scores.

Availability and implementation: Hop is available at https://github.com/oligogenic/HOP.

Conflict of interest statement

None declared.

Figures

Figure 1.
Overview of the developed prioritization method. (A) Synthetic patients’ data are generated by inserting known pathogenic combinations from OLIDA in exomes of the 1KGP and UK10K, filtering the variants based on the VarCoPP2.0 filtering criteria, and annotating them with disease-related phenotypic terms and gene panels, based on the disease associated with the inserted OLIDA combination. (B) All possible combinations of variants between two genes are then generated based on the filtered VCF file, and predicted using VarCoPP2.0, resulting in the attribution of a pathogenicity score (PS) per variant combination. (C) A disease-relevance score (DS) is attributed to all possible gene pairs, by running a random-walk with restart algorithm in BOCK, a heterogeneous network, using the disease-related terms as seeds for the propagation. This algorithm scores the proximity of all genes to the seeds, and the gene-pair score is computed as the average of the two gene scores. (D) The PS and DS scores are scaled using min–max normalization per exome to have equal weight in the computation of the FinalScore (FS), which is the average of the two scaled scores. The combinations are then ranked based on this FS, with the highest ranked combination having the highest FS.
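Steps (C) and (D) of the caption can be illustrated with a small sketch. This is not the Hop implementation: the toy graph, the node names (HPO_term, G1 to G4), the combination identifiers, the PS values, and the restart probability of 0.5 are all assumptions made up for illustration, and the random walk uses a textbook formulation on an unweighted graph rather than the BOCK knowledge graph.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-12, max_iter=10_000):
    """Textbook random walk with restart on an unweighted, undirected graph.

    adjacency maps each node to its list of neighbours; every node is assumed
    to have at least one neighbour. The restart value is an assumption, not a
    parameter taken from the paper.
    """
    nodes = list(adjacency)
    seed_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(seed_mass)
    for _ in range(max_iter):
        # Restart step: a fraction of the walker's mass jumps back to the seeds.
        nxt = {n: restart * seed_mass[n] for n in nodes}
        # Walk step: the remaining mass is split evenly over each node's neighbours.
        for node, neighbours in adjacency.items():
            share = (1.0 - restart) * prob[node] / len(neighbours)
            for nb in neighbours:
                nxt[nb] += share
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_combinations(combos, gene_scores):
    """combos: list of (combo_id, gene_a, gene_b, pathogenicity_score)."""
    ps = min_max([c[3] for c in combos])
    # DS of a gene pair is the average of the two gene proximity scores.
    ds = min_max([(gene_scores[c[1]] + gene_scores[c[2]]) / 2 for c in combos])
    # FS is the average of the scaled PS and the scaled DS.
    scored = [(c[0], (p + d) / 2) for c, p, d in zip(combos, ps, ds)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy network: one phenotype seed attached to a chain of genes.
graph = {
    "HPO_term": ["G1"],
    "G1": ["HPO_term", "G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2", "G4"],
    "G4": ["G3"],
}
gene_scores = random_walk_with_restart(graph, seeds={"HPO_term"})

# Hypothetical variant combinations with made-up pathogenicity scores.
combos = [
    ("c1", "G1", "G2", 0.6),
    ("c2", "G3", "G4", 0.9),
    ("c3", "G1", "G4", 0.2),
]
ranking = rank_combinations(combos, gene_scores)
print(ranking)
```

In this toy example, c2 has the highest pathogenicity score but ranks second, because its genes lie far from the phenotype seed; c1 ranks first because it combines a moderate PS with the highest DS.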
Figure 2.
Performance of Hop in the cross-validation exomes. (A) Cumulative Density Function (CDF) plot of the rankings obtained in the cross-validation exomes, by using the Final Score (FS), Disease-relevance Score (DS), and Pathogenicity Score (PS) as ranking scores, with HPO terms as seeds (dashed line), genes from a gene panel as seeds (dotted line) and both HPOs and gene panel as seeds (solid line). The CDF plots illustrate the percentage of exomes for which the OLIDA combination is ranked in the top K by each method, with K varying between 1 and 50 (inclusive). (B) Percentage of exomes for which the known OLIDA combination is ranked in the top 1, top 10, top 20, and top 50 of the cross-validation exomes based on the different types of scores and seeds for ranking.
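The top-K percentages plotted in panels (A) and (B) can be computed from one rank per exome, as in the sketch below. The ranks are invented for illustration and are not results from the paper; the function name is hypothetical.

```python
def top_k_percentages(ranks, ks):
    """Percentage of exomes whose known combination is ranked within the top K.

    ranks holds one 1-based rank per exome (the position of the inserted
    combination in that exome's ranked list).
    """
    n = len(ranks)
    return {k: 100.0 * sum(1 for r in ranks if r <= k) / n for k in ks}


# Made-up ranks for ten hypothetical exomes.
ranks = [1, 3, 12, 25, 60, 7, 18, 2, 45, 9]

# Values for K = 1..50 inclusive, as on the x-axis of the CDF plots.
cdf = top_k_percentages(ranks, range(1, 51))

for k in (1, 10, 20, 50):
    print(f"top {k}: {cdf[k]:.0f}%")
```

Because an exome counted in the top K is also counted in every larger K, the resulting curve never decreases.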
Figure 3.
Performance of Hop in the independent validation exomes. (A) Cumulative Density Function (CDF) plot of the rankings obtained in the independent validation exomes, by using the Final Score (FS), Disease-relevance Score (DS), and Pathogenicity Score (PS) as ranking scores, with HPO terms as seeds (dashed line), genes from a gene panel as seeds (dotted line) and both HPOs and gene panel as seeds (solid line). The CDF plots illustrate the percentage of exomes for which the OLIDA combination is ranked in the top K by each method, with K varying between 1 and 50 (inclusive). (B) Percentage of exomes for which the known OLIDA combination is ranked in the top 1, top 10, top 20, and top 50 of the independent validation exomes based on the different types of scores and seeds for ranking.
Figure 4.
Percentage of exomes for which the OLIDA combination is ranked in the top 1, top 10, and top 50, when using CADD, Exomiser, OligoPVP, and Hop for prioritization.
