BMC Med Genomics. 2022 Feb 18;15(1):31.
doi: 10.1186/s12920-022-01166-3.

GeneTerpret: a customizable multilayer approach to genomic variant prioritization and interpretation

Roozbeh Manshaei et al. BMC Med Genomics. 2022.

Abstract

Background: Variant interpretation is the main bottleneck in medical genomic sequencing efforts. This usually involves genome analysts manually searching through a multitude of independent databases, often with the aid of several, mostly independent, computational tools. To streamline variant interpretation, we developed the GeneTerpret platform which collates data from current interpretation tools and databases, and applies a phenotype-driven query to categorize the variants identified in the genome(s). The platform assigns quantitative validity scores to genes by query and assembly of the genotype-phenotype data, sequence homology, molecular interactions, expression data, and animal models. It also uses the American College of Medical Genetics and Genomics (ACMG) criteria to categorize variants into five tiers of pathogenicity. The final output is a prioritized list of potentially causal variants/genes.

Results: We tested GeneTerpret by comparing its performance to expert-curated genes (ClinGen's gene-validity database) and variant pathogenicity reports (DECIPHER database). Output from GeneTerpret was 97.2% and 83.5% concordant with the expert-curated sources, respectively. Additionally, similar concordance was observed when GeneTerpret's performance was compared with our internal expert-interpreted clinical datasets.
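The concordance figures above are agreement rates between two sets of labels. A minimal sketch of how such a percentage could be computed follows. It assumes hypothetical dictionaries that map gene or variant identifiers to a classification label; the function name, identifiers, and labels are illustrative and are not taken from GeneTerpret, ClinGen, or DECIPHER.

```python
def concordance(predicted, reference):
    """Percentage of shared items whose labels agree in both sources."""
    shared = predicted.keys() & reference.keys()
    if not shared:
        raise ValueError("no overlapping items to compare")
    agree = sum(1 for key in shared if predicted[key] == reference[key])
    return 100.0 * agree / len(shared)


# Hypothetical toy labels (not real curation data).
tool = {"GENE_A": "definitive", "GENE_B": "limited",
        "GENE_C": "moderate", "GENE_D": "strong"}
curated = {"GENE_A": "definitive", "GENE_B": "moderate",
           "GENE_C": "moderate", "GENE_D": "strong"}

print(concordance(tool, curated))  # 75.0
```

Only items present in both sources are compared, so the denominator is the overlap rather than the size of either list.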

Conclusions: GeneTerpret is a flexible platform designed to streamline the genome interpretation process, through a unique interface, with improved ease, speed and accuracy. This modular and customizable system allows the user to tailor the component-programs in the analysis process to their preference. GeneTerpret is available online at https://geneterpret.com.

Keywords: Bioinformatic application; Causative variants; Disease gene validity; Gene prioritization; Genome interpretation; Genomic variants; Genotype–phenotype correlation; Variant pathogenicity.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
GeneTerpret workflow. The figure depicts the modules, the databases that feed them, the accepted inputs, and the flow of information through the workflow. The main modules feed into the gene validity, VIP, and causality modules. Three sets of modules are available within GeneTerpret for gene validity exploration:
(1) ExPhenosion module: accepts a phenotype as input, with a customizable number of super-classes to walk up, and outputs the connected phenotypes and their associated genes. This module works independently of the other modules; it extracts the phenotypes connected to the selected phenotype and lets the analyst explore the genes associated with related phenotypes.
(2) CanGene modules: generate a list of candidate genes by compiling various types of evidence. The cross-species (zebrafish and mouse) modules accept one or more diseases by MONDO ID and, by checking the related databases, generate a list of genes whose orthologues are associated with a similar disease in animal models. The Homology and Protein–Protein Interaction modules accept a list of known genes for a phenotype (if available). The Homology module returns the genes homologous (paralogues) to those in the known gene list. The Protein–Protein Interaction module takes a similar approach to generate a list of genes that interact with the known disease genes; the analyst can select the number of interaction neighborhood levels (level-1, level-2, etc.) to include. The Gene Expression module accepts a list of relevant tissues as input and outputs the genes expressed in the selected tissues, based on an expression cut-off threshold set by the analyst.
(3) KING module: accepts one or more diseases (MONDO IDs) as input and outputs a list of genes associated with those diseases based on evidence from the Orphanet, OMIM, ClinVar, and MedGen databases.
The validity module accepts as input the gene lists generated by the CanGene and KING modules, together with an ANNOVAR-annotated VCF file or the output of the VIP module. The output is a VCF file that includes validity scores. The VIP module was developed based on the ACMG guidelines. It annotates variants with pathogenicity terms (PVS1, PS1, etc.) and justifies the assigned terms. The causality module integrates the outputs of the validity and VIP modules and ranks the variants by the number of pieces of evidence extracted from the validity modules and the pathogenicity terms from VIP. Simultaneously, an interactive graphical representation of the variants is generated, which allows the analyst to select the desired variants using a LASSO filter.
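The interaction neighborhood levels mentioned for the Protein–Protein Interaction module can be illustrated with a breadth-first search. This is a minimal sketch, not GeneTerpret's implementation: it assumes the interaction network is available as a hypothetical adjacency dictionary, and the function name and gene symbols are placeholders.

```python
from collections import deque


def interaction_neighborhood(graph, seed_genes, max_level):
    """Return {gene: level} for genes within max_level steps of any seed gene.

    graph is an adjacency dict (gene -> list of interacting genes);
    seed genes themselves (level 0) are excluded from the result.
    """
    levels = {gene: 0 for gene in seed_genes}
    queue = deque(seed_genes)
    while queue:
        gene = queue.popleft()
        if levels[gene] == max_level:
            continue  # do not expand beyond the requested level
        for partner in graph.get(gene, ()):
            if partner not in levels:
                levels[partner] = levels[gene] + 1
                queue.append(partner)
    return {gene: lvl for gene, lvl in levels.items() if lvl > 0}


# Hypothetical toy network: A - B - C - D
ppi = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(interaction_neighborhood(ppi, ["A"], 2))  # {'B': 1, 'C': 2}
```

Raising `max_level` widens the candidate list, which is the trade-off the analyst controls when choosing level-1 versus level-2 neighbors.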
Fig. 2
Graphical representation of the results from an analysis of internal datasets by GeneTerpret and by manual interpretation. A The top hundred ranked variants from the family-based analysis of ten families are shown. Red highlights the variant of interest (VOI) selected by a human analyst, as previously published. The boxes around the variants group variants that GeneTerpret ranked identically (the same pathogenicity and validity terms). B The cohort-based results for 20 unrelated probands with “Tetralogy of Fallot”. The top hundred ranked variants are plotted as circles from top to bottom. The five VOIs selected by a human genome analyst, found in five patients from this cohort, are highlighted in colour, with a different colour for each patient's VOI. For comparison, individual analyses of the genomes of the five probands with VOIs are also plotted using the same colour-coding. For instance, purple represents the VOI for patient TOF53 (one of the probands in the cohort). This variant is ranked 44 in the cohort-based analysis and 8 in the singleton-based analysis by GeneTerpret.
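The ranking and the boxes of identically ranked variants described in the captions can be sketched as a sort followed by grouping of ties. The source does not give the actual scoring, so this sketch assumes a simple ordering by the count of pathogenicity terms and then the count of validity evidence; the field names and variant identifiers are hypothetical.

```python
from itertools import groupby


def rank_variants(variants):
    """Group variants into ranks; variants with equal scores share a rank.

    Each variant is a dict with 'id', 'acmg' (set of pathogenicity terms)
    and 'validity' (set of validity evidence sources). The scoring by
    simple counts is an assumption made for illustration only.
    """
    def score(variant):
        return (len(variant["acmg"]), len(variant["validity"]))

    ordered = sorted(variants, key=score, reverse=True)
    ranked = []
    for rank, (_, group) in enumerate(groupby(ordered, key=score), start=1):
        ranked.append((rank, [variant["id"] for variant in group]))
    return ranked


# Hypothetical toy input.
variants = [
    {"id": "var1", "validity": {"KING"}, "acmg": {"PM2"}},
    {"id": "var2", "validity": {"KING", "PPI"}, "acmg": {"PVS1", "PM2"}},
    {"id": "var3", "validity": {"Homology"}, "acmg": {"PP3"}},
]
print(rank_variants(variants))  # [(1, ['var2']), (2, ['var1', 'var3'])]
```

Here `var1` and `var3` tie and share rank 2, which corresponds to the idea of a box drawn around variants with the same rank.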

