BMC Med Genomics. 2022 Feb 18;15(1):31.

doi: 10.1186/s12920-022-01166-3.

GeneTerpret: a customizable multilayer approach to genomic variant prioritization and interpretation

Roozbeh Manshaei¹, Sean DeLong¹,², Veronica Andric^#¹, Esha Joshi^#¹,³, John B A Okello¹,⁴, Priya Dhir¹,⁵, Cherith Somerville¹, Kirsten M Farncombe⁶, Kelsey Kalbfleisch¹,⁷, Rebekah K Jobling¹,⁸,⁷, Stephen W Scherer⁹,¹⁰,¹¹,¹², Raymond H Kim¹³,¹⁴, S Mohsen Hosseini¹⁵

Affiliations

¹ Ted Rogers Centre for Heart Research, Cardiac Genome Clinic, The Hospital for Sick Children, Toronto, ON, Canada.
² Department of Electrical Engineering and Computer Science, York University, Toronto, ON, Canada.
³ Department of Molecular Genetics, Faculty of Medicine, University of Toronto, Toronto, ON, Canada.
⁴ MIT Sloan School of Management, Massachusetts Institute of Technology, 100 Main Street, Cambridge, MA, 02142, USA.
⁵ Faculty of Medicine, University of Toronto, Toronto, ON, M5S1A8, Canada.
⁶ Ted Rogers Centre for Heart Research, Toronto General Hospital Research Institute, University Health Network, Toronto, ON, Canada.
⁷ Division of Clinical and Metabolic Genetics, The Hospital for Sick Children, Toronto, ON, Canada.
⁸ Genome Diagnostics, Department of Pediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, ON, Canada.
⁹ The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada.
¹⁰ Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada.
¹¹ Centre for Genetic Medicine, The Hospital for Sick Children, Toronto, ON, Canada.
¹² Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
¹³ Division of Clinical and Metabolic Genetics, The Hospital for Sick Children, Toronto, ON, Canada. raymond.kim@sickkids.ca.
¹⁴ Fred A. Litwin Family Centre in Genetic Medicine, University Health Network, Department of Medicine, University of Toronto, Toronto, ON, Canada. raymond.kim@sickkids.ca.
¹⁵ Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA. smhosseini@mdanderson.org.

^# Contributed equally.

PMID: 35180879
PMCID: PMC8857790
DOI: 10.1186/s12920-022-01166-3

Roozbeh Manshaei et al. BMC Med Genomics. 2022.

Abstract

Background: Variant interpretation is the main bottleneck in medical genomic sequencing efforts. It usually involves genome analysts manually searching through a multitude of independent databases, often with the aid of several, mostly independent, computational tools. To streamline variant interpretation, we developed the GeneTerpret platform, which collates data from current interpretation tools and databases and applies a phenotype-driven query to categorize the variants identified in the genome(s). The platform assigns quantitative validity scores to genes by querying and assembling genotype-phenotype data, sequence homology, molecular interactions, expression data, and animal models. It also uses the American College of Medical Genetics and Genomics (ACMG) criteria to categorize variants into five tiers of pathogenicity. The final output is a prioritized list of potentially causal variants/genes.

Results: We tested GeneTerpret by comparing its performance to expert-curated genes (ClinGen's gene-validity database) and variant pathogenicity reports (DECIPHER database). Output from GeneTerpret was 97.2% and 83.5% concordant with the expert-curated sources, respectively. Additionally, similar concordance was observed when GeneTerpret's performance was compared with our internal expert-interpreted clinical datasets.
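As an illustration of how a concordance figure of this kind can be computed, the following minimal sketch assumes that concordance means simple percent agreement between tool output and expert labels over the items both sources classify. The abstract does not give the exact definition used in the study, and the function name, gene identifiers, and labels below are hypothetical toy values, not data from the paper.

```python
def concordance(predicted, reference):
    """Percent of shared items on which two label sets agree.

    Assumption: concordance is plain percent agreement; the abstract
    does not state the exact definition used by the authors.
    """
    shared = predicted.keys() & reference.keys()
    if not shared:
        raise ValueError("no items in common")
    matches = sum(predicted[k] == reference[k] for k in shared)
    return 100.0 * matches / len(shared)


# Hypothetical toy labels (not taken from ClinGen, DECIPHER, or the paper).
tool_labels = {
    "gene_1": "definitive",
    "gene_2": "moderate",
    "gene_3": "limited",
    "gene_4": "strong",
}
expert_labels = {
    "gene_1": "definitive",
    "gene_2": "moderate",
    "gene_3": "moderate",
    "gene_4": "strong",
}

print(concordance(tool_labels, expert_labels))  # 75.0
```

Items that appear in only one of the two sources are ignored here; whether the study handled unmatched items this way is not stated in the abstract.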

Conclusions: GeneTerpret is a flexible platform designed to streamline the genome interpretation process, through a unique interface, with improved ease, speed and accuracy. This modular and customizable system allows users to tailor the component programs in the analysis process to their preference. GeneTerpret is available online at https://geneterpret.com.

Keywords: Bioinformatic application; Causative variants; Disease gene validity; Gene prioritization; Genome interpretation; Genomic variants; Genotype–phenotype correlation; Variant pathogenicity.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
*GeneTerpret* workflow. The figure depicts the modules, the databases that feed them, the accepted inputs, and the flow of information through the workflow. The main modules feed into the gene validity, *VIP*, and causality modules. Three sets of modules are available within *GeneTerpret* for gene validity exploration: (1) The ExPhenosion module accepts a phenotype as input, lets the analyst customize the number of super-classes to walk up, and outputs the connected phenotypes and their associated genes. It works independently of the other modules, extracting the phenotypes connected to the selected phenotype so that the analyst can explore the genes associated with related phenotypes. (2) The CanGene modules generate a list of candidate genes by compiling various types of evidence. The cross-species (zebrafish and mouse) modules accept one or more diseases, given by MONDO ID, as input and check the related databases to generate a list of genes whose orthologues are associated with a similar disease in animal models. The Homology and Protein–Protein Interaction modules accept a list of known genes for a phenotype (if available). The Homology module returns the genes homologous (paralogues) to those in the known gene list; the Protein–Protein Interaction module similarly generates a list of genes that interact with the known disease genes, and the analyst can select the number of interaction neighborhood levels (level-1, level-2, etc.) to include. The Gene Expression module accepts a list of relevant tissues as input and outputs the genes expressed in the selected tissues above an expression cut-off threshold set by the analyst. (3) The *KING* module accepts one or more diseases (MONDO IDs) as input and outputs a list of genes associated with those diseases based on evidence obtained from the Orphanet, OMIM, ClinVar, and MedGen databases.
The validity module accepts the gene lists generated by the CanGene and *KING* modules, together with an ANNOVAR-annotated VCF file or the output of the *VIP* module, as input. Its output is a VCF file that includes validity scores. The *VIP* module was developed based on the ACMG guidelines. It annotates the variants with pathogenicity terms (PVS1, PS1, etc.) and justifies the assigned terms. The causality module integrates the output of the validity and *VIP* modules and ranks the variants by the number of pieces of evidence extracted from the validity modules and the pathogenicity terms from *VIP*. Simultaneously, an interactive graphical representation of the variants is generated, which allows the analyst to select the desired variants using a LASSO filter.
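Two of the steps described in this caption can be sketched in a few lines of Python: expanding a known gene list to its level-*k* interaction neighborhood, and ranking variants by accumulated evidence. This is an illustrative sketch only, not the GeneTerpret implementation. The function names, the toy interaction graph, the variant records, and in particular the sort key (count of validity evidence sources first, then count of ACMG terms) are assumptions, since the caption does not specify the exact ranking formula.

```python
from collections import deque


def neighbours_within(graph, seeds, max_level):
    """Breadth-first expansion of seed genes up to max_level hops.

    Returns {gene: level}, where level 0 marks the seed genes.
    `graph` is an adjacency mapping: gene -> iterable of interactors.
    """
    level = {gene: 0 for gene in seeds}
    queue = deque(seeds)
    while queue:
        gene = queue.popleft()
        if level[gene] == max_level:
            continue  # do not expand beyond the requested level
        for partner in graph.get(gene, ()):
            if partner not in level:
                level[partner] = level[gene] + 1
                queue.append(partner)
    return level


def rank_variants(variants):
    """Order variants by evidence count, then by ACMG term count.

    Assumption: both counts are compared in descending order; ties keep
    their input order because Python's sort is stable.
    """
    return sorted(
        variants,
        key=lambda v: (len(v["evidence"]), len(v["acmg_terms"])),
        reverse=True,
    )


# Hypothetical interaction graph (placeholder gene names).
ppi = {
    "GENE_A": ["GENE_B"],
    "GENE_B": ["GENE_A", "GENE_C"],
    "GENE_C": ["GENE_B", "GENE_D"],
    "GENE_D": ["GENE_C"],
}
print(neighbours_within(ppi, ["GENE_A"], max_level=2))
# {'GENE_A': 0, 'GENE_B': 1, 'GENE_C': 2}

# Hypothetical variant records.
variants = [
    {"id": "v1", "evidence": ["KING"], "acmg_terms": ["PM2"]},
    {"id": "v2", "evidence": ["KING", "PPI"], "acmg_terms": ["PVS1", "PM2"]},
    {"id": "v3", "evidence": ["KING"], "acmg_terms": ["PS1", "PM2"]},
]
print([v["id"] for v in rank_variants(variants)])  # ['v2', 'v3', 'v1']
```

A weighted scheme, in which individual ACMG terms or evidence sources contribute unequally, would be an equally plausible reading of the caption; the simple counts above were chosen only to keep the sketch short.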

**Fig. 2**
Graphical representation of the results from an analysis of internal datasets by *GeneTerpret* and manual interpretation. (A) The top hundred ranked variants from the family-based analysis of ten families. Red highlights the variant of interest (VOI) selected by a human analyst, as published previously. The boxes cluster variants that *GeneTerpret* ranked equally (the same pathogenicity and validity terms). (B) The cohort-based results for 20 unrelated probands with “Tetralogy of Fallot”. The top hundred ranked variants are plotted as circles from top to bottom. The five VOIs selected by a human genome analyst, one in each of five patients from this cohort, are highlighted in colour, with a different colour for each patient's VOI. For comparison, individual analyses of the genomes of the five probands with VOIs are also plotted using the same colour-coding. For instance, purple represents the VOI for patient TOF53 (one of the probands in the cohort). This variant is ranked 44 in the cohort-based analysis and 8 in the singleton-based analysis by *GeneTerpret*.

Grants and funding

WT_/Wellcome Trust/United Kingdom
