Resources and tools for rare disease variant interpretation

Luana Licata¹, Allegra Via², Paola Turina³, Giulia Babbi³, Silvia Benevenuta⁴, Claudio Carta⁵, Rita Casadio³, Andrea Cicconardi⁶,⁷, Angelo Facchiano⁸, Piero Fariselli⁴, Deborah Giordano⁸, Federica Isidori⁹, Anna Marabotti¹⁰, Pier Luigi Martelli³, Stefano Pascarella², Michele Pinelli¹¹, Tommaso Pippucci⁹, Roberta Russo¹¹,¹², Castrense Savojardo³, Bernardina Scafuri¹⁰, Lucrezia Valeriani¹³, Emidio Capriotti³

Affiliations

¹ Department of Biology, University of Rome Tor Vergata, Roma, Italy.
² Department of Biochemical Sciences "A. Rossi Fanelli", University of Rome "La Sapienza", Roma, Italy.
³ Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy.
⁴ Department of Medical Sciences, University of Torino, Torino, Italy.
⁵ National Centre for Rare Diseases, Istituto Superiore di Sanità, Roma, Italy.
⁶ Department of Physics, University of Genova, Genova, Italy.
⁷ Istituto Italiano di Tecnologia-IIT, Genova, Italy.
⁸ National Research Council, Institute of Food Science, Avellino, Italy.
⁹ Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy.
¹⁰ Department of Chemistry and Biology "A. Zambelli", University of Salerno, Fisciano, SA, Italy.
¹¹ Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy.
¹² CEINGE Biotecnologie Avanzate Franco Salvatore, Napoli, Italy.
¹³ Center for Technology and Innovation, Trieste, Italy.

PMID: 37234922
PMCID: PMC10206239
DOI: 10.3389/fmolb.2023.1169109

Review


Luana Licata et al. Front Mol Biosci. 2023 May 10;10:1169109. doi: 10.3389/fmolb.2023.1169109. eCollection 2023.


Abstract

Collectively, rare genetic disorders affect a substantial portion of the world's population. In most cases, affected individuals face difficulties in obtaining a clinical diagnosis and genetic characterization. Understanding the molecular mechanisms of these diseases and developing therapeutic treatments for patients are also challenging. However, recent advances in genome sequencing and analysis technologies, together with computer-aided tools for predicting genotype-phenotype associations, can bring significant benefits to this field. In this review, we highlight the most relevant online resources and computational tools for genome interpretation that can enhance the diagnosis, clinical management, and development of treatments for rare disorders. Our focus is on resources for interpreting single nucleotide variants. Additionally, we present use cases for interpreting genetic variants in clinical settings and review the limitations of these results and of the prediction tools. Finally, we have compiled a curated set of core resources and tools for analyzing rare disease genomes. These resources and tools can be used to develop standardized protocols that will enhance the accuracy and effectiveness of rare disease diagnosis.

Keywords: genetic disorder; genome interpretation; genotype-phenotype association; machine learning; precision medicine; rare disease; single nucleotide variant (SNV).

Copyright © 2023 Licata, Via, Turina, Babbi, Benevenuta, Carta, Casadio, Cicconardi, Facchiano, Fariselli, Giordano, Isidori, Marabotti, Martelli, Pascarella, Pinelli, Pippucci, Russo, Savojardo, Scafuri, Valeriani and Capriotti.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
Analysis of the Orphanet database composition. **(A)** Fraction of genetic and nongenetic RDs in each class. **(B)** Fraction of genes shared by each pair of RD classes. A gene or an Orphanet code can be associated with multiple RD classes.
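The two panels of Figure 1 summarize simple counts over Orphanet records. The sketch below shows one way such summaries could be computed from a mapping of disorder codes to RD classes and associated genes. It is a minimal illustration under stated assumptions, not the authors' procedure: the record layout and the toy codes are hypothetical, a disorder is treated as "genetic" if it has at least one associated gene, and the caption does not define how the shared-gene fraction is normalized, so the Jaccard index is assumed here.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical record layout: disorder code -> (RD classes, associated genes).
Record = Tuple[Set[str], Set[str]]


def genetic_fraction_by_class(records: Dict[str, Record]) -> Dict[str, float]:
    """Panel A analogue: share of disorders in each class with >= 1 associated gene.

    Assumption of this sketch: 'genetic' means 'has at least one associated gene'.
    A disorder listed under several classes is counted once in each of them.
    """
    totals: Dict[str, int] = {}
    genetic: Dict[str, int] = {}
    for classes, genes in records.values():
        for c in classes:
            totals[c] = totals.get(c, 0) + 1
            if genes:
                genetic[c] = genetic.get(c, 0) + 1
    return {c: genetic.get(c, 0) / n for c, n in totals.items()}


def shared_gene_fraction(records: Dict[str, Record]) -> Dict[Tuple[str, str], float]:
    """Panel B analogue: overlap between the gene sets of every pair of classes.

    Assumption of this sketch: overlap is measured with the Jaccard index
    |A & B| / |A | B|; the original caption does not state the normalization.
    """
    genes_by_class: Dict[str, Set[str]] = {}
    for classes, genes in records.values():
        for c in classes:
            genes_by_class.setdefault(c, set()).update(genes)
    result: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(genes_by_class), 2):
        union = genes_by_class[a] | genes_by_class[b]
        shared = genes_by_class[a] & genes_by_class[b]
        result[(a, b)] = len(shared) / len(union) if union else 0.0
    return result
```

Because a gene can belong to several classes, the pairwise values are not expected to sum to one; each pair is evaluated independently.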

**FIGURE 2**
Performance of four state-of-the-art methods (CADD, FATHMM, PhD-SNPg and VEST4) on a dataset of RD-associated variants from the ClinVar database with at least one Pathogenic or Benign annotation. Scores are calculated for the different classes of RDs. All 27,648 variants (16,012 Pathogenic and 11,633 Benign) in our dataset belong to the Genetic class. According to the ClinVar annotation, each variant can be assigned to multiple RD groups. Performance parameters shown are: Q2, overall accuracy; MCC, Matthews correlation coefficient; AUC, area under the receiver operating characteristic curve. DB indicates the fraction of each RD group in the dataset. The performance of CADD was calculated using a Phred-like score threshold of 20. Color intensity is proportional to the numerical values, which are reported in Supplementary Table S6. The predictions of the four methods are reported in the Supplementary Materials.
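The three performance parameters in Figure 2 follow standard definitions. With Pathogenic as the positive class, Q2 = (TP + TN) / (TP + FP + TN + FN), MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)), and the AUC equals the probability that a randomly chosen Pathogenic variant scores higher than a randomly chosen Benign one. The sketch below computes them from binary labels and predictor scores; the toy scores are invented for illustration, and calling a variant Pathogenic when its score is at least the threshold (20 for CADD Phred-like scores, as in the caption) is an assumption about how ties at the cut-off are handled.

```python
import math
from typing import List, Tuple


def confusion_counts(labels: List[int], scores: List[float], threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); label 1 = Pathogenic, 0 = Benign.

    A variant is predicted Pathogenic when its score is >= threshold (assumed rule).
    """
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def q2(tp: int, fp: int, tn: int, fn: int) -> float:
    """Overall accuracy: fraction of correctly classified variants."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(labels: List[int], scores: List[float]) -> float:
    """Threshold-free ROC AUC via the Mann-Whitney rank-sum formula (ties get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For example, with labels `[1, 1, 1, 0, 0, 0]` and hypothetical scores `[25.0, 32.0, 18.0, 12.0, 22.0, 5.0]` at threshold 20, the counts are TP = 2, FP = 1, TN = 2, FN = 1, giving Q2 = 4/6, MCC = 1/3 and AUC = 8/9. Unlike Q2 and MCC, the AUC does not depend on the chosen threshold.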

**FIGURE 3**
Exome analysis flowchart. The diagram shows the main steps of NGS data analysis. The left side shows how successive filtering steps progressively reduce the number of likely disease-causing variants; the reported numbers refer to a typical single-patient case. The right side details the filtering process. Based on the identified variants, three diagnostic situations can be distinguished: (1, green dot) identification of P/LP variants with a well-established association with the RD phenotype; (2, yellow dot) identification of new P/LP variants in genes with a known association with the phenotype; (3, red dot) identification of functional variants in genes with no known association with the phenotype. A fourth situation should also be considered: the identification of variants of uncertain significance (VUSs) in genes with no known association with the phenotype. In this case, combining complementary approaches, such as short-read genome sequencing, RNA sequencing, and methylation profiling, should be considered to elucidate the molecular mechanism of the disease and improve the diagnostic yield.
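The filtering cascade and the diagnostic situations of Figure 3 can be expressed as a small decision procedure. The sketch below is a simplified, hypothetical encoding of the caption rather than the workflow used in the review: the `Variant` fields, the 1% population allele-frequency cut-off, and the set of retained consequence types (Sequence Ontology terms) are all illustrative assumptions, and real pipelines apply many more filters (inheritance model, segregation, phenotype matching).

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative assumptions, not thresholds taken from the review.
MAX_POPULATION_AF = 0.01
IMPACTFUL_CONSEQUENCES = {
    "missense_variant", "stop_gained", "stop_lost", "start_lost",
    "frameshift_variant", "splice_donor_variant", "splice_acceptor_variant",
}


@dataclass
class Variant:
    """Hypothetical, already-annotated variant record."""
    gene: str
    consequence: str                # Sequence Ontology consequence term
    gnomad_af: Optional[float]      # None = absent from the population database
    passes_qc: bool                 # read/genotype quality filters
    acmg_class: str                 # e.g. "P/LP", "VUS", "B/LB"
    previously_reported: bool       # already described as P/LP for this phenotype
    gene_linked_to_phenotype: bool  # gene has a known association with the phenotype


def filter_candidates(variants: List[Variant]) -> List[Variant]:
    """Successive filters that shrink the candidate list (left side of Figure 3)."""
    kept = [v for v in variants if v.passes_qc]
    # Variants absent from the population database are treated as AF = 0.
    kept = [v for v in kept if (v.gnomad_af or 0.0) < MAX_POPULATION_AF]
    kept = [v for v in kept if v.consequence in IMPACTFUL_CONSEQUENCES]
    return kept


def diagnostic_situation(v: Variant) -> str:
    """Map a candidate onto the situations described in the Figure 3 caption."""
    if v.gene_linked_to_phenotype and v.acmg_class == "P/LP":
        # Situation 1 (known variant) vs situation 2 (new variant in a known gene).
        return "1-green" if v.previously_reported else "2-yellow"
    if not v.gene_linked_to_phenotype:
        # Situation 4 (VUS) vs situation 3 (other functional variant) in a gene
        # with no known association with the phenotype.
        return "4-VUS" if v.acmg_class == "VUS" else "3-red"
    return "unclassified"  # e.g. a VUS in a known gene; not covered by the caption
```

Keeping each filter as a separate list comprehension makes it easy to record how many variants survive each step, which mirrors the progressive reduction shown on the left of the figure.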

**FIGURE 4**
Schematic view of the clinical variant interpretation process. In a human protein-coding gene, a variant in the exons of an open reading frame can result in synonymous or nonsynonymous changes, while a variant in other regions (splice sites or introns) can affect splicing regulation. Changes within regulatory sequences (yellow and blue) can affect the transcriptional and translational regulation of gene expression. The right column reports a selection of the most commonly used resources for variant interpretation, grouped by the genomic location of the variant. Several methods are currently available to predict the effect of coding variants; however, the interpretation of variants in deep-intronic regions or in regulatory elements remains challenging because few *in silico* prediction approaches are available. These shortcomings can be addressed by sequencing the whole exome or genome in parallel with multi-omics technologies, including RNA sequencing (transcriptome analysis), ChIP-seq (chromatin immunoprecipitation followed by sequencing) and Hi-C (high-throughput chromosome conformation capture).
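The first branching point in Figure 4 is where a variant falls relative to the gene and, for coding variants, whether the amino acid changes. The sketch below illustrates both decisions for a single-nucleotide substitution using the standard genetic code. It is a deliberately coarse illustration, not the logic of any annotation tool cited in the review: exon coordinates are assumed to be sorted, non-overlapping and 1-based inclusive, only the two intronic bases flanking each exon are labeled as splice sites (real annotators use wider splice-region windows), strand is ignored, and the output labels are illustrative names.

```python
from typing import List, Tuple

# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

SPLICE_WINDOW = 2  # canonical donor/acceptor dinucleotides only (simplifying assumption)


def classify_location(pos: int, exons: List[Tuple[int, int]]) -> str:
    """Coarse location of a genomic position relative to one transcript."""
    if pos < exons[0][0] or pos > exons[-1][1]:
        return "outside_transcript"  # could still lie in a regulatory element
    for start, end in exons:
        if start <= pos <= end:
            return "exonic"
    for start, end in exons:
        if start - SPLICE_WINDOW <= pos < start or end < pos <= end + SPLICE_WINDOW:
            return "splice_site"
    return "intronic"


def classify_coding_snv(cds: str, offset: int, alt: str) -> str:
    """Effect of a single-base substitution at a 0-based offset in a coding sequence."""
    codon_start = offset - offset % 3
    ref_codon = cds[codon_start:codon_start + 3].upper()
    i = offset % 3
    alt_codon = ref_codon[:i] + alt.upper() + ref_codon[i + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"
```

For instance, in the hypothetical coding sequence `ATGAAAGGCTGA`, changing the third base of the lysine codon `AAA` to `G` gives `AAG` (still lysine, synonymous), whereas changing its first base to `T` gives the stop codon `TAA` (nonsense).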


