PeerJ. 2015 Mar 3:3:e796.
doi: 10.7717/peerj.796. eCollection 2015.

VaRank: a simple and powerful tool for ranking genetic variants

Véronique Geoffroy et al. PeerJ. 2015.

Abstract

Background. Most genetic disorders are caused by single nucleotide variations (SNVs) or small insertions/deletions (indels). High-throughput sequencing has broadened the catalogue of human variation, including common polymorphisms, rare variations and disease-causing mutations. However, identifying one variation among hundreds or thousands of others is still a complex task for biologists, geneticists and clinicians.

Results. We have developed VaRank, a command-line tool for the ranking of genetic variants detected by high-throughput sequencing. VaRank scores and prioritizes variants annotated either by Alamut Batch or SnpEff. A barcode allows users to quickly view the presence/absence of variants (with homozygote/heterozygote status) in analyzed samples. VaRank supports the commonly used VCF input format for variant analysis, which allows it to be easily integrated into NGS bioinformatics analysis pipelines. VaRank has been successfully applied to disease-gene identification as well as to a molecular diagnostics setup for several hundred patients.

Conclusions. VaRank is implemented in Tcl/Tk, a platform-independent scripting language, but it has been tested only in Unix environments. The source code is available under the GNU GPL and can be downloaded, together with sample data and detailed documentation, from http://www.lbgi.fr/VaRank/.

Keywords: Annotation; Barcode; Human genetics; Molecular diagnostic; Mutation detection; Next generation sequencing; Software; Variant ranking.

Conflict of interest statement

Co-author André Blavier declares a financial competing interest as a member of Interactive Biosoftware, the company commercializing Alamut Batch.

Figures

Figure 1. High throughput sequencing data analysis workflow and VaRank positioning.
Figure 2. VaRank’s workflow.
The workflow is separated into 4 major steps: (i) integration of sequencing data from one or more VCF files, including a variant call quality summary; (ii) annotation of each variant with genetic and predictive information (functional impact, putative effects in protein coding regions, population frequency, phenotypic features…) from different sources, performed by either Alamut Batch or SnpEff; (iii) representation of the presence/absence of variants (with homozygote/heterozygote status) across all samples in a barcode; and (iv) prioritization, to score and rank variants according to their predicted pathogenic status. The final output files are available for each sample.
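Step (i) of this workflow can be illustrated with a minimal Python sketch. It is not VaRank's code (VaRank is written in Tcl/Tk); it only assumes the standard tab-separated VCF column layout (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then one sample column) and merges single-sample files into one table keyed by variant. The function names `parse_vcf_lines` and `integrate`, the sample names and the toy records are all hypothetical, and multi-allelic ALT fields are left unsplit for brevity.

```python
from collections import defaultdict


def parse_vcf_lines(lines):
    """Yield (variant_key, genotype) for each data line of a single-sample VCF."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        genotype = sample_values[format_keys.index("GT")]
        yield (chrom, int(pos), ref, alt), genotype


def integrate(vcfs_by_sample):
    """Merge several single-sample VCFs into {variant_key: {sample: genotype}}."""
    table = defaultdict(dict)
    for sample, lines in vcfs_by_sample.items():
        for key, genotype in parse_vcf_lines(lines):
            table[key][sample] = genotype
    return dict(table)


# Toy, made-up records for two hypothetical samples.
vcf_p1 = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tP1",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30",
]
vcf_p2 = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tP2",
    "chr1\t100\t.\tA\tG\t60\tPASS\t.\tGT:DP\t1/1:25",
    "chr2\t200\t.\tC\tT\t40\tPASS\t.\tGT:DP\t0/1:18",
]
table = integrate({"P1": vcf_p1, "P2": vcf_p2})
```

A table of this shape is a convenient starting point for the later steps, since each variant already carries its genotype in every sample where it was called.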
Figure 3. Barcode.
(A) The barcode represents the zygosity status of an SNV in an ordered list of samples. Samples that are homozygous for the reference allele are represented using “0,” heterozygous variants are represented using “1” and homozygous variants are represented with “2.” (B) Selected annotations from the VaRank output representing 3 variants from a single patient. The barcode gives an overview of the presence/absence of one variant in all other patients analyzed. The family barcode gives a user-ordered view of the presence/absence of one variant in a selection of patients. In addition, the total allele counts are given in the last 4 columns. (C) Example of pedigrees and barcodes that can be specifically used in family analyses such as trio exome sequencing. On the left, homozygous mutations in a consanguineous family could be highlighted by the “121” barcode, indicating homozygous variants (“2”) in the proband inherited from heterozygous parents (“1”). On the right, de novo variants in the proband could be highlighted with the proposed barcode “010.”
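The barcode encoding described in this caption can be sketched in a few lines of Python. This is an illustration only, not VaRank's implementation: the function names `zygosity_digit` and `barcode` are hypothetical, genotypes are assumed to be biallelic VCF-style strings such as `0/1`, missing calls are rejected rather than encoded, and the trio is assumed to be ordered parent, proband, parent so that it matches the “121” and “010” patterns above.

```python
import re


def zygosity_digit(genotype: str) -> str:
    """Map a biallelic genotype ('0/0', '0/1', '1/1') to '0', '1' or '2'."""
    alleles = re.split(r"[/|]", genotype)
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        # Assumption: missing or multi-allelic calls are out of scope here.
        raise ValueError(f"unsupported genotype: {genotype!r}")
    return str(alleles.count("1"))


def barcode(genotypes_by_sample: dict, sample_order: list) -> str:
    """Concatenate one zygosity digit per sample, in the given order."""
    return "".join(zygosity_digit(genotypes_by_sample[s]) for s in sample_order)


# Hypothetical trio, ordered parent, proband, parent.
trio_order = ["father", "proband", "mother"]

recessive = barcode(
    {"father": "0/1", "proband": "1/1", "mother": "0/1"}, trio_order
)
de_novo = barcode(
    {"father": "0/0", "proband": "0/1", "mother": "0/0"}, trio_order
)

print(recessive)  # 121
print(de_novo)    # 010
```

Because the barcode is a plain string, candidate variants for a given inheritance model can be selected with a simple equality test against the expected pattern.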
Figure 4. Distribution of variants in 180 patients for 217 genes.
The gray line represents the number of variants identified in each sample in a cohort of 180 patients sequenced for 217 genes. The dark line represents the cumulative number of non-redundant (NR) variants in the same dataset as each new sample is added.
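The two curves in this figure correspond to two simple quantities, which the following Python sketch computes on toy data. The function name `variant_counts` and the variant identifiers are hypothetical, and the numbers are not taken from the study: the per-sample count is the size of each sample's variant set, and the cumulative non-redundant count is the size of the union of all sets seen so far.

```python
def variant_counts(samples):
    """Return (per-sample counts, cumulative non-redundant counts).

    `samples` is an ordered list of sets of variant identifiers.
    """
    seen = set()
    per_sample = []
    cumulative_nr = []
    for variants in samples:
        per_sample.append(len(variants))
        seen.update(variants)  # variants shared with earlier samples add nothing
        cumulative_nr.append(len(seen))
    return per_sample, cumulative_nr


# Toy, made-up variant identifiers for three hypothetical samples.
samples = [
    {"chr1:100A>G", "chr2:200C>T"},
    {"chr2:200C>T", "chr3:300G>A"},
    {"chr3:300G>A"},
]
per_sample, cumulative_nr = variant_counts(samples)
print(per_sample)     # [2, 2, 1]
print(cumulative_nr)  # [2, 3, 3]
```

The cumulative curve flattens whenever a new sample contributes only variants that were already observed, which is what distinguishes it from a running sum of the per-sample counts.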
Figure 5. Representation of the non-redundant variations by functional type in 3 datasets.
The chart is built from the intellectual disability (ID) and Bardet-Biedl syndrome (BBS) datasets discussed in the Results section (the BBS dataset being consolidated from 188 patients referred for BBS), together with an enhanced exome dataset (35 exomes). The “truncating” category corresponds to frameshift, nonsense, stoploss and startloss variants.
