Front Genet. 2020 Mar 3;11:152.
doi: 10.3389/fgene.2020.00152. eCollection 2020.

webGQT: A Shiny Server for Genotype Query Tools for Model-Based Variant Filtering

Meharji Arumilli et al. Front Genet.

Abstract

Summary: Genotype Query Tools (GQT) were developed to discover disease-causing variations among billions of genotypes and millions of genomes, and they process data substantially faster than other existing methods. While GQT has been available to a wide audience as command-line software, the difficulty of constructing queries has limited its applicability among researchers without IT or bioinformatics training. To overcome this limitation, we developed webGQT, an easy-to-use tool with a graphical user interface. With pre-built queries across three modules, webGQT allows for pedigree analysis, case-control studies, and population frequency studies. As a package, webGQT allows researchers with little or no applied bioinformatics/IT experience to mine potential disease-causing variants from billions of genotypes.

Results: webGQT offers a flexible and easy-to-use interface for model-based candidate variant filtering for Mendelian diseases, scaling from thousands to millions of genomes with reduced computation time. Additionally, webGQT provides adjustable parameters to reduce false positives and rescue missing genotypes across all modules. Using a case study, we demonstrate the applicability of webGQT to query non-human genomes. In addition, we demonstrate the scalability of webGQT on large data sets by implementing complex population-specific queries on the 1000 Genomes Project Phase 3 data set, which includes 8.4 billion variants from 2504 individuals across 26 different populations. Furthermore, webGQT supports filtering single-nucleotide variants, short insertions/deletions, copy number variants, or any other variant genotypes supported by the VCF specification. Our results show that webGQT can be used as an online web service, or deployed on personal computers or local servers within research groups.
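
As a rough illustration of what a population-specific query involves, the sketch below computes the alternate allele frequency per population and keeps a variant only if it is common in one target population and rare in all others. It is not webGQT's or GQT's implementation: the function names, the thresholds, and the encoding of genotypes as alternate allele counts (0, 1, 2, or `None` for a missing call, assuming diploid samples) are assumptions made for this example.

```python
from typing import Dict, List, Optional

Genotype = Optional[int]  # alternate allele count 0/1/2, or None if missing


def alt_allele_frequency(genotypes: List[Genotype]) -> Optional[float]:
    """Alternate allele frequency over called diploid genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no usable calls in this population
    return sum(called) / (2 * len(called))


def is_population_specific(
    genotypes_by_pop: Dict[str, List[Genotype]],
    target: str,
    min_target_af: float,
    max_other_af: float,
) -> bool:
    """True if the variant is frequent in `target` and rare elsewhere."""
    target_af = alt_allele_frequency(genotypes_by_pop[target])
    if target_af is None or target_af < min_target_af:
        return False
    for pop, genotypes in genotypes_by_pop.items():
        if pop == target:
            continue
        af = alt_allele_frequency(genotypes)
        # Populations without any called genotype are skipped in this sketch.
        if af is not None and af > max_other_af:
            return False
    return True


# Hypothetical genotypes for one variant in two populations.
example = {"FIN": [1, 1, 2, 0], "YRI": [0, 0, 0, None]}
```

With this toy input, `is_population_specific(example, "FIN", 0.3, 0.05)` evaluates the FIN frequency over eight called alleles and the YRI frequency over six, ignoring the missing call.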

Availability: webGQT is available to users in three forms: 1) as a web server at https://vm1138.kaj.pouta.csc.fi/webgqt/, 2) as an R package to install on personal computers, and 3) as part of the same R package to configure on the user's own servers. The application is available for installation at https://github.com/arumds/webgqt.

Keywords: Bigdata; GQT; R package; filtering; shiny server; variant; webGQT.

Figures

Figure 1
(A) An overview of the architecture of the webGQT system. The variant information is stored as GQT index files. The user queries the GQT index files from the GUI provided by the Shiny server, and the results are returned to the user via the GUI. The whole application is secured with an Nginx front-end proxy server that serves HTTPS requests. (B) The three-step workflow of webGQT: 1) selecting the default data set (e.g., 1000 Genomes) or uploading GQT indexed files, 2) uploading a phenotype file (PED) and creating the sample database, and 3) choosing a module and performing variant filtering.
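
Step 2 of the workflow, turning a PED file into a sample database, can be pictured with the minimal sketch below. It assumes the conventional six leading PED columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype, with phenotype coded 1 for unaffected and 2 for affected) and stores them in an in-memory SQLite table. The table layout, column names, and function names are hypothetical and chosen for this example; they do not describe the database that webGQT itself creates.

```python
import sqlite3
from typing import List

# Hypothetical column names for the six standard PED fields.
PED_COLUMNS = ["family_id", "sample_id", "paternal_id",
               "maternal_id", "sex", "phenotype"]


def build_sample_db(ped_text: str) -> sqlite3.Connection:
    """Load whitespace-delimited PED rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (family_id TEXT, sample_id TEXT, "
        "paternal_id TEXT, maternal_id TEXT, sex INTEGER, phenotype INTEGER)"
    )
    for line in ped_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if len(fields) < len(PED_COLUMNS):
            raise ValueError(f"expected at least 6 PED fields: {line!r}")
        fam, sample, father, mother, sex, phenotype = fields[:6]
        conn.execute(
            "INSERT INTO samples VALUES (?, ?, ?, ?, ?, ?)",
            (fam, sample, father, mother, int(sex), int(phenotype)),
        )
    conn.commit()
    return conn


def sample_ids(conn: sqlite3.Connection, phenotype: int) -> List[str]:
    """Return sample IDs with the given phenotype code, sorted by ID."""
    rows = conn.execute(
        "SELECT sample_id FROM samples WHERE phenotype = ? ORDER BY sample_id",
        (phenotype,),
    )
    return [row[0] for row in rows]


# Hypothetical trio: two affected samples (2) and one unaffected parent (1).
PED_EXAMPLE = """\
FAM1 dog1 0 0 1 2
FAM1 dog2 0 0 2 1
FAM1 dog3 dog1 dog2 1 2
"""
```

Calling `sample_ids(build_sample_db(PED_EXAMPLE), 2)` then selects the affected samples, which is the kind of sample selection a filtering module needs before it looks at genotypes.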
Figure 2
The workflow of webGQT via the user interface: (1) The upload panel for input variant data. The user can choose the default data set or upload GQT indexed files. (2a) The upload panel for the phenotype file. The user uploads the PED file by clicking the “Browse” button. (2b) After uploading, the phenotype file is rendered as a data table with the sample selection information. The user is then required to create a phenotype sample database by clicking “CreateDB”. (3) The user chooses a filtering module, sets the available parameters of that module, and finally filters the variants. A dominant analysis module filter is shown in the figure.
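
To make the idea of a dominant analysis filter concrete, here is a minimal sketch under a simplified definition of the model: every affected sample is heterozygous and no unaffected sample carries the alternate allele. Two tolerance parameters stand in for the kind of adjustable settings that rescue missing genotypes and absorb occasional miscalls. The definition, the parameter names, and the genotype encoding are assumptions made for this example, not webGQT's actual query.

```python
from typing import Dict, List, Optional

HOM_REF, HET, HOM_ALT = 0, 1, 2  # alternate allele counts; None means missing
Genotypes = Dict[str, Optional[int]]


def passes_dominant(
    genotypes: Genotypes,
    affected: List[str],
    unaffected: List[str],
    max_affected_missing: int = 0,
    max_unaffected_carriers: int = 0,
) -> bool:
    """Check one variant against the simplified dominant model."""
    missing = 0
    for sample in affected:
        g = genotypes.get(sample)  # absent samples count as missing
        if g is None:
            missing += 1
        elif g != HET:
            return False  # a called affected sample must be heterozygous
    if missing > max_affected_missing:
        return False
    carriers = sum(
        1 for sample in unaffected if genotypes.get(sample) in (HET, HOM_ALT)
    )
    return carriers <= max_unaffected_carriers


def filter_variants(
    variants: Dict[str, Genotypes],
    affected: List[str],
    unaffected: List[str],
    **tolerances: int,
) -> List[str]:
    """Return IDs of variants that pass, in input order."""
    return [
        vid for vid, genotypes in variants.items()
        if passes_dominant(genotypes, affected, unaffected, **tolerances)
    ]


# Hypothetical variants: clean fit, one missing affected call, one carrier.
VARIANTS = {
    "chr1:100": {"a1": HET, "a2": HET, "u1": HOM_REF},
    "chr1:200": {"a1": HET, "a2": None, "u1": HOM_REF},
    "chr1:300": {"a1": HET, "a2": HET, "u1": HET},
}
```

With the default strict settings only the first variant passes; allowing one missing affected genotype also admits the second, and allowing one unaffected carrier admits the third instead.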
