webGQT: A Shiny Server for Genotype Query Tools for Model-Based Variant Filtering

Meharji Arumilli^{1,2}, Ryan M Layer^{3,4}, Marjo K Hytönen^{1,2}, Hannes Lohi^{1,2}

Affiliations

¹ Department of Veterinary Biosciences, Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland.
² Genetics Research Program, The Folkhälsan Research Center, Helsinki, Finland.
³ Department of Computer Science, University of Colorado, Boulder, CO, United States.
⁴ The BioFrontiers Institute, University of Colorado, Boulder, CO, United States.

PMID: 32194629
PMCID: PMC7063093
DOI: 10.3389/fgene.2020.00152

Meharji Arumilli et al. Front Genet. 2020 Mar 3;11:152. doi: 10.3389/fgene.2020.00152. eCollection 2020.

Abstract

Summary: Genotype Query Tools (GQT) were developed to discover disease-causing variations among billions of genotypes and millions of genomes, and they process data substantially faster than other existing methods. While GQT has been available to a wide audience as command-line software, the difficulty of constructing queries has limited its use among researchers without an IT or bioinformatics background. To overcome this limitation, we developed webGQT, an easy-to-use tool with a graphical user interface. With pre-built queries across three modules, webGQT supports pedigree analysis, case-control studies, and population frequency studies. As a package, webGQT allows researchers with little or no applied bioinformatics/IT experience to mine potential disease-causing variants from billions of genotypes.

Results: webGQT offers a flexible and easy-to-use interface for model-based candidate variant filtering for Mendelian diseases from thousands to millions of genomes with reduced computation time. Additionally, webGQT provides adjustable parameters to reduce false positives and rescue missing genotypes across all modules. Using a case study, we demonstrate the applicability of webGQT to query non-human genomes. In addition, we demonstrate the scalability of webGQT on large data sets by implementing complex population-specific queries on the 1000 Genomes Project Phase 3 data set, which includes 8.4 billion variants from 2504 individuals across 26 different populations. Furthermore, webGQT supports filtering single-nucleotide variants, short insertions/deletions, copy number variants, or any other variant genotypes supported by the VCF specification. Our results show that webGQT can be used as an online web service, or deployed on personal computers or local servers within research groups.
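To make the idea of model-based filtering with adjustable parameters concrete, the following is a minimal Python sketch of a recessive-model filter. It is an illustration only and not webGQT's or GQT's implementation: the function name `passes_recessive`, the parameters `max_missing_cases` and `max_hom_alt_controls`, the genotype encoding, and the toy sample and variant names are all assumptions introduced here.

```python
from typing import Dict, List, Optional

# Assumed encoding: 0 = hom-ref, 1 = het, 2 = hom-alt, None = missing genotype.
Genotype = Optional[int]


def passes_recessive(
    genotypes: Dict[str, Genotype],
    cases: List[str],
    controls: List[str],
    max_missing_cases: int = 0,
    max_hom_alt_controls: int = 0,
) -> bool:
    """Return True if a variant fits a simple recessive model.

    Every case must be hom-alt, except that up to `max_missing_cases` cases
    may have a missing genotype. At most `max_hom_alt_controls` controls may
    be hom-alt.
    """
    missing_cases = 0
    for sample in cases:
        gt = genotypes.get(sample)
        if gt is None:
            missing_cases += 1
        elif gt != 2:
            return False
    if missing_cases > max_missing_cases:
        return False
    hom_alt_controls = sum(1 for s in controls if genotypes.get(s) == 2)
    return hom_alt_controls <= max_hom_alt_controls


# Hypothetical toy data: three variants genotyped in two cases and two controls.
variants = {
    "var1": {"S1": 2, "S2": 2, "S3": 1, "S4": 0},
    "var2": {"S1": 2, "S2": None, "S3": 1, "S4": 0},
    "var3": {"S1": 2, "S2": 1, "S3": 0, "S4": 0},
}
cases, controls = ["S1", "S2"], ["S3", "S4"]

strict = [v for v, g in variants.items() if passes_recessive(g, cases, controls)]
relaxed = [
    v
    for v, g in variants.items()
    if passes_recessive(g, cases, controls, max_missing_cases=1)
]
print(strict)
print(relaxed)
```

With the strict defaults only `var1` passes; allowing one missing case genotype also retains `var2`, which is the kind of trade-off the adjustable parameters are described as offering.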

Availability: webGQT is available to users in three forms: 1) as a web server at https://vm1138.kaj.pouta.csc.fi/webgqt/, 2) as an R package to install on personal computers, and 3) as part of the same R package to configure on the user's own servers. The application can be installed from https://github.com/arumds/webgqt.

Keywords: Bigdata; GQT; R package; filtering; shiny server; variant; webGQT.

Figures

**Figure 1**
**(A)** An overview of the architecture of the webGQT system. The variant information is stored as GQT index files. The user queries the GQT index files from the GUI provided by the Shiny server, and the results are returned to the user via the GUI. The whole application is secured with an Nginx front-end proxy server that serves HTTPS requests. **(B)** The three-step workflow of using webGQT: 1) selecting the default data set (e.g., 1000 Genomes) or uploading GQT-indexed files, 2) uploading a phenotype file (PED) and creating a sample database, and 3) choosing a module and performing variant filtering.
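As an illustration of step 2 of the workflow, the Python sketch below reads a small tab-delimited PED-style phenotype file and splits samples into cases and controls. It is a hedged example, not webGQT code: the header names (`IndividualID`, `Phenotype`), the conventional PED phenotype coding (2 = affected, 1 = unaffected, anything else unknown), the helper name `split_by_phenotype`, and the sample rows are all assumptions made for this example.

```python
import csv
import io

# Hypothetical PED-style content with a header row (tab-delimited).
PED_TEXT = (
    "FamilyID\tIndividualID\tPaternalID\tMaternalID\tSex\tPhenotype\n"
    "F1\tS1\t0\t0\t1\t1\n"
    "F1\tS2\t0\t0\t2\t1\n"
    "F1\tS3\tS1\tS2\t1\t2\n"
    "F2\tS4\t0\t0\t2\t0\n"
)


def split_by_phenotype(ped_file):
    """Return (cases, controls) as lists of individual IDs.

    Assumes phenotype "2" means affected and "1" means unaffected;
    any other value is treated as unknown and skipped.
    """
    cases, controls = [], []
    for row in csv.DictReader(ped_file, delimiter="\t"):
        if row["Phenotype"] == "2":
            cases.append(row["IndividualID"])
        elif row["Phenotype"] == "1":
            controls.append(row["IndividualID"])
    return cases, controls


cases, controls = split_by_phenotype(io.StringIO(PED_TEXT))
print("cases:", cases)
print("controls:", controls)
```

In this toy pedigree the affected child `S3` becomes the only case, the unaffected parents `S1` and `S2` become controls, and `S4` is left out because its phenotype is unknown.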

Grants and funding

R00 HG009532/HG/NHGRI NIH HHS/United States
