Microbiol Spectr. 2022 Apr 27;10(2):e0256421.
doi: 10.1128/spectrum.02564-21. Epub 2022 Mar 2.

VPipe: an Automated Bioinformatics Platform for Assembly and Management of Viral Next-Generation Sequencing Data

Darlene D Wagner et al. Microbiol Spectr. 2022.

Abstract

Next-generation sequencing (NGS) is a powerful tool for detecting and investigating viral pathogens; however, analysis and management of the enormous amounts of data generated by these technologies remain a challenge. Here, we present VPipe (the Viral NGS Analysis Pipeline and Data Management System), an automated bioinformatics pipeline optimized for whole-genome assembly of viral sequences and identification of diverse species. VPipe automates the data quality control, assembly, and contig identification steps typically performed when analyzing NGS data. Users access the pipeline through a secure web-based portal, which provides an easy-to-use interface with advanced search capabilities for reviewing results. In addition, VPipe provides a centralized system for storing and analyzing NGS data, eliminating common bottlenecks in bioinformatics analyses for public health laboratories with limited on-site computational infrastructure. The performance of VPipe was validated through the analysis of publicly available NGS data sets for viral pathogens, generating high-quality assemblies for 12 data sets. VPipe also generated assemblies with greater contiguity than similar pipelines for 41 human respiratory syncytial virus isolates and 23 SARS-CoV-2 specimens.

IMPORTANCE Computational infrastructure and bioinformatics analysis are bottlenecks in the application of NGS to viral pathogens. As of September 2021, VPipe has been used by the U.S. Centers for Disease Control and Prevention (CDC) and 12 state public health laboratories to characterize more than 17,500 clinical specimens and 1,500 isolates. VPipe automates genome assembly for a wide range of viruses, including high-consequence pathogens such as SARS-CoV-2. Such automated functionality expedites public health responses to viral outbreaks and pathogen surveillance.

Keywords: automated bioinformatics pipeline; infectious disease surveillance; next-generation sequencing (NGS); viral molecular detection.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

FIG 1
VPipe standard analysis pipeline: VPipe takes raw FASTQ data generated by Illumina or Ion Torrent sequencing instruments. Raw reads are processed using the Read Quality and Trimming module prior to de novo assembly using SPAdes and detection of viral contigs via BLASTN. Analysis results are available on the VPipe user interface, accessible through the CDC OAMD portal. For SARS-CoV-2 data sets, reference-based assembly is also run in parallel with the de novo Assembly Module.
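
The assembly and contig identification steps named in this caption can be sketched as a dry run that only builds the command lines. This is an illustrative sketch, not VPipe's actual implementation: it assumes paired-end Illumina reads that have already been quality-trimmed (the caption does not name the trimming tool, so that step is left out), and the function name `build_commands`, the file names, and the BLAST database name `nt` are hypothetical.

```python
from pathlib import Path


def build_commands(trimmed_r1, trimmed_r2, workdir, blast_db="nt"):
    """Return the de novo assembly and contig identification commands.

    Nothing is executed here; each command is returned as an argument list
    that could be passed to subprocess.run() on a machine where SPAdes and
    BLAST+ are installed.
    """
    workdir = Path(workdir)
    assembly_dir = workdir / "spades"
    # SPAdes writes its assembled contigs to contigs.fasta in the output directory.
    contigs = assembly_dir / "contigs.fasta"
    hits = workdir / "blastn_hits.tsv"

    # De novo assembly of the paired-end reads.
    spades_cmd = [
        "spades.py",
        "-1", str(trimmed_r1),
        "-2", str(trimmed_r2),
        "-o", str(assembly_dir),
    ]
    # Nucleotide search of the contigs; -outfmt 6 requests tabular output.
    blastn_cmd = [
        "blastn",
        "-query", str(contigs),
        "-db", blast_db,
        "-outfmt", "6",
        "-out", str(hits),
    ]
    return [spades_cmd, blastn_cmd]


if __name__ == "__main__":
    for cmd in build_commands("sample_R1.trimmed.fastq",
                              "sample_R2.trimmed.fastq",
                              "results"):
        print(" ".join(cmd))
```

Returning argument lists instead of running the tools keeps the sketch inspectable without any bioinformatics software installed; a real pipeline would also need error handling and the read quality step.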
FIG 2
Distribution of longest assembled contigs for HRSV and SARS-CoV-2 clinical data sets. (A) Bar plots indicating the average longest contigs assembled for 41 human respiratory syncytial virus (HRSV) samples. From left to right, bars represent average maximum contig lengths (bp) from Agoti et al. (2015), representing manually curated assemblies (28); EDGE with host sequence removal; EDGE with reads preprocessed through VPipe; Genome Detective; and VPipe. (B) Bar plots indicating the average longest contigs assembled for 23 SARS-CoV-2 specimens using EDGE with host sequence removal, EDGE with reads preprocessed through VPipe, Genome Detective, and VPipe. Whiskers represent standard error of the mean. Bars with the same letter are statistically equivalent (pairwise Wilcoxon’s test).
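
The summary statistics plotted in this figure (longest contig per sample, then the mean and standard error of the mean across samples) can be computed as in the minimal sketch below. It is not the authors' code: the function names are hypothetical, the input is assumed to be plain FASTA text with one assembly per sample, and the pairwise Wilcoxon comparison is not reproduced.

```python
import math
import statistics


def contig_lengths(fasta_text):
    """Return the length of every sequence in a FASTA-formatted string."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def longest_contig(fasta_text):
    """Return the longest contig length in one assembly, or 0 if it is empty."""
    return max(contig_lengths(fasta_text), default=0)


def mean_and_sem(values):
    """Return (mean, standard error of the mean) using the sample standard deviation."""
    mean = statistics.mean(values)
    if len(values) < 2:
        return mean, 0.0
    return mean, statistics.stdev(values) / math.sqrt(len(values))
```

Calling `longest_contig` on each sample's assembly and passing the resulting list to `mean_and_sem` gives one bar height and one whisker length per pipeline.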

References

    1. Dunne WM, Jr., Westblade LF, Ford B. 2012. Next-generation and whole-genome sequencing in the diagnostic clinical microbiology laboratory. Eur J Clin Microbiol Infect Dis 31:1719–1726. doi: 10.1007/s10096-012-1641-7.
    2. Maljkovic Berry I, Melendrez MC, Bishop-Lilly KA, Rutvisuttinunt W, Pollett S, Talundzic E, Morton L, Jarman RG. 2020. Next generation sequencing and bioinformatics methodologies for infectious disease research and public health: approaches, applications, and considerations for development of laboratory capacity. J Infect Dis 221:S292–S307. doi: 10.1093/infdis/jiz286.
    3. Charre C, Ginevra C, Sabatier M, Regue H, Destras G, Brun S, Burfin G, Scholtes C, Morfin F, Valette M, Lina B, Bal A, Josset L. 2020. Evaluation of NGS-based approaches for SARS-CoV-2 whole genome characterisation. Virus Evol 6:veaa075. doi: 10.1093/ve/veaa075.
    4. Oakeson KF, Wagner JM, Mendenhall M, Rohrwasser A, Atkinson-Dunn R. 2017. Bioinformatic analyses of whole-genome sequence data in a public health laboratory. Emerg Infect Dis 23:1441–1445. doi: 10.3201/eid2309.170416.
    5. Firth C, Lipkin WI. 2013. The genomics of emerging pathogens. Annu Rev Genomics Hum Genet 14:281–300. doi: 10.1146/annurev-genom-091212-153446.
