Viruses. 2022 Jan 23;14(2):217.
doi: 10.3390/v14020217.

ViralFlow: A Versatile Automated Workflow for SARS-CoV-2 Genome Assembly, Lineage Assignment, Mutations and Intrahost Variant Detection

Filipe Zimmer Dezordi et al. Viruses.

Abstract

The COVID-19 pandemic is driven by Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), which emerged in 2019 and quickly spread worldwide. Genomic surveillance has become the gold-standard method for monitoring and studying this fast-spreading virus and its constantly emerging lineages. The current deluge of SARS-CoV-2 genomic data generated worldwide has intensified the need for streamlined bioinformatics workflows. Here, we describe a workflow developed by our group to process and analyze large-scale SARS-CoV-2 Illumina amplicon sequencing data. This workflow automates all steps of SARS-CoV-2 reference-based genomic analysis: data processing, genome assembly, PANGO lineage assignment, mutation analysis and screening of intrahost variants. The pipeline can process a batch of around 100 samples in less than half an hour on a personal laptop, or in less than five minutes on a server with 50 threads. The workflow is available as Docker or Singularity images, so it can run on laptops for small-scale analyses or on high-capacity servers and clusters. Moreover, its low memory and CPU-core requirements and the standardized results it provides make ViralFlow a versatile tool for SARS-CoV-2 genomic analysis.
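The intrahost variant (iSNV) screening step can be illustrated with a minimal sketch, assuming per-position base counts (a pileup) have already been extracted from the aligned reads. The function `screen_isnvs`, the `ISNV` record, the input layout and the thresholds `MIN_DEPTH` and `MIN_MINOR_FREQ` are hypothetical illustrations, not ViralFlow's actual code or default parameters. A site is flagged when it is covered deeply enough and its second most frequent base exceeds a minimum frequency.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; not ViralFlow's defaults.
MIN_DEPTH = 100
MIN_MINOR_FREQ = 0.05


@dataclass
class ISNV:
    position: int      # 1-based genome position
    major: str         # most frequent base at the site
    minor: str         # second most frequent base at the site
    minor_freq: float  # fraction of reads carrying the minor base


def screen_isnvs(pileup, min_depth=MIN_DEPTH, min_minor_freq=MIN_MINOR_FREQ):
    """Flag sites whose second most common base exceeds a frequency cutoff.

    `pileup` maps a 1-based position to a dict of base -> read count.
    """
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate allele frequencies reliably
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) < 2:
            continue  # a single observed base means no intrahost variation
        (major, _), (minor, minor_count) = ranked[0], ranked[1]
        freq = minor_count / depth
        if freq >= min_minor_freq:
            calls.append(ISNV(pos, major, minor, round(freq, 3)))
    return calls
```

With toy counts (not real data) such as `{23403: {"A": 700, "G": 300}, 23404: {"C": 990, "T": 10}, 23405: {"T": 50, "C": 40}}`, only position 23403 is reported: 23404 falls below the frequency cutoff and 23405 below the depth cutoff.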

Keywords: SARS-CoV-2; genomic variants; genomics; genotyping; software; virus bioinformatics; viruses.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
The workflow scheme. (A) The six steps of the workflow. (B) The workflow can be configured to run in diverse computational environments. (C) Some of the most important per-sample outputs generated by the workflow.
Figure 2
ViralFlow thread-scalability benchmark for (A) Case I and (B) Case II. CPPr = CPUs requested per sample.
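How the CPUs-per-sample setting (CPPr) trades off against the number of samples processed at once can be sketched under an idealized assumption: the available threads are split into fixed slots of CPPr threads, and every sample takes the same time. The functions `concurrent_samples` and `estimated_wall_time` and the per-sample time are hypothetical; the sketch ignores the fact that giving each sample more CPUs may also shorten its runtime, which is exactly what a benchmark like Figure 2 measures.

```python
import math


def concurrent_samples(total_threads: int, cpus_per_sample: int) -> int:
    """Samples that can run at once if each one reserves `cpus_per_sample` threads."""
    if cpus_per_sample < 1 or total_threads < cpus_per_sample:
        raise ValueError("need at least one full sample slot")
    return total_threads // cpus_per_sample


def estimated_wall_time(n_samples: int, total_threads: int,
                        cpus_per_sample: int, minutes_per_sample: float) -> float:
    """Idealized wall time when samples run in waves of equal-length jobs."""
    slots = concurrent_samples(total_threads, cpus_per_sample)
    waves = math.ceil(n_samples / slots)  # the last wave may be partially filled
    return waves * minutes_per_sample
```

For example, with 50 threads and CPPr = 5 there are 10 slots, so `estimated_wall_time(100, 50, 5, 2.0)` returns 20.0 for a hypothetical 2 minutes per sample; CPPr = 4 gives 12 slots but still needs 9 waves because the threads do not divide evenly.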
Figure 3
Frequencies of iSNV sites in artificial datasets simulating co-infection events (ART1 to ART5). The black dashed line marks the expected average frequency of minor iSNVs in each artificial dataset. (A) iSNV frequencies in four artificial datasets. (B) Lineage-defining mutations of the P.1 and B.1.1.28 lineages (upper section) and allele frequencies of the minor and major consensus genomes (lower section). The grey boxes in section (B) mark the boundaries of adjacent SARS-CoV-2 proteins bearing lineage-defining mutations.
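The reasoning behind an expected minor-iSNV frequency line can be sketched as follows, assuming an artificial dataset is built by mixing reads from a major and a minor genome in a fixed proportion with uniform coverage. Under that assumption, every position where the two aligned genomes differ should show the minor genome's base at a frequency close to the mixing fraction, and identical positions should show no iSNV. The function `expected_mixture_isnvs` and the toy sequences are hypothetical, not the procedure used to build ART1 to ART5.

```python
def expected_mixture_isnvs(major_seq: str, minor_seq: str, minor_fraction: float):
    """Expected iSNVs when reads from two aligned genomes are mixed.

    Returns {1-based position: (major base, minor base, expected minor frequency)}.
    Assumes uniform coverage, so each genome contributes reads in proportion
    to its share of the mixture.
    """
    if len(major_seq) != len(minor_seq):
        raise ValueError("sequences must be aligned to the same length")
    if not 0.0 < minor_fraction < 0.5:
        raise ValueError("the minor genome must make up less than half of the mixture")
    expected = {}
    for pos, (major_base, minor_base) in enumerate(zip(major_seq, minor_seq), start=1):
        if major_base != minor_base:
            # Only positions where the genomes differ can produce an iSNV.
            expected[pos] = (major_base, minor_base, minor_fraction)
    return expected
```

For the toy pair `"ACGTAC"` and `"ACGTTC"` mixed at 20%, the only expected iSNV is at position 5, with the minor base `T` at frequency 0.2.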
Figure 4
Lineages reported by PANGO version 3.1.11 as implemented in ViralFlow 0.0.6. (A) Lineages of the 86 samples used to test ViralFlow. (B) Lineages of 1516 genomes available in the GISAID database (accessed on 30 August 2021), excluding the 86 samples used to test ViralFlow. (C) Compilation of all genomes available from GISAID (1516), including all 86 samples used to test ViralFlow.
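Compiling per-lineage counts from several sample sets, as in panel C, can be sketched as below, assuming each set is a list of (sample ID, assigned lineage) pairs. Merging by sample ID means a sample present in more than one set is counted once. The function names, sample IDs and lineage assignments are hypothetical toy data, not the GISAID records or ViralFlow output.

```python
from collections import Counter


def lineage_counts(assignments):
    """Count samples per lineage from (sample_id, lineage) pairs."""
    return Counter(lineage for _, lineage in assignments)


def combined_counts(*groups):
    """Merge several sample sets, counting each sample ID only once."""
    merged = {}
    for group in groups:
        for sample_id, lineage in group:
            merged.setdefault(sample_id, lineage)  # keep the first assignment seen
    return Counter(merged.values())


# Hypothetical toy data.
test_set = [("s1", "P.1"), ("s2", "P.1"), ("s3", "P.2")]
public_set = [("g1", "P.1"), ("g2", "B.1.1.28")]
```

Here `combined_counts(test_set, public_set)` counts P.1 three times and P.2 and B.1.1.28 once each; passing `test_set` twice would not double-count its samples.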
