Front Genet. 2021 Jul 28:12:711437.
doi: 10.3389/fgene.2021.711437. eCollection 2021.

poreCov-An Easy to Use, Fast, and Robust Workflow for SARS-CoV-2 Genome Reconstruction via Nanopore Sequencing

Christian Brandt et al. Front Genet. 2021.

Erratum in

Abstract

In response to the SARS-CoV-2 pandemic, sequencing efforts have been greatly expanded worldwide to track and trace ongoing viral evolution. Technologies such as nanopore sequencing via the ARTIC protocol are used to reliably generate genomes from raw sequencing data as a crucial basis for molecular surveillance. However, for many labs that perform SARS-CoV-2 sequencing, bioinformatics is still a major bottleneck, especially if hundreds of samples need to be processed on a recurring basis. Pipelines developed for short-read data cannot be applied to nanopore data. Therefore, specific long-read tools and parameter settings need to be orchestrated to enable accurate genotyping and robust reference-based reconstruction of SARS-CoV-2 genomes from nanopore data. Here we present poreCov, a highly parallel workflow written in Nextflow that uses containers to wrap all the tools necessary for a routine SARS-CoV-2 sequencing lab into one program. The ease of installation, combined with concise summary reports that clearly highlight all relevant information, enables rapid and reliable analysis of hundreds of SARS-CoV-2 raw sequence data sets or genomes. poreCov is freely available on GitHub under the GNU GPLv3 license: github.com/replikation/poreCov.

Keywords: Nextflow; SARS-CoV-2; bioinformatics; coronavirus – COVID-19; docker; lineages; nanopore sequencing; pipeline.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Simplified overview of the poreCov workflow. Results and inputs are colored in yellow. Workflow processes are colored blue, and the resulting information of a process is red.
FIGURE 2
Final report generated by poreCov after each run. Four samples and one negative control were analyzed using 1,200 bp amplicon primers. The first part of the report, the run information, summarizes the overall performance of the sequencing run in short form across all samples. The second section, the sample results, summarizes all relevant information for each sample in tabular form. The coverage plots are presented in batches of six samples, and additional charts are automatically added for larger runs. The negative control contained 98 reads classified as SARS-CoV-2, which might be attributed to lab contamination or to residual SARS-CoV-2 primers that were sequenced.
FIGURE 3
Example of how sample-barcode bleeding can be identified by inspecting SNP proportions of false-positive negative controls in mapping (.bam) files. Three samples and two negative controls are shown. Each column shows the proportions of the bases (A-green, T-red, G-yellow, C-blue) that were found in all mapped reads for the respective position. A position where all the bases of the reads are the same is shown as a gray bar. The read coverage is noted on the right side of the figure. Library preparation for samples (A–C) was performed with too much DNA. The negative controls followed the same library preparation steps as samples (A–C) but with pure water instead of viral amplicon DNA. Additionally, the library preparation for the negative controls was performed in a separate lab and with new reagents up until the barcode pooling. The negative controls show SNP proportions that roughly mirror those of samples (A–C), given their average coverage. The schematic figure is adapted from IGV.
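
The comparison described in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not part of poreCov and does not reproduce the authors' analysis. It assumes toy reads that are already placed in reference coordinates as equal-length strings, whereas a real analysis would take the per-position bases from the .bam files. The function names, the toy reads, and the `min_minor` threshold are all hypothetical. The sketch computes per-position base proportions, lists positions where more than one base is present, and measures how closely a negative control follows a sample.

```python
from collections import Counter

BASES = "ACGT"


def base_proportions(aligned_reads):
    """Return one dict per position mapping each base to its proportion.

    `aligned_reads` is a list of equal-length strings in reference
    coordinates; any character outside ACGT (e.g. '-') counts as uncovered.
    """
    if not aligned_reads:
        return []
    columns = []
    for pos in range(len(aligned_reads[0])):
        counts = Counter(r[pos] for r in aligned_reads if r[pos] in BASES)
        depth = sum(counts.values())
        if depth:
            columns.append({b: counts[b] / depth for b in BASES})
        else:
            columns.append({b: 0.0 for b in BASES})
    return columns


def mixed_positions(columns, min_minor=0.2):
    """Indices of positions where a second base reaches `min_minor`."""
    mixed = []
    for index, column in enumerate(columns):
        ranked = sorted(column.values(), reverse=True)
        if ranked[1] >= min_minor:
            mixed.append(index)
    return mixed


def mean_abs_difference(columns_a, columns_b):
    """Mean absolute difference in base proportions between two pileups."""
    total, terms = 0.0, 0
    for col_a, col_b in zip(columns_a, columns_b):
        for base in BASES:
            total += abs(col_a[base] - col_b[base])
            terms += 1
    return total / terms if terms else 0.0


# Toy data (hypothetical): a pooled sample signal and a low-coverage
# negative control that carries the same mixture at position 1.
sample_reads = ["ACGT", "ACGT", "ATGT", "ATGT"]
control_reads = ["ACGT", "ATGT"]

sample_columns = base_proportions(sample_reads)
control_columns = base_proportions(control_reads)

print("mixed positions in sample:", mixed_positions(sample_columns))
print("sample vs. control difference:",
      mean_abs_difference(sample_columns, control_columns))
```

A small difference between a negative control and the samples, despite much lower coverage in the control, is the pattern the caption attributes to sample-barcode bleeding. A control contaminated by a single unrelated source would not be expected to mirror the samples' mixed positions in this way.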
