Genome Biol Evol. 2023 Mar 3;15(3):evad020.
doi: 10.1093/gbe/evad020.

polishCLR: A Nextflow Workflow for Polishing PacBio CLR Genome Assemblies

Jennifer Chang et al. Genome Biol Evol. 2023.

Abstract

Long-read sequencing has revolutionized genome assembly, yielding highly contiguous, chromosome-level contigs. However, assemblies from some third-generation long-read technologies, such as Pacific Biosciences (PacBio) continuous long reads (CLR), have a high error rate. Such errors can be corrected with short reads through a process called polishing. Although best practices for polishing non-model de novo genome assemblies were recently described by the Vertebrate Genome Project (VGP) Assembly community, there is a need for a publicly available, reproducible workflow that can be easily implemented and run on a conventional high-performance computing environment. Here, we describe polishCLR (https://github.com/isugifNF/polishCLR), a reproducible Nextflow workflow that implements best practices for polishing assemblies made from CLR data. PolishCLR can be initiated from several input options that extend best practices to suboptimal cases. It also provides re-entry points throughout several key processes, including identifying duplicate haplotypes in purge_dups, allowing a break for scaffolding if data are available, and throughout multiple rounds of polishing and evaluation with Arrow and FreeBayes. PolishCLR is containerized and publicly available for the greater assembly community as a tool to complete assemblies from existing, error-prone long-read data.
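
The idea of re-entry points can be illustrated with a minimal Python sketch. This is not the polishCLR implementation, which is written in Nextflow. The stage labels, their order, and the function names `stages_from` and `run` are all assumptions made for illustration, based only loosely on the tools named in the abstract.

```python
from typing import Callable, List, Optional

# Hypothetical stage labels and ordering (assumption): the real workflow's
# step names and sequence may differ from this illustrative list.
STAGES: List[str] = [
    "arrow_polish",
    "purge_dups",
    "scaffolding_break",
    "freebayes_polish",
]


def stages_from(entry: Optional[str], stages: List[str] = STAGES) -> List[str]:
    """Return the stages that remain when the run is entered at `entry`.

    With entry=None the whole list is returned (a run from the start).
    """
    if entry is None:
        return list(stages)
    if entry not in stages:
        raise ValueError(f"unknown re-entry point: {entry!r}")
    return list(stages[stages.index(entry):])


def run(entry: Optional[str] = None,
        runner: Callable[[str], None] = print) -> List[str]:
    """Run each remaining stage in order and return the stages that ran.

    `runner` stands in for whatever actually executes a stage; here it
    only prints the stage label.
    """
    completed: List[str] = []
    for stage in stages_from(entry):
        runner(stage)
        completed.append(stage)
    return completed
```

Calling `run("scaffolding_break")` in this sketch would skip the earlier stages and resume from the scaffolding break, which is the behavior a re-entry point is meant to provide.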

Keywords: Nextflow; QV; assembly; genome; polish; polishCLR.

Figures

Fig. 1.
Diagram of polishCLR workflow for three input cases. Polishing Steps 1 and 2 are run separately to accommodate an optional scaffolding step. Labeled arrows indicate processes while solid boxes indicate products. The dotted arrow indicates that the manual scaffolding step is optional and not within the scope of this pipeline.
