BMC Res Notes. 2022 Mar 7;15(1):98.
doi: 10.1186/s13104-022-05978-5.

SnakeCube: containerized and automated pipeline for de novo genome assembly in HPC environments

Nelina Angelova et al. BMC Res Notes. 2022.

Abstract

Objective: Rapid progress in sequencing technology and related bioinformatics tools supports efforts to disentangle diversity and conservation issues through genome analyses. The foremost challenges of the field are coping with questions that emerge from the swift development and application of new algorithms, and establishing standardized analysis approaches that promote transparency and transferability in research.

Results: Here, we present SnakeCube, an automated, containerized pipeline for whole de novo genome assembly that runs within isolated, secured environments and scales for use in High Performance Computing (HPC) environments. SnakeCube was optimized for performance and tested for effectiveness with various inputs, highlighting its safe and robust general use in the field.

Keywords: Assembly; Container; Genome; Pipeline; de-novo.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The workflow of SnakeCube and its sub-containers. Each box represents one of the available images. A and B represent the quality checking steps for short and/or long reads. B and C serve users with only long reads. D combines them all, forming SnakeCube.
Fig. 2
The benchmarking and optimization of SnakeCube based on the L. sceleratus dataset. a Average memory and load monitoring records from three serial runs, with each point representing a rule of the container. b The rules were further monitored independently for their time-scaling efficiency when run multiple times with an increasing thread allowance. Memory is reported in megabytes, and only the highest value at any point is recorded. Time is measured in seconds. Rules are listed at the lower right in their order of appearance.
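
The time-scaling measurement described for Fig. 2b can be illustrated with a minimal sketch. It assumes the usual definitions of speedup (baseline time divided by time at n threads) and parallel efficiency (speedup divided by the thread ratio); the page does not say which metric the authors used. The function name `scaling_table` and the timings are hypothetical and are not taken from the paper.

```python
def scaling_table(timings):
    """Compute speedup and parallel efficiency per thread count.

    timings: dict mapping thread count -> wall-clock time in seconds.
    The smallest thread count is used as the baseline.
    Returns a list of (threads, speedup, efficiency) tuples.
    """
    base_threads = min(timings)
    base_time = timings[base_threads]
    rows = []
    for threads in sorted(timings):
        # Speedup relative to the baseline run
        speedup = base_time / timings[threads]
        # Efficiency: speedup divided by the increase in thread count
        efficiency = speedup / (threads / base_threads)
        rows.append((threads, speedup, efficiency))
    return rows


# Hypothetical timings in seconds for a single rule (NOT from the paper)
example = {1: 1200.0, 2: 640.0, 4: 360.0, 8: 240.0}

for threads, speedup, efficiency in scaling_table(example):
    print(f"{threads:>2} threads: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency close to 1 means a rule benefits almost fully from the extra threads. A value that drops quickly suggests the rule gains little from a larger thread allowance.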

