Genome Biol. 2018 Oct 4;19(1):153.
doi: 10.1186/s13059-018-1540-z.

SKESA: strategic k-mer extension for scrupulous assemblies

Alexandre Souvorov et al. Genome Biol. 2018.

Abstract

SKESA is a DeBruijn graph-based de-novo assembler designed for assembling reads of microbial genomes sequenced using Illumina. Comparison with SPAdes and MegaHit shows that SKESA produces assemblies that have high sequence quality and contiguity, handles low-level contamination in reads, is fast, and produces an identical assembly for the same input when assembled multiple times with the same or different compute resources. SKESA has been used for assembling over 272,000 read sets in the Sequence Read Archive at NCBI and for real-time pathogen detection. Source code for SKESA is freely available at https://github.com/ncbi/SKESA/releases.

Keywords: Contamination; De-novo assembly; DeBruijn graphs; Illumina reads; Sequence quality.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

Authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Substrings mismatches: mismatches per 100 Kb seen in assemblies of SPAdes and MegaHit for inputs in substrings set. SKESA has no mismatches at any length in this set
Fig. 2
Substrings contiguity: N50 for assemblies generated by SKESA, SPAdes, and MegaHit for inputs in substrings set
Fig. 3
Substrings deviation: deviation for assemblies generated by SKESA, SPAdes, and MegaHit for inputs in substrings set. We do not show values for input length 22, where MegaHit has a value of almost 100, or for input lengths 34 and 56, for which SPAdes did not produce an assembly
Fig. 4
SKESA flowchart: flowchart describing main steps in the algorithm used by SKESA for assembly
Fig. 5
Main distribution in SRR2821438: histogram for frequency of 21-mers seen in SRR2821438 with counts on X axis up to 400 and number of 21-mers with that count on Y axis
Fig. 6
Small distributions in SRR2821438: histogram for frequency of 21-mers seen in SRR2821438 with counts on X axis between 325 and 2000 and number of 21-mers with that count on Y axis
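
The captions above refer to two quantities that are easy to illustrate: N50 (Fig. 2) and the frequency histogram of 21-mers (Figs. 5 and 6). The following is a minimal sketch of how such values might be computed from toy data. It is not taken from the SKESA source; the function names `n50`, `kmer_counts`, and `count_histogram` are hypothetical, and the k-mer counting is simplified to the forward strand of each read only.

```python
from collections import Counter


def n50(contig_lengths):
    """Return N50: the contig length at which the running total of lengths,
    taken from longest to shortest, first reaches half of the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def kmer_counts(reads, k=21):
    """Count every k-mer in the reads (forward strand only; a simplification)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def count_histogram(counts):
    """Map each occurrence count to the number of distinct k-mers with that count
    (X axis: count, Y axis: number of k-mers, as in the histogram captions)."""
    return Counter(counts.values())


if __name__ == "__main__":
    # Toy contig lengths: total 20, so N50 is reached at the contig of length 5.
    print(n50([2, 3, 4, 5, 6]))

    # Toy read with k=4 so the example stays readable.
    hist = count_histogram(kmer_counts(["ACGTACGT"], k=4))
    for count in sorted(hist):
        print(count, hist[count])
```

With real data, `reads` would come from a FASTQ parser and `k` would be 21 to match the captions; the toy values here only keep the example short.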
