Appl Environ Microbiol. 2019 Nov 14;85(23):e01746-19.
doi: 10.1128/AEM.01746-19. Print 2019 Dec 1.

SeqSero2: Rapid and Improved Salmonella Serotype Determination Using Whole-Genome Sequencing Data

Shaokang Zhang et al. Appl Environ Microbiol.

Abstract

SeqSero, launched in 2015, is a software tool for Salmonella serotype determination from whole-genome sequencing (WGS) data. Despite its routine use in public health and food safety laboratories in the United States and other countries, the original SeqSero pipeline is relatively slow (minutes per genome using sequencing reads), is not optimized for draft genome assemblies, and may assign multiple serotypes for a strain. Here, we present SeqSero2 (github.com/denglab/SeqSero2; denglab.info/SeqSero2), an algorithmic transformation and functional update of the original SeqSero. Major improvements include (i) additional sequence markers for identification of Salmonella species and subspecies and certain serotypes, (ii) a k-mer-based algorithm for rapid serotype prediction from raw reads (seconds per genome) and improved serotype prediction from assemblies, and (iii) a targeted assembly approach for specific retrieval of serotype determinants from WGS for serotype prediction, new allele discovery, and prediction troubleshooting. Evaluated using 5,794 genomes representing 364 common U.S. serotypes, including 2,280 human isolates of 117 serotypes from the National Antimicrobial Resistance Monitoring System, SeqSero2 is up to 50 times faster than the original SeqSero while maintaining equivalent accuracy for raw reads and substantially improving accuracy for assemblies. SeqSero2 further suggested that 3% of the tested genomes contained reads from multiple serotypes, indicating a use for contamination detection. In addition to short reads, SeqSero2 demonstrated potential for accurate and rapid serotype prediction directly from long nanopore reads despite base call errors. Testing of 40 nanopore-sequenced genomes of 17 serotypes yielded a single H antigen misidentification.

IMPORTANCE Serotyping is the basis of public health surveillance of Salmonella. It remains a first-line subtyping method even as surveillance continues to be transformed by whole-genome sequencing. SeqSero allows the integration of Salmonella serotyping into a whole-genome-sequencing-based laboratory workflow while maintaining continuity with the classic serotyping scheme. SeqSero2, informed by extensive testing and application of SeqSero in the United States and other countries, incorporates important improvements and updates that further strengthen its application in routine and large-scale surveillance of Salmonella by whole-genome sequencing.
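
The abstract describes a k-mer-based algorithm for picking serotype determinants out of raw reads. The sketch below illustrates only the general idea of k-mer matching; it is not SeqSero2's implementation. The function names, the toy allele names and sequences, and the scoring rule (fraction of an allele's k-mers found in the reads) are all assumptions made for illustration, and reverse complements are ignored for brevity.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def score_alleles(reads, alleles, k=5):
    """Score each candidate allele by the fraction of its k-mers seen in the reads.

    reads   -- iterable of read sequences (strings)
    alleles -- dict mapping a hypothetical allele name to its sequence
    k       -- k-mer length; a toy value here, real tools use longer k-mers
    """
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    scores = {}
    for name, seq in alleles.items():
        allele_kmers = kmers(seq, k)
        if allele_kmers:
            scores[name] = len(allele_kmers & read_kmers) / len(allele_kmers)
        else:
            # Allele shorter than k: no k-mers to match
            scores[name] = 0.0
    return scores


def best_allele(reads, alleles, k=5):
    """Return the name of the highest-scoring allele."""
    scores = score_alleles(reads, alleles, k)
    return max(scores, key=scores.get)


# Toy data (hypothetical, not real Salmonella sequences)
toy_alleles = {
    "fliC_toy_a": "ACGTACGTTAGCCGATAGGCT",
    "fliC_toy_b": "TTGACCGGTATCGGATCCAAG",
}
toy_reads = ["ACGTACGTTAGC", "TAGCCGATAGGCT"]

toy_scores = score_alleles(toy_reads, toy_alleles)
toy_best = best_allele(toy_reads, toy_alleles)
```

In this toy case both reads were cut from the first allele, so it scores highest. Set lookups avoid aligning every read, which is one plausible reason a k-mer approach can be faster than read mapping.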

Keywords: Salmonella; WGS; serotype; whole-genome sequencing.

Figures

FIG 1
The major components and workflows of SeqSero2. Genome assemblies (1) or raw sequencing reads (2) are inputs for the k-mer-based algorithms for serotype determinants. The microassembly workflow (3) is used for serotype prediction, new allele identification, and contamination detection.
FIG 2
Schematic overview of SeqSero2 algorithms. (a) The k-mer-based workflow for raw sequencing reads. (b) The microassembly workflow.
FIG 3
Speed comparison between SeqSero1 and SeqSero2. (a) Comparison of run times for predicting serotypes from raw sequencing reads. Average number of seconds for analyzing a genome is shown for each workflow. BWA-MEM was used for read mapping for both SeqSero1 and the microassembly workflow of SeqSero2. (b) Comparison of run times for predicting serotypes from genome assemblies. Average number of seconds for analyzing a genome is shown for each workflow. Run time was defined as the elapsed real time (wall time) for predicting the serotype of a genome using a single processor.
FIG 4
Phylogenetic analysis based on SNPs of WGS samples with potential and artificial contaminations. The tree is rooted by an S. enterica subsp. enterica serotype Stanleyville strain as outgroup. Reference serotype S. Stanley, S. Rissen, and S. Stanleyville genomes are shown by their NCBI accession numbers. * indicates that a WGS sample was annotated as serotype Stanley but potential contamination from a serotype Rissen genome was detected. ** indicates that a pseudosample (6.0 Mb) was created by mixing sequencing reads from a reference serotype Stanley genome (SRA accession no. SRR1582083) and a reference serotype Rissen genome (SRA accession no. SRR1753839) at a 1:1 ratio. Bar, 5,000 SNPs.
FIG 5
Phylogenetic relationship between a new fliC allele (in bold) and alleles of related antigenic types in the serotype determinant database. Original serotype and NCBI accession number for each allele are shown in parentheses. Bar, 25 SNPs.
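
The FIG 3 caption defines run time as elapsed real time (wall time) per genome on a single processor, reported as an average. A minimal sketch of that kind of measurement follows. The helper names and the placeholder workload are assumptions, and this is not the authors' benchmarking script.

```python
import time


def wall_time(func, *args, **kwargs):
    """Run func once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def mean_seconds_per_genome(func, genomes):
    """Average wall time of func over a non-empty list of inputs, run serially."""
    times = [wall_time(func, genome)[1] for genome in genomes]
    return sum(times) / len(times)


# Placeholder workload standing in for a serotype prediction call (hypothetical)
def count_bases(sequence):
    return len(sequence)


result, elapsed = wall_time(count_bases, "ACGT" * 1000)
average = mean_seconds_per_genome(count_bases, ["ACGT" * 1000, "TTGA" * 500])
```

`time.perf_counter` measures elapsed real time rather than CPU time, which matches the wall-time definition in the caption.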
