Appl Environ Microbiol. 2019 Mar 22;85(7):e00165-19.
doi: 10.1128/AEM.00165-19. Print 2019 Apr 1.

In Silico Serotyping Based on Whole-Genome Sequencing Improves the Accuracy of Shigella Identification

Yun Wu et al. Appl Environ Microbiol. 2019.

Abstract

Bacteria of the genus Shigella, consisting of 4 species and >50 serotypes, cause shigellosis, a foodborne disease of significant morbidity, mortality, and economic loss worldwide. Classical Shigella identification based on selective media and serology is tedious, time-consuming, expensive, and not always accurate. A molecular diagnostic assay does not distinguish Shigella at the species level or from enteroinvasive Escherichia coli (EIEC). We inspected genomic sequences from 221 Shigella isolates and observed low concordance rates between conventional designation and molecular serotyping: 86.4% and 80.5% at the species and serotype levels, respectively. Serotype determinants for 6 additional serotypes were identified. Examination of differentiation gene markers commonly perceived as characteristic hallmarks in Shigella showed high variability among different serotypes. Using this information, we developed ShigaTyper, an automated workflow that utilizes limited computational resources to accurately and rapidly determine 59 Shigella serotypes using Illumina paired-end whole-genome sequencing (WGS) reads. Shigella serotype determinants and species-specific diagnostic markers were first identified through read alignment to an in-house curated reference sequence database. Relying on sequence hits that passed a threshold level of coverage and accuracy, serotype could be unambiguously predicted within 1 min for an average-size WGS sample of ∼500 MB. Validation with WGS data from 380 isolates showed an accuracy rate of 98.2%. This pipeline is the first step toward building a comprehensive WGS-based analysis pipeline of Shigella spp. in a field laboratory setting, where speed is essential and resources need to be more cost-effectively dedicated.

IMPORTANCE: Shigella causes diarrheal disease with serious public health implications. However, conventional Shigella identification methods are laborious and time-consuming and can be erroneous due to the high similarity between Shigella and enteroinvasive Escherichia coli (EIEC) and cross-reactivity between serotyping antisera. Further, serotype interpretation is complicated for inexperienced users. To develop an easier method with higher accuracy based on whole-genome sequencing (WGS) for Shigella serotyping, we systematically examined genomic information of Shigella isolates from 53 serotypes to define rules for differentiation and serotyping. We created ShigaTyper, an automated pipeline that accurately and rapidly excludes non-Shigella isolates and identifies 59 Shigella serotypes using Illumina paired-end WGS reads. A serotype can be unambiguously predicted at a data processing speed of 538 MB/min with 98.2% accuracy from a regular laptop. Once it is installed, training in bioinformatics analysis and Shigella genetics is not required. This pipeline is particularly useful to general microbiologists in field laboratories.

Keywords: Shigella; in silico; serotyping; whole-genome sequencing.

Figures

FIG 1
Summary of workflow for ShigaTyper. A detailed description can be found in Results (“Development of an automated in silico Shigella serotyping pipeline”).
FIG 2
Schematic illustration of the decision tree that ShigaTyper uses for Shigella differentiation before serotype prediction. ShigaTyper was designed to differentiate and exclude non-Shigella or contaminated isolates before predicting serotype for Shigella isolates. Distantly related non-Shigella/EIEC species (such as Listeria) usually have no reads mapped to any of the genes in the reference sequence database and fail at checkpoint 1. Enterobacterial species (such as Salmonella) may have one or more hits but not ipaH_C and fail at checkpoint 2. Checkpoint 3 excludes EIEC based on the presence of the full-length EclacY gene, with the exception of S. boydii 9 and 15. Last, if more than one wzx gene is present in the WGS reads, this indicates multiple serotypes and the sample fails checkpoint 4. Details on serotype prediction are provided in Results.
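The four checkpoints in this caption can be sketched as a short function. This is only an illustration of the logic as the caption describes it, not ShigaTyper's actual code: the function name, the `Differentiation` record, the `lacy_exception` flag (standing in for however the pipeline recognizes S. boydii 9 and 15), and the convention that wzx gene names contain the substring "wzx" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Differentiation:
    passed: bool
    checkpoint: int  # checkpoint that failed, or 0 if all four passed
    reason: str


def differentiate(hits: Set[str], lacy_exception: bool = False) -> Differentiation:
    """Apply the four checkpoints from the FIG 2 caption.

    hits: names of reference genes that passed the threshold filter
          (hypothetical naming; wzx genes are assumed to contain "wzx").
    lacy_exception: True when the isolate is S. boydii 9 or 15, which the
          caption exempts from checkpoint 3 (how this is decided is assumed).
    """
    # Checkpoint 1: no read mapped to any reference gene.
    if not hits:
        return Differentiation(False, 1, "no hits in reference database")
    # Checkpoint 2: hits present, but ipaH_C is missing.
    if "ipaH_C" not in hits:
        return Differentiation(False, 2, "ipaH_C not detected")
    # Checkpoint 3: full-length EclacY suggests EIEC, except S. boydii 9 and 15.
    if "EclacY" in hits and not lacy_exception:
        return Differentiation(False, 3, "full-length EclacY detected")
    # Checkpoint 4: more than one wzx gene indicates multiple serotypes.
    wzx_genes = {h for h in hits if "wzx" in h}
    if len(wzx_genes) > 1:
        return Differentiation(False, 4, "multiple wzx genes detected")
    return Differentiation(True, 0, "proceed to serotype prediction")
```

For example, `differentiate({"ipaH_C", "EclacY"})` stops at checkpoint 3, while the same hits with `lacy_exception=True` proceed to serotype prediction.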
FIG 3
A representative output for ShigaTyper. (A) QC inspection of WGS reads. Quality inspection results were parsed from reports generated by fastp and are summarized in a table showing number of reads, number of bases, number of bases with >Q20 and >Q30 scores, and average read length. A visual representation of the average quality score of each of the 4 bases over sequencing cycles and an estimated average depth of genome coverage are given below the table. (B) Serotype prediction for the sample. A direct serotype prediction is made by ShigaTyper based on threshold filter values passed by gene determinants as described in Results. A warning signal is given if the sequence of the pINV-encoded virulence factor IpaB, a Shiga toxin, or an enterotoxin is detected in the WGS reads. The table summarizes characteristics of each of the genetic determinants identified from the WGS data. Those that passed the threshold filter values are shown in blue. All code is hidden from view for clarity of reporting but can be toggled into view for examination if needed. (C) Report of ShigaTyper batch processing. The summary table lists outcomes for serotype prediction, invasion plasmid, Shiga toxin, and enterotoxin.
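The quantities in the QC table of panel A follow from a few read counts. The sketch below shows one plausible way to derive them; it is not taken from ShigaTyper. The function name and dictionary keys are hypothetical, and estimating depth as total bases divided by genome size is an assumption, since the caption does not say how depth is computed.

```python
def summarize_qc(total_reads: int, total_bases: int,
                 q20_bases: int, q30_bases: int,
                 genome_size_bp: int) -> dict:
    """Summarize read-level QC counts (all names here are hypothetical).

    genome_size_bp must be supplied by the caller; the estimated depth
    is simply total sequenced bases divided by that assumed genome size.
    """
    if total_reads <= 0 or total_bases <= 0 or genome_size_bp <= 0:
        raise ValueError("counts and genome size must be positive")
    return {
        "reads": total_reads,
        "bases": total_bases,
        "q20_fraction": q20_bases / total_bases,
        "q30_fraction": q30_bases / total_bases,
        "mean_read_length": total_bases / total_reads,
        "estimated_depth": total_bases / genome_size_bp,
    }


# Illustrative numbers only, not data from the paper.
qc = summarize_qc(total_reads=2_000_000, total_bases=300_000_000,
                  q20_bases=285_000_000, q30_bases=270_000_000,
                  genome_size_bp=5_000_000)
print(qc["mean_read_length"], qc["estimated_depth"])
```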
FIG 4
Speed for serotype prediction is directly proportional to the size of WGS files. (A) Total time spent for Shigella serotyping was plotted against the sum of size of the paired-end WGS reads in fastq.gz format. Outcomes of serotype prediction are indicated on the right. A linear regression line is shown in black. (B) Total time, time spent on quality (QC) inspection, and time spent on mapping and prediction are plotted against the sum of size of the paired-end WGS reads in fastq.gz format. Linear regression lines of the same color are also shown. The average size for the sum of the paired-end WGS reads was 509.9 ± 538.1 MB and ranged from 30.7 to 3,436.7 MB.
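As a rough worked example of the proportionality described here, the processing speed of 538 MB/min reported in the abstract can be applied to the file sizes in this caption. The sketch assumes time scales purely with file size and has no fixed overhead, which the regression in the figure may not satisfy exactly; the function name is hypothetical.

```python
THROUGHPUT_MB_PER_MIN = 538.0  # processing speed reported in the abstract


def estimated_minutes(size_mb: float,
                      throughput: float = THROUGHPUT_MB_PER_MIN) -> float:
    """Estimate processing time, assuming time is proportional to file size."""
    if size_mb < 0 or throughput <= 0:
        raise ValueError("size must be non-negative and throughput positive")
    return size_mb / throughput


# Smallest, average, and largest paired-end file sizes from the caption.
for label, size_mb in [("smallest", 30.7), ("average", 509.9), ("largest", 3436.7)]:
    print(f"{label}: {size_mb} MB -> {estimated_minutes(size_mb):.2f} min")
```

Under this assumption the average sample (509.9 MB) takes about 0.95 min, in line with the "within 1 min" figure in the abstract, while the largest sample (3,436.7 MB) takes about 6.39 min.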
