Microb Genom. 2021 Nov;7(11):000654.
doi: 10.1099/mgen.0.000654.

Identification of isolated or mixed strains from long reads: a challenge met on Streptococcus thermophilus using a MinION sequencer

Grégoire Siekaniec et al. Microb Genom. 2021 Nov.

Abstract

This study aimed to provide efficient recognition of bacterial strains from MinION (Nanopore) long-read data on personal computers. Thanks to falling sequencing costs, bacteria can now be identified by whole-genome sequencing. MinION is a fast but highly error-prone sequencing device, and identifying the strain content of unknown samples, whether simple or complex, remains a challenge. The task is heavily constrained by memory management and by the need for fast access to read and genome fragments. Our strategy involves three steps: (1) indexing the known genomic sequences of one or several bacterial species; (2) a query process that assigns each read to a strain by matching it to the closest reference genomes; and (3) a final step that searches for a minimum set of strains that best explains the observed reads. We applied our method, called ORI, to 77 strains of Streptococcus thermophilus. We compared several genomic distances and obtained a detailed classification of the strains, together with a criterion for merging what we term 'sibling' strains, which are separated by only a few mutations. Overall, isolated strains can be reliably recognized from MinION data. For mixtures of several non-sibling strains, results depend on strain abundance.

Keywords: MinION; Streptococcus thermophilus; bacterial strain identification; bloom filters; long read; strain classification.
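The three-step strategy can be illustrated with a toy sketch. This is not ORI's implementation. It assumes one plain Bloom filter of k-mers per reference strain (the `BloomFilter` class, its size and its hashing scheme are hypothetical) and scores each read by the fraction of its k-mers found in each filter (k = 11 and the `min_frac` threshold are arbitrary choices). It keeps the best-scoring strain(s) and approximates the "minimum set of strains" step with greedy set cover. Genomes and reads are simulated random sequences with 3% substitution errors, standing in for real S. thermophilus data.

```python
import hashlib
import random
from collections import Counter

K = 11  # k-mer size: an arbitrary choice for this toy example


class BloomFilter:
    """Minimal Bloom filter over strings (hypothetical sizes and hashing)."""

    def __init__(self, n_bits: int = 1 << 16, n_hashes: int = 3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive n_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return false positives, never false negatives.
        return all(self.bits[pos // 8] >> (pos % 8) & 1 for pos in self._positions(item))


def kmers(seq: str, k: int):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(genomes: dict, k: int) -> dict:
    """Step 1: one Bloom filter of k-mers per reference strain."""
    index = {}
    for name, seq in genomes.items():
        bf = BloomFilter()
        for kmer in kmers(seq, k):
            bf.add(kmer)
        index[name] = bf
    return index


def assign_read(read: str, index: dict, k: int, min_frac: float = 0.1) -> set:
    """Step 2: keep the strain(s) sharing the largest fraction of the read's k-mers."""
    read_kmers = list(kmers(read, k))
    if not read_kmers:
        return set()
    scores = {name: sum(km in bf for km in read_kmers) / len(read_kmers)
              for name, bf in index.items()}
    best = max(scores.values())
    if best < min_frac:  # read too noisy or from an unindexed organism
        return set()
    return {name for name, score in scores.items() if score == best}


def minimal_strain_set(assignments: list) -> list:
    """Step 3: greedy set cover, picking the strain that explains most unexplained reads."""
    uncovered = {i for i, hits in enumerate(assignments) if hits}
    chosen = []
    while uncovered:
        counts = Counter(s for i in uncovered for s in assignments[i])
        strain = min(counts, key=lambda s: (-counts[s], s))  # ties broken by name
        chosen.append(strain)
        uncovered = {i for i in uncovered if strain not in assignments[i]}
    return chosen


# Toy data: three random "strain" genomes; reads drawn from A and B with 3% substitutions.
rng = random.Random(0)
genomes = {name: "".join(rng.choice("ACGT") for _ in range(2000)) for name in "ABC"}


def noisy_read(seq: str, length: int = 150, err: float = 0.03) -> str:
    start = rng.randrange(len(seq) - length)
    bases = list(seq[start:start + length])
    for i, base in enumerate(bases):
        if rng.random() < err:
            bases[i] = rng.choice([b for b in "ACGT" if b != base])
    return "".join(bases)


reads = [noisy_read(genomes["A"]) for _ in range(20)] + [noisy_read(genomes["B"]) for _ in range(5)]
index = build_index(genomes, K)
assignments = [assign_read(read, index, K) for read in reads]
print(minimal_strain_set(assignments))  # strains selected to explain the reads
```

Greedy selection is a standard approximation to minimum set cover, not an exact solution. Unlike this toy, real S. thermophilus strains share most of their k-mers, so many reads tie between closely related strains. That is the situation in which a criterion for merging 'sibling' strains becomes useful.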

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1. Biclusters in a strain × gene matrix and the associated labelling of nodes in a classification tree.
Fig. 2. Overview of the ORI method in three steps: (1) genome indexing, (2) querying the index with filtered reads, and (3) strain identification.
Fig. 3. Heatmap of the Jaccard distance for 28 S. thermophilus strains, plus S. macedonicus ACA-DC 198 and L. delbrueckii subsp. bulgaricus ATCC 11842.
Fig. 4. Identification results on a balanced mix of S. thermophilus strains. The y-axis shows the Hamming distance between observed and expected strains, multiplied by 10 000 (blue: ORI; orange: StrainSeeker; green: Kraken 2). Stars represent mean values. Matthews correlation coefficient (MCC) values are given on the first line just above the x-axis at the bottom of each diagram, followed by the ambiguity ratio (number of strains identified/number of strains present).
Fig. 5. Identification of subdominant strains in a mixture of S. thermophilus strains using various numbers of reads. The y-axis shows the Hamming distance between observed and expected strains, multiplied by 10 000 (blue: ORI; orange: StrainSeeker; green: Kraken 2). Matthews correlation coefficient (MCC) values are given on the first line just above the x-axis at the bottom of each diagram, followed by the ambiguity ratio (number of strains identified/number of strains present).
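The MCC and ambiguity ratio reported in Figs 4 and 5 can be computed from presence/absence calls over the reference strains. The sketch below assumes that MCC is computed over a panel of indexed strains, with an identified strain that is actually present counted as a true positive, and so on. This framing, the function names and the 10-strain panel are hypothetical. It also includes a k-mer Jaccard distance as one common way to compute a distance like the one shown in Fig. 3; whether the authors used k-mers, and which k, is an assumption here.

```python
import math


def mcc(observed, expected, universe):
    """Matthews correlation coefficient of strain presence/absence calls.

    Assumption: every strain in `universe` (e.g. the indexed reference strains)
    is a binary call; identified-and-present counts as a true positive.
    """
    observed, expected, universe = set(observed), set(expected), set(universe)
    tp = len(observed & expected)
    fp = len(observed - expected)
    fn = len(expected - observed)
    tn = len(universe - observed - expected)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def ambiguity_ratio(observed, expected):
    """Number of strains identified divided by number of strains present."""
    return len(set(observed)) / len(set(expected))


def kmer_jaccard_distance(seq_a: str, seq_b: str, k: int) -> float:
    """1 minus intersection-over-union of the two sequences' k-mer sets."""
    a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    if not (a | b):
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


strains = [f"s{i}" for i in range(10)]  # hypothetical reference panel
expected = {"s0", "s1", "s2"}
observed = {"s0", "s1", "s3"}  # one false positive (s3), one false negative (s2)
print(round(mcc(observed, expected, strains), 4))  # 0.5238 (= 11/21)
print(ambiguity_ratio(observed, expected))  # 1.0
print(round(kmer_jaccard_distance("ACGTACGT", "ACGTTCGT", 3), 4))  # 0.7143 (= 5/7)
```

Note that an ambiguity ratio of 1.0 does not imply a correct answer: here the right number of strains is reported, but one of them is wrong, which the MCC reflects.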
