Microbiome. 2021 Jun 5;9(1):130.
doi: 10.1186/s40168-021-01072-3.

Ultra-accurate microbial amplicon sequencing with synthetic long reads

Benjamin J Callahan et al. Microbiome.

Abstract

Background: Of the many pathogenic bacterial species that are known, only a fraction are readily identifiable directly from a complex microbial community using standard next-generation DNA sequencing. Long-read sequencing offers the potential to identify a wider range of species and to differentiate between strains within a species, but attaining sufficient accuracy in complex metagenomes remains a challenge.

Methods: Here, we describe and analytically validate LoopSeq, a commercially available synthetic long-read (SLR) sequencing technology that generates highly accurate long reads from standard short reads.

Results: LoopSeq reads are sufficiently long and accurate to identify microbial genes and species directly from complex samples. LoopSeq perfectly recovered the full diversity of 16S rRNA genes from known strains in a synthetic microbial community. Full-length LoopSeq reads had a per-base error rate of 0.005%, an accuracy that exceeds that reported for other long-read sequencing technologies. 18S-ITS and genomic sequencing of fungal and bacterial isolates confirmed that LoopSeq maintains that accuracy for reads up to 6 kb in length. LoopSeq full-length 16S rRNA reads could accurately classify organisms down to the species level in rinsate from retail meat samples, and could differentiate strains within species identified by the CDC as potential foodborne pathogens.

Conclusions: The order-of-magnitude improvement in length and accuracy over standard Illumina amplicon sequencing achieved with LoopSeq enables accurate species-level and strain identification from complex- to low-biomass microbiome samples. The ability to generate accurate and long microbiome sequencing reads using standard short-read sequencers will accelerate the building of quality microbial sequence databases and remove a significant hurdle on the path to precision microbial genomics.

Keywords: Amplicon sequencing; Long-read sequencing; Metagenomics; Synthetic long reads.

Conflict of interest statement

Michael Balamotis and Tuval Ben Yehezkel are employees of Loop Genomics, the vendor for the synthetic long-read sequencing technology analyzed in this manuscript.

Figures

Fig. 1
The rates of LoopSeq point errors by position in read, type (substitution/insertion/deletion), and quality score. The scale of the y-axis is different for substitutions because they occur at a much higher rate than insertions or deletions
Fig. 2
Error types and rates in LoopSeq data. (a) Schematic description of three error types: point errors, chimeras, introgressions. (b) The fraction of each error type as a function of expected error quality filter threshold. The red line shows the fraction of LoopSeq reads removed as a function of the filter threshold
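The caption of Fig. 2 refers to an expected error quality filter but does not define it. The following minimal sketch uses the commonly used definition, in which a read's expected number of errors is the sum of the per-base error probabilities implied by its Phred quality scores. That the authors applied exactly this definition is an assumption, and the function names and the example threshold are hypothetical.

```python
from typing import Sequence


def expected_errors(quality_scores: Sequence[int]) -> float:
    """Sum of per-base error probabilities implied by Phred scores.

    A Phred score Q corresponds to an error probability of 10 ** (-Q / 10).
    """
    return sum(10 ** (-q / 10) for q in quality_scores)


def passes_filter(quality_scores: Sequence[int], max_expected_errors: float) -> bool:
    """Keep a read only if its expected error count is at or below the threshold.

    max_expected_errors is an illustrative parameter, not a value from the paper.
    """
    return expected_errors(quality_scores) <= max_expected_errors
```

Under this definition, lowering the threshold removes more reads, which is the trade-off the red line in panel (b) tracks.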
Fig. 3
The fraction of error-free amplicon sequencing reads of different lengths using commercially available long-read sequencing technologies. Points represent observations from measurements of defined communities (the Zymo mock community) or single-strain isolates (fungal and bacterial isolates), reported in this manuscript for LoopSeq and in [12] for PacBio CCS reads. The 16S rRNA and fungal 18S-ITS data are from traditional amplicon sequencing of a single genetic region (the “Methods” section). The LoopSeq genomic amplicon data are based on LoopSeq sequencing of randomly amplified regions of the genomes of several bacteria (the “Methods” section). Lines represent the expected error-free fraction based on measurements of the error rates in 16S rRNA gene amplicon sequencing data, and assuming a constant per-base error rate. The Oxford Nanopore line is based on the per-base error rate of 6% reported by the manufacturer for the R10 chemistry
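The lines in Fig. 3 assume a constant per-base error rate, in which case the expected fraction of error-free reads of length L is (1 - e)^L for a per-base error rate e. The sketch below works through that arithmetic using the two rates quoted on this page (0.005% for LoopSeq, 6% for Oxford Nanopore R10). The function name and the read lengths of 1500 and 6000 bases are illustrative assumptions, not values taken from the figure.

```python
def expected_error_free_fraction(per_base_error_rate: float, read_length: int) -> float:
    """Expected fraction of reads with zero errors, assuming independent
    errors at a constant per-base rate: (1 - e) ** L."""
    return (1.0 - per_base_error_rate) ** read_length


# Per-base error rates quoted in the abstract and the Fig. 3 caption.
LOOPSEQ_RATE = 0.005 / 100   # 0.005%
NANOPORE_RATE = 6 / 100      # 6% (manufacturer-reported, R10 chemistry)

if __name__ == "__main__":
    # Illustrative lengths: roughly a full-length 16S gene, and the 6 kb upper
    # bound mentioned in the abstract.
    for length in (1500, 6000):
        loop = expected_error_free_fraction(LOOPSEQ_RATE, length)
        nano = expected_error_free_fraction(NANOPORE_RATE, length)
        print(f"L={length}: LoopSeq {loop:.4f}, Nanopore {nano:.3e}")
```

At a 0.005% error rate, about 93% of 1500-base reads are expected to be error-free under this model, whereas at 6% the expected error-free fraction is effectively zero.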
Fig. 4
PCoA ordination of the total community compositions of three human fecal samples (R3.1, R9.3, R9.4) as measured by LoopSeq and PacBio full-length 16S rRNA gene sequencing. The community compositions measured by each technology were highly similar, so the data points for each sample overlap closely on this ordination. PacBio CCS measurements were made using two different sequencing chemistries, as indicated in the legend parentheticals
Fig. 5
The rate of substitution differences between DADA2-denoised ASVs and the next closest ASV that are in conserved vs. variable regions of the 16S gene, in LoopSeq data from three human fecal samples. If substitution patterns were random (as would be expected if they were caused by sequencing errors) then points should fall along the gray line of equal rates
Fig. 6
Foodborne pathogen species detected in retail meat rinse samples. (a) The species and relative abundance of six common foodborne pathogen species were determined from full-length LoopSeq 16S sequencing of six retail meat samples. Campylobacter and Listeria monocytogenes did not appear in these samples. (b) Nine distinct alleles of the 16S rRNA gene were detected in the C. perfringens strain present in sample GT1. The first allele was present at roughly twice the abundance of the others, consistent with it having two copies in the genome while the rest have only one copy. C. perfringens has 10 copies of the 16S rRNA gene in its genome. Only the V2 region is shown for visual simplicity

References

    1. Ackelsberg J, Rakeman J, Hughes S, Petersen J, Mead P, Schriefer M, Kingry L, Hoffmaster A, Gee JE. Lack of evidence for plague or anthrax on the New York City subway. Cell Syst. 2015;1(1):4–5. doi: 10.1016/j.cels.2015.07.008.
    2. Afshinnekoo E, Meydan C, Chowdhury S, Jaroudi D, Boyer C, Bernstein N, Maritz JM, Reeves D, Gandara J, Chhangawala S, Ahsanuddin S. Modern methods for delineating metagenomic complexity. Cell Syst. 2015;1(1):6–7. doi: 10.1016/j.cels.2015.07.007.
    3. Afshinnekoo E, Meydan C, Chowdhury S, Jaroudi D, Boyer C, Bernstein N, Maritz JM, Reeves D, Gandara J, Chhangawala S, Ahsanuddin S. Geospatial resolution of human and bacterial diversity with city-scale metagenomics. Cell Syst. 2015;1(1):72–87. doi: 10.1016/j.cels.2015.01.001.
    4. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19(5):455–477. doi: 10.1089/cmb.2012.0021.
    5. Beiki H, Liu H, Huang J, Manchanda N, Nonneman D, Smith TP, Reecy JM, Tuggle CK. Improved annotation of the domestic pig genome through integration of Iso-Seq and RNA-seq data. BMC Genomics. 2019;20(1):344. doi: 10.1186/s12864-019-5709-y.
