Microbiome. 2019 Apr 16;7(1):61.
doi: 10.1186/s40168-019-0665-y.

Annotated bacterial chromosomes from frame-shift-corrected long-read metagenomic data

Krithika Arumugam et al. Microbiome. 2019.

Abstract

Background: Short-read sequencing technologies have long been the work-horse of microbiome analysis. Continuing technological advances are making the application of long-read sequencing to metagenomic samples increasingly feasible.

Results: We demonstrate that whole bacterial chromosomes can be obtained from an enriched community, by application of MinION sequencing to a sample from an EBPR bioreactor, producing 6 Gb of sequence that assembles into multiple closed bacterial chromosomes. We provide a simple pipeline for processing such data, which includes a new approach to correcting erroneous frame-shifts.

Conclusions: Advances in long-read sequencing technology and corresponding algorithms will allow the routine extraction of whole chromosomes from environmental samples, providing a more detailed picture of individual members of a microbiome.

Keywords: Algorithms; Frame-shifts; Long-read sequencing; Microbial genomics; Microbiome; Sequence assembly; Software.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Summary of results. (a) Bandage [30] visualization of the Unicycler assembly graph before final segmentation into contigs. The largest connected components are labeled by the corresponding taxonomic bins, and the nodes are colored by the MEGAN taxonomic classification of the corresponding long reads. The seven longest linear and circular components correspond to the seven LR-chromosomes. (b) MEGAN-LR taxonomic binning: nodes are scaled to indicate the number of aligned bases in each bin. Bins that are more than 50% complete are shown in bold. (c) Annotation of the seven LR-chromosomes, labeled by the corresponding taxonomic bins. The three circular tracks indicate the genes annotated by Prokka on the forward strand (blue) and reverse strand (pink), and GC-skew (green and red indicate lower or higher than average GC content, respectively)
Fig. 2
Analysis. (a) Long-read analysis pipeline shown from left to right. MinION sequencing produces a set of reads. These are assembled into contigs using Unicycler and aligned against the NCBI-nr database using DIAMOND. The contigs and alignments are processed by MEGAN to perform taxonomic binning and to produce frame-shift-corrected contigs. These are analyzed using CheckM and annotated using Prokka. The duration of each step is shown in wall-clock hours. MEGAN analysis took less than 10 min. (b) Frame-shift correction: in frame-shift alignments, forward slashes and backward slashes indicate a frame decrease or increase by one, respectively. Correction is performed by inserting one or two unspecified nucleotides into the sequence, respectively
Fig. 3
Dot plots for the three LR-chromosomes that have high similarity to reference genome assemblies, namely, B1 against GCA_001567185.1 (Bacteroidetes bacterium OLB12), B2 against GCA_000584975.2 (Candidatus Accumulibacter sp. SK-02), and B5 against GCA_001567405.1 (Bacteroidetes bacterium OLB8). Forward alignments are shown in red, reverse-complemented alignments are shown in blue, and gray lines indicate contig boundaries in the reference assemblies. The number of contigs in each reference sequence is given in brackets
Fig. 4
Distribution of repeat rates in all complete bacterial genomes in RefSeq. Vertical lines show the repeat rate of the seven LR chromosomes. An additional 17 data points with a repeat percentage above 25% are not shown
Fig. 5
For each of the seven LR chromosomes (B1–B7), we show a dot plot comparison against the set of SR contigs that align, reporting their number in brackets
Fig. 6
Overview of the concordance score for LR contigs, on the one hand, and SR-bins and reference genomes, on the other. The x axis shows the length of each LR contig, with the position of each marked by a tick on the axis; the y axis shows the value of the concordance score κ; and data points represent pairs of LR chromosomes and SR-bins or reference genomes. Selected pairs with a high concordance score are labeled with “〈LR-chromosome.id〉−〈SR-bin.id〉” for comparisons to SR-bins or “〈LR-chromosome.id〉−〈GCA_id〉” for comparisons to references. Within each set of the seven LR chromosome alignments, the pair with the maximum concordance score is shown in red. All LR chromosomes have highly concordant counterpart SR-bins, with the exception of LR chromosome 5. Further details on individual LR chromosomes are reported in Additional file 10: Figure S2
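
The frame-shift correction summarized in the caption of Fig. 2b can be illustrated with a minimal Python sketch. It is not the MEGAN implementation: the function name `correct_frameshifts` and the input layout (a list of nucleotide positions paired with the frame-shift symbol reported there) are hypothetical, chosen only for illustration. The mapping of "/" to one inserted nucleotide and "\" to two follows the caption's "respectively" and should be read as an assumption about how the two cases pair up.

```python
from typing import Iterable, Tuple

# Assumed pairing, taken from the Fig. 2b caption:
#   "/"  (frame decrease by one) -> insert one unspecified nucleotide
#   "\"  (frame increase by one) -> insert two unspecified nucleotides
INSERT_FOR_SYMBOL = {"/": "N", "\\": "NN"}


def correct_frameshifts(seq: str, events: Iterable[Tuple[int, str]]) -> str:
    """Return seq with N padding inserted at each frame-shift position.

    events holds (position, symbol) pairs, where position is a 0-based
    offset into the original sequence (hypothetical layout).
    """
    corrected = seq
    # Work from the rightmost event to the leftmost so that earlier
    # insertions do not shift the coordinates of later ones.
    for pos, symbol in sorted(events, key=lambda e: e[0], reverse=True):
        if symbol not in INSERT_FOR_SYMBOL:
            raise ValueError(f"unknown frame-shift symbol: {symbol!r}")
        if not 0 <= pos <= len(seq):
            raise ValueError(f"position {pos} is outside the sequence")
        corrected = corrected[:pos] + INSERT_FOR_SYMBOL[symbol] + corrected[pos:]
    return corrected


if __name__ == "__main__":
    # One frame decrease after the first codon, one frame increase after the third.
    print(correct_frameshifts("ATGAAACCCGGG", [(3, "/"), (9, "\\")]))
```

Padding with unspecified nucleotides, rather than guessing the missing bases, restores the reading frame downstream of the shift without asserting any particular base at that site.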

References

    1. The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486(7402):207–14. doi: 10.1038/nature11234.
    2. Willmann M, El-Hadidi M, Huson DH, Schütz M, Weidenmaier C, Autenrieth IB, Peter S. Antibiotic selection pressure determination through sequence-based metagenomics. Antimicrob Agents Chemother. 2015;59(12):7335–45. doi: 10.1128/AAC.01504-15.
    3. Huson DH, Beier S, Flade I, Górska A, El-Hadidi M, Mitra S, Ruscheweyh HJ, Tappu R. MEGAN Community Edition - interactive exploration and analysis of large-scale microbiome sequencing data. PLoS Comput Biol. 2016;12(6):1004957. doi: 10.1371/journal.pcbi.1004957.
    4. Quince C, Walker AW, Simpson JT, Loman NJ, Segata N. Shotgun metagenomics, from sampling to analysis. Nat Biotechnol. 2017;35:833. doi: 10.1038/nbt.3935.
    5. Jain M, Olsen HE, Paten B, Akeson M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 2016;17:239. doi: 10.1186/s13059-016-1103-0.
