mBio. 2016 Feb 9;7(1):e01948-15.
doi: 10.1128/mBio.01948-15.

Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing

Yu-Chih Tsai et al. mBio. 2016.

Abstract

Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant "genomes" are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation.

Importance: The species comprising a microbial community are often difficult to deconvolute due to technical limitations inherent to most short-read sequencing technologies. Here, we leverage new advances in sequencing technology, single-molecule sequencing, to significantly improve reconstruction of a complex human skin microbial community. With this long-read technology, we were able to reconstruct and annotate a closed, high-quality genome of a previously uncharacterized skin species. We demonstrate that hybrid approaches with short-read technology are sufficiently powerful to reconstruct even single-nucleotide polymorphism level variation of species in this community.

Figures

FIG 1
SMRT reads accurately reconstruct species abundances in metagenomic communities and recover rare species. (A) Estimation of sequencing coverage of the community. The number of reads subsampled for k-mer counting is shown as Reads sampled on the x axis. Reads are split into 20-mers, compared to a k-mer coverage table, and kept only if the median k-mer coverage is below 20× (percent reads kept shown on the y axis). If k-mer coverage is sufficiently deep for the community, one observes a decrease and leveling off in percent reads kept as the number of reads sampled increases. Whether the reads were generated with HiSeq, SMRT sequencing, or SMRT reads error corrected using HiSeq reads is indicated for each panel. (B) Relative abundance plots of the most abundant taxa per kingdom. “16S rRNA” classifications are to the genus level. “Mapped to reference” indicates relative abundances mapping to a multikingdom reference database containing Archaea, Bacteria, fungi, and viruses. “Normalized with unmapped” contextualizes the relative abundance of species generated by reference-based mapping to the fraction of reads from the sample that does not map to any reference. (C) Concordance of HiSeq and SMRT species classifications with Spearman correlation (ρ) calculated with the corresponding P value. (D) Differential detection of species with the two sequencing methods shown by kingdom. Venn diagrams show the shared number of species detected for the arm and foot samples. The colors indicate different taxonomic units for the Archaea, Bacteria, fungi, and viruses as follows. For Archaea, red colors indicate Crenarchaeota and green colors indicate Euryarchaeota. 
For Bacteria, red colors indicate Acidobacteria, Spirochaetes, Tenericutes, Thermotogae, and Verrucomicrobia; greens indicate Actinobacteria; blues indicate Bacteroidetes, Chlamydiae, Chloroflexi, and Cyanobacteria; oranges indicate Deinococcus-Thermus; grays indicate Firmicutes; yellows indicate Fusobacteria and Planctomycetes; purples indicate Proteobacteria. For fungi, reds, yellows, and purples indicate miscellaneous; greens indicate Apicomplexa; blues indicate Ascomycota; oranges indicate Basidiomycota; grays indicate Chlorophyta. For viruses, red and blue colors indicate miscellaneous and Fuselloviridae; greens indicate Herpesviridae; oranges indicate Myoviridae; grays indicate Papillomaviridae, Phycodnaviridae, Podoviridae, Polydnaviridae, and Polyomaviridae; yellows indicate Poxviridae; purples indicate Siphoviridae.
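The read-filtering step described for panel A can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, an exact in-memory counter stands in for whatever k-mer coverage table was actually used, and updating the table only with kept reads is an assumption borrowed from common digital-normalization practice.

```python
from collections import Counter
from statistics import median


def kmers(read, k=20):
    """Return every overlapping k-mer in a read (empty list if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def filter_by_median_coverage(reads, k=20, cutoff=20):
    """Keep a read only if the median count of its k-mers is below the cutoff.

    Assumption: the coverage table is updated only with k-mers from kept reads.
    """
    table = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read too short to contain a single k-mer
        if median(table[km] for km in read_kmers) < cutoff:
            kept.append(read)
            table.update(read_kmers)
    return kept


def percent_kept(reads, k=20, cutoff=20):
    """Percentage of sampled reads that pass the filter (the y axis of panel A)."""
    if not reads:
        return 0.0
    return 100.0 * len(filter_by_median_coverage(reads, k, cutoff)) / len(reads)
```

Calling `percent_kept` on progressively larger subsamples of reads would give a curve of the kind the caption describes: once coverage is deep enough, most newly sampled reads have a median k-mer count at or above the cutoff and are discarded, so the percentage kept falls and levels off.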
FIG 2
Reconstruction of a closed C. simulans metagenome from HGAP assemblies. (A) (Top) Mean coverage from remapping SMRT reads to the four longest contigs that share a lowest common ancestor of C. aurimucosum. (Bottom) These four contigs were manually linked to form a single chromosome using contig overlap information. The predicted origin of replication is indicated. (B) Taxonomic assignment of the reconstructed genome. 16S rRNA gene sequences were predicted from the chromosome and placed on a phylogenetic tree of full-length Corynebacterium rRNA gene sequences. A portion of the tree is shown with bootstrap values (1,000 iterations), showing placement of the reconstructed genome (“Pacbio, metagenome”) with C. simulans (“type”). For comparisons, we also included a previously sequenced 454 skin isolate typed as C. simulans (“Skin isolate”). (C) Synteny plots compare similarity of the de novo C. simulans metagenome, the sequenced C. simulans type strain, and the 454-sequenced skin isolate, generated by NUCmer. The top panels show the percent similarity across the genomes, ordered by nucleotide position. In the dot plots (bottom panels), aligned segments up to 3 kb in length are represented as dots or lines and the orientation of contigs is shown (forward [red] and reverse [blue]). Scale bar represents an estimate of relative times of divergence between nodes.
FIG 3
Metagenomic assembly comparisons between long-read, short-read, and hybrid approaches. (A and B) Line plots show the cumulative length of contigs generated by each of the assembly methods for the foot and arm samples, respectively. The assembly methods are indicated by the colors shown in the legend in panel F. (C and D) Violin plots are boxplots whose shapes show the density distribution of contig lengths (log10) for each of the assembly methods. (E and F) Modified Nx plots show the length for which the contigs of that length or longer cover x percent of the assembly. For the foot, plots are shown separately for contigs that aligned to the C. simulans metagenome used as a reference (inset) and for contigs that did not align (unaligned). For the arm, all contigs are shown.
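The Nx statistic in panels E and F can be sketched as follows. This is the standard definition as stated in the caption; the function name `nx` is hypothetical, and whatever modification the "Modified Nx" plots apply is not specified here and is not reproduced.

```python
def nx(contig_lengths, x):
    """Return the length L such that contigs of length >= L cover at least x percent of the assembly.

    contig_lengths: iterable of positive integer contig lengths.
    x: percentage in the range (0, 100].
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= x * total:
            return length
    return lengths[-1]
```

For contigs of lengths 100, 50, 30, and 20, `nx(..., 50)` returns 100 because the longest contig alone covers half of the 200-unit assembly. An Nx curve is obtained by evaluating the function for x from 1 to 100.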
FIG 4
Reannotation of skin communities with metagenomes. (A) Reconstruction fidelity of the C. simulans genome for each of the assembly methods. The number of contigs, number of misassemblies (by Plantagora’s definition), and percentage of the genome covered are shown. A correct alignment of the contig is indicated in green, with each vertical bar representing a contig (the contigs or bars are staggered to distinguish contigs). A misassembly in the contig is indicated in red. (B) Improvement in the fraction of sequences that can be assigned to a taxonomical unit using reference-free methods. The community composition using the original reference genome database (Original) and with the addition of C. simulans genome to the database (+C.simulans) are indicated. Hybrid, SPAdes short-read, and HGAP assemblies were annotated using the NCBI nr database. Colors are as shown in the keys in Fig. 1B with additional colors indicated.
FIG 5
Population-wide heterozygosity of C. simulans strains in the metagenome. Low-frequency variant calls mapped to the C. simulans de novo metagenome. The outer ring is colored by TIGR (the Institute for Genomic Research; now the J. Craig Venter Institute) roles for the protein-coding genes in shades of green and blue. Mobile elements (e.g., transposases, integrases) are yellow. Genes with hypothetical functions are gray. RNA genes are indicated on the next ring, with rRNA genes in green and tRNA genes in magenta. The innermost ring shows a histogram of variant calls per 1,000-nucleotide window. The scale for each gray circle is 10 variants. Genes and gene clusters with one or more variants are annotated along the outer edge.
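The innermost ring amounts to binning variant positions into fixed 1,000-nucleotide windows. A minimal sketch, assuming 1-based variant coordinates and non-overlapping windows (the caption states neither), with a hypothetical function name:

```python
def variants_per_window(positions, genome_length, window=1000):
    """Count variant calls in consecutive, non-overlapping windows along the genome.

    positions: 1-based variant coordinates (assumed convention).
    genome_length: total length of the reference sequence in nucleotides.
    Returns one count per window; the last window may be shorter than `window`.
    """
    n_windows = -(-genome_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if not 1 <= pos <= genome_length:
            raise ValueError(f"position {pos} lies outside the genome")
        counts[(pos - 1) // window] += 1
    return counts
```

With variants at positions 1, 999, 1000, 1001, and 2500 on a 3,000-nucleotide sequence, the counts are 3, 1, and 1 for the three windows.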
FIG 6
Select functional characterizations of C. simulans. (A) Whole-genome comparisons of the C. simulans genomes described in this study. The de novo C. simulans metagenome is used as a reference (innermost ring in black). The GC content of the metagenome is shown in the second ring in black. Ordered contigs from the 454-sequenced skin isolate are shown in the third ring. Alignment of the C. simulans type strain is shown in the fourth or outermost ring. Intensity of color shows the percent identity of the match. (B) COG categories of variable genes, those that are absent in either one of the two complete genomes. (C) Genome structure of the bacteriophage identified in panel A. The metagenome-derived C. simulans contains two CRISPR spacers that are a 100% match to this phage genome and are indicated in purple. (D) Epigenome analysis of C. simulans. Examples of kinetic modification detection signals of 6-methyladenine (m6A) in the C. simulans metagenome (top) and type strain (bottom). The x axis shows the template position and base calls, and the y axis shows the ratio of average interpulse durations (IPDs) for each DNA strand to the control. High deviations from the baseline level indicate a base modification, with the forward strand shown in purple and the reverse strand shown in orange. The methylated bases are indicated in bold type, and the underlined bases indicate methylation on the opposite DNA strand. (E) Examples of biosynthetic pathways predicted from the two complete genomes using antiSMASH 3.0.
