Nat Biotechnol. 2016 Jan;34(1):64-9.
doi: 10.1038/nbt.3416. Epub 2015 Dec 14.

Synthetic long-read sequencing reveals intraspecies diversity in the human microbiome

Volodymyr Kuleshov et al. Nat Biotechnol. 2016 Jan.

Abstract

Identifying bacterial strains in metagenome and microbiome samples using computational analyses of short-read sequences remains a difficult problem. Here, we present an analysis of a human gut microbiome using TruSeq synthetic long reads combined with computational tools for metagenomic long-read assembly, variant calling and haplotyping (Nanoscope and Lens). Our analysis identifies 178 bacterial species, of which 51 were not found using shotgun reads alone. We recover bacterial contigs that comprise multiple operons, including 22 contigs of >1 Mbp. Furthermore, we observe extensive intraspecies variation within microbial strains in the form of haplotypes that span up to hundreds of kbp. Incorporation of synthetic long-read sequencing technology with standard short-read approaches enables more precise and comprehensive analyses of metagenomic samples.

Conflict of interest statement

Competing interests

V.K. serves as a consultant for Illumina Inc. S.B. is a co-founder of DNAnexus and a member of the scientific advisory boards of 23andMe and Eve Biomedical. M.S. is a co-founder of Personalis and a member of the scientific advisory boards of Personalis, AxioMx and Genapsys.

Figures

Figure 1
The Nanoscope pipeline and the Lens algorithm. Left: Nanoscope first assembles short and long reads using the SOAPdenovo2 and Celera assemblers and merges the results with Minimus2; it then assigns taxonomic labels to contigs with the Fragment Classification Package (FCP) and identifies bacterial strains with Lens; finally, it estimates abundances of detected bacterial species by mapping short reads to contigs and by aggregating the coverage over all contigs assigned to the same species. Right: The Lens algorithm identifies heterozygous variants in the assembled genomic contigs (a); these variants are supported by long reads (b) aligned to the contigs. Each long read originates from a single organism; thus the variants it supports must belong to the same substrain. By connecting reads at their overlapping variants, Lens places the variants into multi-kilobase-long haplotypes (c) associated with bacterial strains. The number of haplotypes is a priori unknown and is inferred from the data.
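The idea of connecting reads at their overlapping variants can be illustrated with a minimal sketch. This is not the Lens implementation: it is a simple greedy clustering, chosen only for illustration, and the function names, the read representation (a mapping from variant position to allele), and the example data are all assumptions. A read joins an existing haplotype when it shares at least one variant position with it and agrees at every shared position; otherwise it starts a new haplotype, so the number of haplotypes is not fixed in advance.

```python
def compatible(haplotype, read):
    """True if the read overlaps the haplotype and agrees at every shared variant."""
    shared = haplotype.keys() & read.keys()
    return bool(shared) and all(haplotype[pos] == read[pos] for pos in shared)


def assemble_haplotypes(reads):
    """Greedily merge reads into haplotypes (illustrative only, order-dependent)."""
    haplotypes = []
    for read in reads:
        for haplotype in haplotypes:
            if compatible(haplotype, read):
                # Extend the haplotype with any new variant positions from the read.
                haplotype.update(read)
                break
        else:
            # No compatible haplotype found: the read seeds a new one.
            haplotypes.append(dict(read))
    return haplotypes


# Hypothetical long reads: variant position -> observed allele.
example_reads = [
    {100: "A", 200: "C"},
    {200: "C", 300: "G"},
    {100: "T", 200: "G"},
    {200: "G", 300: "G"},
]
example_haplotypes = assemble_haplotypes(example_reads)
```

In this toy input the four reads collapse into two haplotypes, each spanning positions 100, 200 and 300. A greedy pass like this depends on read order and ignores sequencing errors, which a real haplotyping method would have to handle.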
Figure 2
Long reads aligned to assembled metagenomic contigs reveal extensive variation among bacterial strains. Top: Fragment of a 110-kbp region within a metagenomic contig belonging to the species Odoribacter splanchnicus; the region harbors numerous strain variants that can be assembled into bacterial haplotypes. Bottom left: Fragment of a bacterial region containing 32 genomic variants that assemble into four bacterial haplotypes. Bottom right: These haplotypes can be placed in an evolutionary tree satisfying perfect phylogeny; for simplicity, we visualize this tree over 4 of the 32 positions in the region (upper left corner).
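Whether a set of haplotypes can be placed in a tree satisfying perfect phylogeny can be checked with the classical four-gamete condition. The sketch below assumes biallelic sites coded as 0/1 (an assumption, since the caption does not state the encoding); under that assumption, a perfect phylogeny exists exactly when no pair of sites shows all four allele combinations 00, 01, 10 and 11. The function name and the example matrices are hypothetical, not data from the figure.

```python
from itertools import combinations


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test on haplotypes given as equal-length lists of 0/1 alleles."""
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct allele pairs observed at sites i and j.
        gametes = {(hap[i], hap[j]) for hap in haplotypes}
        if len(gametes) == 4:
            # All of 00, 01, 10, 11 occur: no perfect phylogeny is possible.
            return False
    return True


# Hypothetical example: four haplotypes over four variant positions.
tree_like = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
conflicting = [
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
]
tree_like_ok = admits_perfect_phylogeny(tree_like)
conflicting_ok = admits_perfect_phylogeny(conflicting)
```

The first matrix passes the test, while the second fails because its two sites show all four combinations.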
Figure 3
Bacterial strains identified only by long reads (blue), only by short reads (magenta), by both technologies (green), and only by a combination of the two (black), ordered by abundance. Long reads identify 51 species that short reads do not detect; combining short and long reads identifies 58 additional species, including ones having the lowest abundance. A total of 178 species are detected using all the methods.
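The four categories in this figure amount to a partition of the detected species by which analysis found them. A minimal sketch of that bookkeeping follows, assuming each analysis yields a set of species labels; the function name, argument names, and species labels are hypothetical and do not come from the paper.

```python
def partition_detections(long_reads, short_reads, combined):
    """Split detected species into the four categories used in the figure."""
    long_reads = set(long_reads)
    short_reads = set(short_reads)
    combined = set(combined)
    either = long_reads | short_reads
    return {
        "long_only": long_reads - short_reads,
        "short_only": short_reads - long_reads,
        "both": long_reads & short_reads,
        # Found only when the two read types are analysed together.
        "combination_only": combined - either,
    }


# Hypothetical species labels, for illustration only.
example = partition_detections(
    long_reads={"sp_A", "sp_B", "sp_C"},
    short_reads={"sp_B", "sp_D"},
    combined={"sp_A", "sp_B", "sp_D", "sp_E"},
)
total_detected = len(set().union(*example.values()))
```

Because the four categories are disjoint, the total number of detected species is the sum of their sizes.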
