Review
Clin Gastroenterol Hepatol. 2019 Jan;17(2):218-230.
doi: 10.1016/j.cgh.2018.09.017. Epub 2018 Sep 18.

Microbiome 101: Studying, Analyzing, and Interpreting Gut Microbiome Data for Clinicians


Celeste Allaband et al. Clin Gastroenterol Hepatol. 2019 Jan.

Abstract

Advances in technical capabilities for reading complex human microbiomes are driving an explosion of microbiome research, which in turn has generated intense interest among clinicians in applying these techniques to their patients. In this review, we discuss the content of the human microbiome, including intersubject and intrasubject variability; considerations of study design, including important confounding factors; and different laboratory and computational methods for reading the microbiome and its resulting gene products and metabolites. We highlight several common pitfalls for clinicians, including the expectations that an individual's microbiome will be stable, that diet can induce rapid changes that are large compared with the differences among subjects, that everyone has essentially the same core stool microbiome, and that different laboratory and computational methods will yield essentially the same results. We also highlight the current limitations and future promise of these techniques, with the expectation that an understanding of these considerations will help accelerate the path from research settings toward routine clinical application.

Keywords: Clinician; Diagnosis; Gut Microbiome; Prognosis; Study Design.


Conflict of interest statement

The authors disclose no conflicts.

Figures

Figure 1.
Intersubject variability of the gut microbiome. (A) A principal coordinates plot of unweighted UniFrac distances computed using the Earth Microbiome Project (EMP) data set and the fecal samples from the American Gut Project (AGP) data set. Even though the EMP data include samples from many of the planet's environments, including hydrothermal vents, soils, marine sediment, and many others, the diversity associated with just the large intestine of a single mammal species forms one of the dominant clusters of microbial diversity. (B) Dynamic ranges of the 50 most abundant genera in the human fecal microbiome from 9316 individuals. These data are based on a single sample per person and consider only organisms observed in at least 100 people. Even though Bacteroides is ranked highest, some individuals have a relative abundance of this genus up to 3 orders of magnitude lower, and the genus was not detected in approximately 1% of individuals. PC, principal coordinates.
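Panel B summarizes, for each genus, how widely its relative abundance varies across people and how often it is absent altogether. A minimal Python sketch of that per-genus summary follows, assuming one dictionary of raw genus counts per person. The function names, the `min_people` parameter, and the toy data are hypothetical, and this is not the pipeline used to build the figure.

```python
import math
from typing import Dict, List


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert one sample's raw genus counts into relative abundances summing to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genus: n / total for genus, n in counts.items()}


def genus_dynamic_range(samples: List[Dict[str, int]], min_people: int = 100) -> Dict[str, dict]:
    """Per-genus summary across individuals, assuming one sample per person.

    Genera detected (nonzero) in fewer than `min_people` samples are dropped.
    For each remaining genus the summary holds:
      detected_in          number of samples with nonzero abundance
      undetected_fraction  share of all samples in which the genus is absent
      log10_range          orders of magnitude between the highest and the
                           lowest nonzero relative abundance
    """
    rel = [relative_abundance(s) for s in samples]
    genera = {g for sample in rel for g in sample}
    summary: Dict[str, dict] = {}
    for genus in sorted(genera):
        nonzero = [sample[genus] for sample in rel if sample.get(genus, 0) > 0]
        if not nonzero or len(nonzero) < min_people:
            continue
        summary[genus] = {
            "detected_in": len(nonzero),
            "undetected_fraction": 1 - len(nonzero) / len(rel),
            "log10_range": math.log10(max(nonzero) / min(nonzero)),
        }
    return summary


# Toy data: four hypothetical people, genus -> read count.
samples = [
    {"Bacteroides": 900, "Prevotella": 100},
    {"Bacteroides": 1, "Prevotella": 999},
    {"Prevotella": 1000},
    {"Bacteroides": 500, "Akkermansia": 500},
]
for genus, stats in genus_dynamic_range(samples, min_people=2).items():
    print(genus, stats)
```

In this toy example Bacteroides spans roughly 3 orders of magnitude (0.9 versus 0.001) and is absent from one of four people, the same two properties panel B reports at cohort scale.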
Figure 2.
Interindividual variability is a stronger discriminatory factor than diet, even under extreme dietary changes. (A) Principal coordinates analysis plot of unweighted UniFrac distances, with subjects indicated by color and diets by shape. (B) Principal coordinates analysis plot with traces showing each individual's variation over time; edges connect samples in order of collection time point. PC, principal coordinates.
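The central comparison in Figure 2 is whether samples from the same person resemble each other more than samples from different people, regardless of diet. The sketch below makes that comparison explicit. As an assumption, it swaps in Jaccard distance on presence/absence sets as a simple stand-in for unweighted UniFrac, which instead measures the fraction of phylogenetic-tree branch length unique to either community. The function names and data layout are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Unweighted (presence/absence) distance: 1 - size(A intersect B) / size(A union B)."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def within_vs_between(samples: List[Tuple[str, Set[str]]]) -> Dict[str, float]:
    """Mean pairwise distance for same-subject pairs vs different-subject pairs.

    `samples` holds (subject_id, taxa_observed) tuples, e.g. the same people
    sampled repeatedly while on different diets. Requires at least one pair
    of each kind.
    """
    within: List[float] = []
    between: List[float] = []
    for (subj1, taxa1), (subj2, taxa2) in combinations(samples, 2):
        d = jaccard_distance(taxa1, taxa2)
        (within if subj1 == subj2 else between).append(d)
    return {"within": mean(within), "between": mean(between)}


# Hypothetical example: two subjects, each sampled on two different diets.
samples = [
    ("A", {"a", "b", "c", "d"}),
    ("A", {"a", "b", "c", "e"}),
    ("B", {"w", "x", "y", "z"}),
    ("B", {"w", "x", "y", "a"}),
]
print(within_vs_between(samples))
```

Here the mean within-subject distance (0.4) is well below the mean between-subject distance (about 0.93), which is the pattern the figure shows: a diet change moves a person less than switching to another person does.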
Figure 3.
Conducting a clinical microbiome experiment requires careful attention to numerous factors. (A) Stratification by potential confounders (eg, age, sex, diet, lifestyle factors, and medications) can help resolve differences in microbiota between groups of interest that might otherwise be masked by a confounder effect. (B) Longitudinal studies are especially powerful because they both control for confounding factors and allow assessment of community stability. (C) For all studies, standardizing technical factors and sample processing is essential to control for variation introduced at every step of the process: kit reagents, primers, sample storage, and other factors. Collecting and curating metadata on all aspects of each sample, from clinical variables to sample processing, is crucial for data interpretation; without metadata, it is difficult to draw meaningful conclusions from sequencing data.
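Panel A's point about confounders can be illustrated with stratification: compare groups within each level of a potential confounder rather than across the pooled cohort. The Python sketch below is a simplified illustration, not a substitute for a proper statistical model; the `Subject` fields, the size-weighted pooling rule, and the example strata are assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple, Tuple


class Subject(NamedTuple):
    group: str        # hypothetical label, e.g. "case" or "control"
    stratum: str      # level of a potential confounder, e.g. "antibiotics" or "none"
    diversity: float  # any per-sample summary, e.g. an alpha-diversity value


def group_values(members: List[Subject], group: str) -> List[float]:
    """Diversity values of the subjects belonging to `group`."""
    return [s.diversity for s in members if s.group == group]


def crude_difference(subjects: List[Subject], case: str = "case", control: str = "control") -> float:
    """Case mean minus control mean, ignoring the confounder entirely."""
    return mean(group_values(subjects, case)) - mean(group_values(subjects, control))


def stratified_difference(
    subjects: List[Subject], case: str = "case", control: str = "control"
) -> Tuple[float, Dict[str, float]]:
    """Case minus control within each stratum, pooled as a stratum-size-weighted average.

    Strata that lack either group cannot be compared and are skipped.
    """
    by_stratum: Dict[str, List[Subject]] = defaultdict(list)
    for s in subjects:
        by_stratum[s.stratum].append(s)
    per_stratum: Dict[str, float] = {}
    weights: Dict[str, int] = {}
    for stratum, members in by_stratum.items():
        cases, controls = group_values(members, case), group_values(members, control)
        if cases and controls:
            per_stratum[stratum] = mean(cases) - mean(controls)
            weights[stratum] = len(members)
    if not weights:
        raise ValueError("no stratum contains both groups")
    total = sum(weights.values())
    pooled = sum(per_stratum[k] * weights[k] for k in per_stratum) / total
    return pooled, per_stratum


# Hypothetical cohort: antibiotic use lowers diversity, and cases are mostly non-users.
subjects = (
    [Subject("case", "antibiotics", 2.0)]
    + [Subject("case", "none", 4.0)] * 3
    + [Subject("control", "antibiotics", 2.5)] * 3
    + [Subject("control", "none", 4.5)]
)
print(crude_difference(subjects), stratified_difference(subjects))
```

In this toy cohort the crude comparison puts cases 0.5 units above controls, while every stratum puts them 0.5 units below: the confounder not only masks the group difference but reverses its apparent sign.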
Figure 4.
Once collected, samples undergo molecular preparation and DNA sequencing to generate microbiome data. Two common types of protocols are amplicon sequencing and shotgun sequencing. In amplicon sequencing, PCR primers are used to target a specific region of a specific gene, focusing sequencing effort on just those fragments. One of the most widely used protocols targets the V4 region of the 16S rRNA gene. In shotgun sequencing, the DNA in the sample is randomly sheared and sequenced, generating data from many different parts of the genome. The molecular protocol used before shotgun sequencing determines what type of data are generated; for example, this type of sequencing can be used for metagenomics and metatranscriptomics. The initial processing performed after sequencing depends on the type of sequencing performed. For amplicon studies, one common strategy is to upload the data to Qiita and use Deblur to resolve sequence data into single-sequence variants called suboperational taxonomic units (sOTUs). Taxonomic assignments generally are performed using naive Bayes classifiers such as the RDP classifier, as implemented in the q2-feature-classifier, against reference databases such as Greengenes, SILVA, RDP, or UNITE (fungal internal transcribed spacer [ITS]), depending on the amplicon target. Shotgun sequencing of host-associated samples first requires preprocessing to remove host DNA before analysis. Typically, the shotgun data then are summarized using tools such as Kraken, MEGAN, or HUMAnN2 to generate taxonomic or functional profiles, or are assembled with tools such as metaSPAdes and MEGAHIT. For both sequencing methods, higher-level analyses (eg, α and β diversity, taxonomic profiling, and machine learning) subsequently are used to assay patterns of microbiome variation in the context of the study design.
Metagenomic assemblies also can be analyzed through platforms such as Anvi’o. SourceTracker, a Bayesian estimator of the sources that make up each unknown community, is useful for classifying microbial samples according to their environment of origin. Citizen science platforms, such as the American Gut Project, standardize the molecular work and bioinformatic processing to generate a basic summary report of the content of an individual's sample. In the case of the American Gut Project, the samples also are placed in the context of a few other popular microbiome studies through data integration.
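The higher-level analyses mentioned above start from a feature table of counts per sample. A minimal Python sketch of two α-diversity metrics (observed features and Shannon entropy) and one β-diversity metric (Bray-Curtis dissimilarity) follows. It assumes counts have already been made comparable across samples (in practice, for example, by rarefying to an equal sequencing depth), uses base-2 logarithms for Shannon entropy as an assumed convention, and is not the implementation used in QIIME 2 or Qiita.

```python
import math
from typing import Dict


def observed_features(counts: Dict[str, int]) -> int:
    """Alpha diversity as richness: number of features with a nonzero count."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts: Dict[str, int], base: float = 2.0) -> float:
    """Alpha diversity as Shannon entropy: -sum(p_i * log(p_i)) over nonzero features."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p, base)
    return h


def bray_curtis(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Beta diversity: sum|a_i - b_i| / sum(a_i + b_i); 0 = identical, 1 = nothing shared."""
    features = set(a) | set(b)
    num = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in features)
    den = sum(a.get(f, 0) + b.get(f, 0) for f in features)
    return num / den if den else 0.0


# Hypothetical feature tables (e.g. sOTU -> read count) at equal depth.
s1 = {"sOTU1": 25, "sOTU2": 25, "sOTU3": 25, "sOTU4": 25}
s2 = {"sOTU1": 70, "sOTU2": 30, "sOTU3": 0, "sOTU4": 0}
print(observed_features(s1), shannon(s1), bray_curtis(s1, s2))
```

Four equally abundant features give a base-2 Shannon entropy of 2 bits; the two samples above share half of their total counts, so their Bray-Curtis dissimilarity is 0.5.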
