Proc Natl Acad Sci U S A. 2015 Jun 2;112(22):E2930-8.
doi: 10.1073/pnas.1423854112. Epub 2015 May 11.

Identifying personal microbiomes using metagenomic codes

Eric A Franzosa et al. Proc Natl Acad Sci U S A. 2015.

Abstract

Community composition within the human microbiome varies across individuals, but it remains unknown if this variation is sufficient to uniquely identify individuals within large populations or stable enough to identify them over time. We investigated this by developing a hitting set-based coding algorithm and applying it to the Human Microbiome Project population. Our approach defined body site-specific metagenomic codes: sets of microbial taxa or genes prioritized to uniquely and stably identify individuals. Codes capturing strain variation in clade-specific marker genes were able to distinguish among hundreds of individuals at an initial sampling time point. In comparisons with follow-up samples collected 30-300 days later, ∼30% of individuals could still be uniquely pinpointed using metagenomic codes from a typical body site; coincidental (false positive) matches were rare. Codes based on the gut microbiome were exceptionally stable and pinpointed >80% of individuals. The failure of a code to match its owner at a later time point was largely explained by the loss of specific microbial strains (at current limits of detection) and was only weakly associated with the length of the sampling interval. In addition to highlighting patterns of temporal variation in the ecology of the human microbiome, this work demonstrates the feasibility of microbiome-based identifiability, a result with important ethical implications for microbiome study design. The datasets and code used in this work are available for download from huttenhower.sph.harvard.edu/idability.

Keywords: forensic genetics; human microbiome; metagenomics; microbial ecology; strain variation.
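
The abstract describes metagenomic codes as feature sets found with a hitting set-based algorithm. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each individual is reduced to a plain set of detected features and uses a simple greedy rule (pick the feature that rules out the most remaining individuals), whereas the paper prioritizes features in ways not reproduced here. The function name `build_code` and the toy population are hypothetical.

```python
from typing import Dict, Optional, Set


def build_code(owner: str, features: Dict[str, Set[str]]) -> Optional[Set[str]]:
    """Greedy hitting-set sketch (an assumption, not the published algorithm).

    Returns a subset of the owner's features that no other individual
    fully contains, or None when the owner's features are a subset of
    someone else's and no unique code exists.
    """
    own = features[owner]
    # For each other individual, the owner's features that they lack.
    # A valid code must "hit" (include a member of) every one of these sets.
    to_hit = {
        other: own - feats
        for other, feats in features.items()
        if other != owner
    }
    if any(not diff for diff in to_hit.values()):
        return None  # owner's features are contained in another individual's

    code: Set[str] = set()
    while to_hit:
        # Choose the feature that distinguishes the owner from the most
        # not-yet-excluded individuals; ties resolve alphabetically.
        best = max(
            sorted(own - code),
            key=lambda f: sum(f in diff for diff in to_hit.values()),
        )
        code.add(best)
        to_hit = {o: d for o, d in to_hit.items() if best not in d}
    return code


# Hypothetical toy population: feature letters are illustrative only.
population = {
    "ind1": {"A", "B", "C"},
    "ind2": {"B", "C", "D"},
    "ind3": {"A", "B", "D"},
}

for person in population:
    code = build_code(person, population)
    print(person, sorted(code) if code is not None else None)
```

A greedy choice does not guarantee the smallest possible code, but it keeps the sketch short and always returns a valid code when one exists.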

PubMed Disclaimer

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Metagenomic codes (overview). (A) Three individuals and their metagenomic features (represented by capital letters) are shown. For each individual, a subset of features is highlighted that is unique among the three individuals. We refer to these sets as metagenomic codes. (B) The same three individuals reevaluated after weeks to months. Individual 1’s microbiome has remained stable, and his code still uniquely identifies him among the population (a true positive). Individual 2 has lost metagenomic feature C, and his code no longer identifies him (a false negative). Individual 3 has lost feature B and gained feature C. Individual 3 is still a true positive with respect to his own code, but also matches individual 2’s code (a false positive). (C) Illustration of the four metagenomic feature types considered in this work: OTUs, species, kilobase windows from reference genomes (kbwindows), and species-specific marker genes (markers) (see Methods and Table 1 for details).
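
The outcomes in panel B can be sketched in a few lines of code. This is an illustrative example under stated assumptions: a code "matches" a person when every feature in the code is present in that person's later sample, and the outcome labels follow the TP, FN, TP+FP, and FN+FP categories used in the figure captions. The function name `classify_code`, the feature letters, and the codes are hypothetical and were chosen only to mirror the narrative of the caption.

```python
from typing import Dict, Set


def classify_code(owner: str, code: Set[str], later: Dict[str, Set[str]]) -> str:
    """Label a code by whom it matches in the later (time 2) samples."""
    # A code matches a person if all of its features are present in them.
    matches = {person for person, feats in later.items() if code <= feats}
    owner_hit = owner in matches
    other_hit = bool(matches - {owner})
    if owner_hit:
        return "TP+FP" if other_hit else "TP"
    return "FN+FP" if other_hit else "FN"


# Hypothetical codes assigned at time 1 (unique at that time point).
codes = {
    "ind1": {"A", "D"},
    "ind2": {"C", "E"},
    "ind3": {"E", "F"},
}

# Hypothetical time 2 samples: ind1 is unchanged, ind2 has lost C,
# and ind3 has lost B and gained C.
time2 = {
    "ind1": {"A", "B", "D"},
    "ind2": {"B", "E"},
    "ind3": {"C", "E", "F"},
}

for person, code in codes.items():
    print(person, classify_code(person, code, time2))
```

In this toy setup, individual 2's code fails to match its owner and instead matches individual 3, which corresponds to the combined false negative and false positive described in the caption.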
Fig. 2.
Properties associated with microbiome feature stability. For each (body site, feature type) combination, we counted cases of features confidently detected across subjects’ first sampling visits (time 1). The fraction of these cases that remained confidently detected at subjects’ second sampling visits (time 2; weeks to months later) provided a measure of feature stability. Stability was positively and strongly correlated with (A) feature abundance and (B) feature prevalence. (C) Highly prevalent features that were not detected in subjects’ time 1 samples had a high probability of being acquired by time 2, particularly at more exposed sites (e.g., skin). (D) Sampling time interval had a less marked effect on stability. NA, a (body site, feature type) combination with <10 confident detection events at time 1. Abundance values for OTUs and species reflect relative abundance; abundance values for markers and kbwindows reflect RPKM units.
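
The stability measure described in this caption (the fraction of time 1 detections that are still detected at time 2) can be written as a short function. This is a simplified sketch: it treats detection as plain set membership rather than applying the confident detection thresholds the paper uses, and the function name `feature_stability` and the sample data are hypothetical. The `min_events` default of 10 follows the caption's NA rule.

```python
from typing import Dict, Optional, Set


def feature_stability(
    time1: Dict[str, Set[str]],
    time2: Dict[str, Set[str]],
    min_events: int = 10,
) -> Optional[float]:
    """Fraction of (subject, feature) detections at time 1 retained at time 2.

    Returns None (reported as NA in the figure) when there are fewer than
    min_events detections at time 1.
    """
    detected = [(subj, feat) for subj, feats in time1.items() for feat in feats]
    if len(detected) < min_events:
        return None
    retained = sum(feat in time2.get(subj, set()) for subj, feat in detected)
    return retained / len(detected)


# Hypothetical samples for two subjects at two visits.
visit1 = {"s1": {"A", "B"}, "s2": {"A"}}
visit2 = {"s1": {"A"}, "s2": {"A", "C"}}

# min_events is lowered here only because the toy data are tiny.
print(feature_stability(visit1, visit2, min_events=1))
```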
Fig. 3.
Temporal stability of metagenomic codes. (A) We identified unique metagenomic codes for individuals based on their first sampling visits (time 1); an individual whose microbial features were a subset of a second individual’s features had no unique code (black bars). Red bars represent true positives (TPs): codes that uniquely identified their owners at time 1 and again at the second sampling visit (time 2; weeks to months later). Blue bars represent false negatives (FNs): codes that matched no one at time 2. Pink and cyan bars represent false positives (FPs): codes that matched someone other than their owner at time 2, either in addition to their owner (TP+FP) or instead of their owner (FN+FP). (B) Average and SD of metagenomic code size. A target size (seven features) was imposed to reduce FPs. (C) Distribution of sampling time intervals for TPs and FNs, with each individual represented by a hash mark. FNs were weakly associated with longer sampling time intervals than TPs in a few body sites and very weakly in aggregate (Mann–Whitney U test). Green numbers indicate the number of individuals profiled at time 1 and time 2 for each (body site, feature type) combination (see Methods for an explanation of why kbwindows numbers differ from species and markers numbers).
Fig. 4.
Influence of strain-level variation on marker gene-based codes. (A) Species varied greatly in their likelihood to contribute marker genes to a code (vertical axis) and the numbers of marker genes thus contributed (horizontal axis). Samples from the anterior nares and posterior fornix body sites were typically identified by individual strains (several markers each) of a few dominant taxa, whereas stool and oral sites were instead identified by combinations of species within (e.g., Bacteroides) or across genera, respectively. (B) Each row depicts the abundance of 293 Prevotella copri-specific marker genes in a stool metagenome. The three dark gray rows correspond to three sampling visits from one subject (HMP identifier 158802708) and the two light gray rows correspond to two visits from a second subject (159166850). Certain markers were consistently absent in one subject across visits and consistently present in the other, indicative of stable carriage of subject-specific strains of P. copri. Red markers were included in the subjects’ codes; triangles indicate encoded markers that differentiated the first subject from the second subject (or vice versa). Heights of marker genes within each row vary with gene abundance (binned according to the confident detection, relaxed detection, and confident nondetection thresholds used in the construction and evaluation of metagenomic codes; see inset key). (C) This panel uses the same format as B to explore marker profiles of Leptotrichia buccalis from the supragingival plaque (oral) samples of HMP subjects 159591683 and 159207311. Here, an open triangle represents an encoded marker gene that was acquired between time points (in a potential lateral transfer or strain replacement event), which could contribute to a possible false positive match. (D) This panel uses the format from B and C to explore marker profiles of Lactobacillus crispatus from the posterior fornix (vaginal) samples of HMP subjects 160502038 and 764042746.
