PLoS Comput Biol. 2022 Sep 23;18(9):e1010472.
doi: 10.1371/journal.pcbi.1010472. eCollection 2022 Sep.

MicrobiomeCensus estimates human population sizes from wastewater samples based on inter-individual variability in gut microbiomes

Lin Zhang et al. PLoS Comput Biol. 2022.

Abstract

The metagenome embedded in urban sewage is an attractive new data source for understanding urban ecology and assessing human health status at scales beyond a single host. Analyzing the viral fraction of wastewater during the ongoing COVID-19 pandemic has shown the potential of wastewater as an aggregated sample for early detection, prevalence monitoring, and variant identification of human diseases in large populations. However, using census-based population sizes instead of real-time population estimates can mislead the interpretation of data acquired from sewage, hindering assessment of representativeness, inference of prevalence, and comparisons of taxa across sites. Here, we show that taxon abundance and sub-species diversity in gut-associated microbiomes provide a new feature space for human population estimation. Using a population-scale human gut microbiome sample of over 1,100 people, we found that taxon-abundance distributions of gut-associated multi-person microbiomes exhibited generalizable relationships with human population size. Here and throughout this paper, the human population size means the number of people represented in a wastewater sample. We present a new algorithm, MicrobiomeCensus, for estimating human population size from sewage samples. MicrobiomeCensus harnesses the inter-individual variability in human gut microbiomes and performs maximum likelihood estimation based on the simultaneous deviation of multiple taxa's relative abundances from their population means. MicrobiomeCensus outperformed generic algorithms in data-driven simulation benchmarks and detected population size differences in field data. New theorems are provided to justify our approach. This research provides a mathematical framework for inferring population sizes in real time from sewage samples, paving the way for more accurate ecological and public health studies utilizing the sewage metagenome.

Conflict of interest statement

I have read the journal’s policy and the authors of this manuscript have the following competing interests: E.J.A. has an equity stake in Biobot Analytics. C. Duvallet is employed by Biobot Analytics.

Figures

Fig 1. An ideal sewage mixture simulation shows the potential of microbiome taxon abundance profiles as population census information sources.
(A) We generated an “ideal sewage mixture” consisting of gut microbiomes from different numbers of people. (B) Ranked abundance curves for gut microbiomes of one person and mixtures of multiple people exhibit different levels of dominance and diversity. Blue lines show the rank abundance curves in stool samples (one person), red lines show 10-person mixtures, and saffron lines show 100-person mixtures. In each scenario, ten examples are shown. All samples were rarefied to the same sequencing depth (4,000 seqs/sample). (C) The probability density function of the relative abundance of one taxon for different population sizes. OTU-2379, a Bifidobacterium taxon, was used as an example. Maroon dashed lines indicate the sample means. (D) Multiple taxa’s abundance variances in one-person samples and 100-person samples. The dominant taxa (top 100) are shown, sorted by their ranks in variance. (E) The ratios of the variances of one-person samples to those of 100-person samples across dominant gut microbial taxa.
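The variance comparison in panels (D) and (E) can be illustrated with a small simulation. The sketch below is not the authors' pipeline: it assumes every person contributes equally to the mixture, ignores rarefaction, and draws each person's relative abundance for a single taxon from a Beta distribution whose parameters (`alpha`, `beta`) are hypothetical stand-ins for the empirical stool-sample distribution. Under those assumptions, the variance of an N-person mixture should be roughly 1/N of the one-person variance.

```python
import random
import statistics


def mixture_abundance(n_people, rng, alpha=0.5, beta=20.0):
    """Relative abundance of one taxon in an equal-contribution mixture.

    Each person's abundance is drawn from Beta(alpha, beta); both
    parameters are illustrative assumptions, not values from the paper.
    """
    draws = [rng.betavariate(alpha, beta) for _ in range(n_people)]
    return sum(draws) / n_people


def variance_ratio(n_people, n_replicates=2000, seed=0):
    """Variance of one-person samples divided by variance of mixtures."""
    rng = random.Random(seed)
    single = [mixture_abundance(1, rng) for _ in range(n_replicates)]
    mixed = [mixture_abundance(n_people, rng) for _ in range(n_replicates)]
    return statistics.variance(single) / statistics.variance(mixed)


# For independent, equally weighted contributors the ratio is close to N.
print(f"variance ratio, 1 vs 100 people: {variance_ratio(100):.1f}")
```

A ratio near 100 for the 100-person mixture is what makes the abundance variance informative about population size in this idealized setting; real sewage departs from the equal-contribution assumption.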
Fig 2. Classifier performance of models utilizing gut microbiome taxon abundances.
Fig 3. MicrobiomeCensus statistic definition, model training, validation, and application.
(A) Example of computing the T statistic. (B) Simulation results for T with different population sizes. Grey points are simulation results. Red bars are means of 10,000 repeats performed for each population size. (C) Model training and tuning. We built the MicrobiomeCensus model using our T statistic and a maximum likelihood procedure. The training set consisted of 10,000 samples for population sizes ranging from 1–300, and 50% of the data were used to train and validate the model. Training and validation errors from different feature subsets are shown. Training errors are shown as blue lines, and validation errors are shown as red lines. (D) Model performance on simulation benchmark. After training and validation, the model utilized the top 120 abundant features. Model performance was tested on synthetic data generated from 550 different subjects not previously seen by the model. The training set consisted of 10,000 samples with population sizes from 1–300, and the testing set consisted of 10,000 repeats at the evaluated population sizes. The training error, testing error, and the error of the final model are shown. (E) Model performance evaluated using a testing set. Black solid dots indicate the means of the predicted values, and error bars indicate the standard deviations of the predicted values. (F) Application of the microbiome population model in sewage. Seventy-six composite samples (blue) were taken from three manholes on the MIT campus, and each sample was taken over 3 hours during the morning peak water usage hours. Twenty-five snapshot samples (grey) were taken using a peristaltic pump for 5 minutes at 1-hour intervals throughout a day.
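The idea of estimating population size by maximum likelihood from a multi-taxon deviation statistic can be sketched as follows. This caption does not define T, so the statistic used here (a sum of squared standardized deviations) is an assumption chosen for illustration and may differ from the paper's definition. The sketch further assumes independent taxa, equal contributions per person, and normally distributed per-person abundances, which can produce negative values and is only a convenience. Under those assumptions N·T follows a chi-squared distribution with p degrees of freedom (p = number of taxa), and the likelihood is maximized at N = p / T. All function names are hypothetical.

```python
import random
import statistics


def t_statistic(profile, means, variances):
    """Sum of squared standardized deviations across taxa (illustrative T)."""
    return sum((x - m) ** 2 / v for x, m, v in zip(profile, means, variances))


def estimate_population_size(profile, means, variances):
    """Closed-form maximum likelihood estimate N = p / T under the
    independence and normality assumptions stated in the lead-in."""
    t = t_statistic(profile, means, variances)
    return len(profile) / t


def simulate_mixture(n_people, means, variances, rng):
    """Average n_people synthetic individuals, taxon by taxon."""
    profile = []
    for m, v in zip(means, variances):
        draws = [rng.gauss(m, v ** 0.5) for _ in range(n_people)]
        profile.append(sum(draws) / n_people)
    return profile


rng = random.Random(1)
n_taxa = 120  # matches the number of features mentioned in the caption
means = [rng.uniform(0.001, 0.02) for _ in range(n_taxa)]  # hypothetical
variances = [(0.5 * m) ** 2 for m in means]                # hypothetical
estimates = [
    estimate_population_size(
        simulate_mixture(50, means, variances, rng), means, variances
    )
    for _ in range(30)
]
print(f"median estimate for a true size of 50: {statistics.median(estimates):.1f}")
```

The closed form is available only because of the simplifying assumptions; with empirical abundance distributions, the likelihood would more plausibly be built from simulated values of T at each candidate population size, as panels (B) and (C) suggest.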
Fig 4. Sub-species diversity in gut-associated bacterial species as a potential marker for human population size.
(A-F) Comparison of sub-species diversity of gut-associated bacteria in human gut microbiome samples (LifelinesDeep) and MIT sewage samples. Nucleotide diversity and numbers of polymorphic sites were computed from ten phylogenetic marker genes. (G) and (H) Simulation results showing intra-species diversity in response to increasing population size, as represented by the number of polymorphic sites (G) and nucleotide diversity (H).
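The two diversity measures named in this caption have standard textbook definitions that can be shown on a toy alignment. The sketch below assumes equal-length, gap-free aligned sequences and uses hypothetical function names; it does not reproduce how the authors computed these quantities from metagenomic reads of the marker genes.

```python
from itertools import combinations


def polymorphic_sites(alignment):
    """Count alignment columns in which more than one nucleotide occurs."""
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)


def nucleotide_diversity(alignment):
    """Mean number of pairwise differences per site across all sequence pairs."""
    length = len(alignment[0])
    pairs = list(combinations(alignment, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Toy alignment of three sequences over four sites (made-up data).
alignment = ["ACGT", "ACGA", "ACGT"]
print(polymorphic_sites(alignment), nucleotide_diversity(alignment))
```

In the toy alignment only the last column varies, giving one polymorphic site, and two of the three sequence pairs differ at one of four sites, giving a nucleotide diversity of 2/12. Adding sequences from more individuals can only keep or increase the count of polymorphic sites, which is the intuition behind using it as a population-size marker.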
