Genome Biol. 2022 Nov 8;23(1):236.

doi: 10.1186/s13059-022-02805-9.

Lineage abundance estimation for SARS-CoV-2 in wastewater using transcriptome quantification techniques

Jasmijn A Baaijens^#^{1,2}, Alessandro Zulli^#³, Isabel M Ott^#⁴, Ioanna Nika⁵, Mart J van der Lugt⁵, Mary E Petrone⁴, Tara Alpert⁴, Joseph R Fauver^{4,6}, Chaney C Kalinich⁴, Chantal B F Vogels⁴, Mallery I Breban⁴, Claire Duvallet⁷, Kyle A McElroy⁷, Newsha Ghaeli⁷, Maxim Imakaev⁷, Malaika F Mckenzie-Bennett⁸, Keith Robison⁸, Alex Plocik⁸, Rebecca Schilling⁸, Martha Pierson⁸, Rebecca Littlefield⁸, Michelle L Spencer⁸, Birgitte B Simen⁸; Yale SARS-CoV-2 Genomic Surveillance Initiative; William P Hanage⁹, Nathan D Grubaugh^#^{4,10}, Jordan Peccia^#³, Michael Baym^#¹¹


Collaborators

Yale SARS-CoV-2 Genomic Surveillance Initiative:
Ahmad Altajar, Anderson F Brito, Anne E Watkins, Anthony Muyombwe, Caleb Neal, Chen Liu, Christopher Castaldi, Claire Pearson, David R Peaper, Eva Laszlo, Irina R Tikhonova, Jafar Razeq, Jessica E Rothman, Jianhui Wang, Kaya Bilguvar, Linda Niccolai, Madeline S Wilson, Margaret L Anderson, Marie L Landry, Mark D Adams, Pei Hui, Randy Downing, Rebecca Earnest, Shrikant Mane, Steven Murphy

Affiliations

¹ Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA. j.a.baaijens@tudelft.nl.
² Department of Intelligent Systems, Delft University of Technology, Delft, Netherlands. j.a.baaijens@tudelft.nl.
³ Department of Chemical and Environmental Engineering, Yale University, New Haven, CT, USA.
⁴ Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA.
⁵ Department of Intelligent Systems, Delft University of Technology, Delft, Netherlands.
⁶ Department of Epidemiology, University of Nebraska Medical Center, Omaha, NE, USA.
⁷ Biobot Analytics, Inc., Cambridge, MA, USA.
⁸ Ginkgo Bioworks, Inc., Boston, MA, USA.
⁹ Center for Communicable Disease Dynamics and Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
¹⁰ Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA.
¹¹ Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

^# Contributed equally.

PMID: 36348471
PMCID: PMC9643916
DOI: 10.1186/s13059-022-02805-9

Jasmijn A Baaijens et al. Genome Biol. 2022.


Abstract

Effectively monitoring the spread of SARS-CoV-2 mutants is essential to efforts to counter the ongoing pandemic. Predicting lineage abundance from wastewater, however, is technically challenging. We show that by sequencing SARS-CoV-2 RNA in wastewater and applying algorithms initially used for transcriptome quantification, we can estimate lineage abundance in wastewater samples. We find high variability in signal among individual samples, but the overall trends match those observed from sequencing clinical samples. Thus, while clinical sequencing remains a more sensitive technique for population surveillance, wastewater sequencing can be used to monitor trends in mutant prevalence in situations where clinical sequencing is unavailable.


Conflict of interest statement

N.D.G. is an infectious diseases consultant for Tempus Labs. W.P.H. is a scientific advisory board member to Biobot Analytics and has received compensation for expert witness testimony on the expected course of the pandemic. N.G. is a co-founder of Biobot Analytics; C.D., K.A.M., and M.I. are employees of Biobot Analytics.

Figures

**Fig. 1**
VLQ, a computational approach to lineage abundance estimation from wastewater sequencing data. a Computational similarity between RNA transcript quantification and lineage abundance estimation. b Key aspects of the kallisto algorithm in the context of lineage abundance estimation. c Our workflow uses multiple reference sequences per lineage to capture within-lineage variation. Applying kallisto (as in panel b) yields abundance estimates per reference sequence. These abundances are filtered using a minimal abundance cutoff and then summed per lineage to obtain lineage-level abundance estimates. Finally, lineage abundances are reported.
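The per-reference estimation and per-lineage aggregation in panels b and c can be illustrated with a toy example. kallisto pseudoaligns reads to sets of compatible references (equivalence classes) and estimates abundances with an expectation-maximization (EM) algorithm; the sketch below is a simplified illustration of that idea, not the VLQ or kallisto code. It assumes the equivalence classes and their read counts are already known, and it ignores effective reference lengths (an assumption made here because SARS-CoV-2 reference genomes have nearly equal length). The function names and the default cutoff of 0.01 are hypothetical.

```python
from collections import defaultdict


def em_abundances(eq_classes, n_refs, n_iter=200):
    """Estimate per-reference abundances with a basic EM loop (toy sketch).

    eq_classes: list of (tuple_of_reference_indices, read_count).
    Assumption: effective lengths are ignored (treated as equal).
    """
    alpha = [1.0 / n_refs] * n_refs  # uniform starting abundances
    total = sum(count for _, count in eq_classes)
    for _ in range(n_iter):
        expected = [0.0] * n_refs
        for refs, count in eq_classes:
            denom = sum(alpha[r] for r in refs)
            if denom == 0:
                continue
            # E-step: split this class's reads among its compatible references
            for r in refs:
                expected[r] += count * alpha[r] / denom
        # M-step: new abundances are the expected fractions of all reads
        alpha = [e / total for e in expected]
    return alpha


def lineage_abundances(ref_abundances, ref_lineages, min_ab=0.01):
    """Drop references below the hypothetical cutoff min_ab, then sum per lineage."""
    per_lineage = defaultdict(float)
    for ab, lineage in zip(ref_abundances, ref_lineages):
        if ab >= min_ab:
            per_lineage[lineage] += ab
    return dict(per_lineage)
```

For example, with two B.1.1.7 references that have 30 and 10 unique reads and share 20 ambiguous reads, plus one B.1.526 reference with 40 unique reads, the loop converges to per-reference abundances of 0.45, 0.15 and 0.40, giving lineage abundances of 0.6 for B.1.1.7 and 0.4 for B.1.526. Summing over several references per lineage is what lets ambiguous reads between closely related references still count toward the correct lineage.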

**Fig. 2**
Benchmarking results on simulated wastewater sequencing data of a representative mixture of background sequences and a variant lineage (VOC; here: B.1.1.7, B.1.351, B.1.427, B.1.429, P.1). a Estimated lineage abundance (VOC frequency) versus true abundance on whole genome sequencing data with a depth of 1000×. b Estimated lineage abundance (VOC frequency) versus true abundance on Spike-only sequencing data with a depth of 10,000×. c, d Relative prediction error per lineage for the estimated frequencies presented in panels a and b, respectively. Relative prediction errors are defined as the absolute difference between true and estimated frequency, relative to the true frequency. VOC, variant of concern
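The relative prediction error defined in this caption is |true frequency − estimated frequency| / true frequency. A minimal sketch, assuming both frequencies are given on the same scale (fractions or percentages) and that the true frequency is nonzero; the function name is illustrative:

```python
def relative_prediction_error(true_freq: float, est_freq: float) -> float:
    """Absolute difference between true and estimated frequency,
    relative to the true frequency (undefined when true_freq is 0)."""
    if true_freq == 0:
        raise ValueError("relative error is undefined for a true frequency of 0")
    return abs(true_freq - est_freq) / true_freq
```

For instance, a true VOC frequency of 10% estimated as 12% gives a relative error of 0.2. Because the error is scaled by the true frequency, the same absolute miss counts far more at low abundances than at high ones.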

**Fig. 3**
a RNA levels in wastewater (copies/ml sludge, displayed on the left vertical axis) follow the same trend as COVID-19 case rates (cases per 100K people, displayed on the right vertical axis). b Percentage of the genome with >20× coverage versus sludge Ct values. c Impact of genome coverage on predicted B.1.1.7 abundance for random subsamples of a sludge sample with full genome coverage. The horizontal dotted line indicates the predicted B.1.1.7 abundance for the full sample (99% genome coverage).
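Panel b summarizes genome completeness as the percentage of genome positions sequenced at more than 20× depth. A minimal sketch, assuming per-position read depths are already available as a list of integers (for example, parsed from the output of a depth-reporting tool); the function name and the `min_depth` parameter are illustrative:

```python
def percent_genome_covered(depths, min_depth=20):
    """Percentage of genome positions with depth strictly greater than min_depth."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d > min_depth)
    return 100.0 * covered / len(depths)
```

The comparison is strict (`>`), matching the ">20×" wording, so a position sequenced at exactly 20× does not count as covered.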

**Fig. 4**
Wastewater versus clinical abundance estimates for B.1.1.7 and B.1.526 in New Haven from early January 2021 to late April 2021. Dates of clinical sampling correspond to the date of specimen collection. In addition, ddPCR-based abundance estimates of lineages with the H69/V70 deletion (likely B.1.1.7) are shown for wastewater samples taken every six days. Confidence intervals are computed from the ddPCR confidence intervals for the measured copies of the H69/V70 deletion and the measured copies of wild type (see the “Methods” section). Vertical dashes on the x-axis indicate timepoints at which wastewater sequencing data were obtained that passed the filtering criteria (Ct value < 31, or Ct < 34 with at least 0.5M reads aligned).
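Two quantities in this caption lend themselves to a short sketch. The sample filter follows the caption directly: a sample passes if its Ct value is below 31, or below 34 with at least 0.5 million aligned reads. For the ddPCR estimate, the sketch assumes (this is not taken from the Methods) that the H69/V70-deletion frequency is the deletion copy count divided by the sum of deletion and wild-type copy counts, and it propagates the ddPCR intervals by pairing extreme bounds. Because the ratio increases with deletion copies and decreases with wild-type copies, this pairing yields a conservative interval. All function and parameter names are hypothetical.

```python
def passes_filter(ct: float, aligned_reads: int) -> bool:
    """Filtering criterion from the caption: Ct < 31, or Ct < 34 with >= 0.5M aligned reads."""
    return ct < 31 or (ct < 34 and aligned_reads >= 500_000)


def deletion_frequency(del_copies: float, wt_copies: float) -> float:
    """Assumed point estimate: fraction of copies carrying the H69/V70 deletion."""
    total = del_copies + wt_copies
    if total <= 0:
        raise ValueError("no copies measured")
    return del_copies / total


def deletion_frequency_interval(del_ci, wt_ci):
    """Assumed conservative bounds from (low, high) ddPCR intervals for each target.

    d / (d + w) grows with d and shrinks with w, so the lowest value pairs the
    low deletion bound with the high wild-type bound, and vice versa.
    """
    (del_lo, del_hi), (wt_lo, wt_hi) = del_ci, wt_ci
    lower = deletion_frequency(del_lo, wt_hi)
    upper = deletion_frequency(del_hi, wt_lo)
    return lower, upper
```

For example, 30 deletion copies and 70 wild-type copies give a frequency of 0.3; intervals of (20, 40) and (60, 80) copies give bounds of 0.2 and 0.4.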

**Fig. 5**
Wastewater versus GISAID abundance estimates for B.1.1.7, B.1.427, B.1.429, and B.1.526 at 16 locations across 8 states of the US. Samples were collected between late December 2020 and late January 2021; the sampling date and location are indicated on the horizontal axis. Samples are sorted by location, with different locations separated by a dotted line and different states separated by a solid line


Update of

Variant abundance estimation for SARS-CoV-2 in wastewater using RNA-Seq quantification.
Baaijens JA, Zulli A, Ott IM, Petrone ME, Alpert T, Fauver JR, Kalinich CC, Vogels CBF, Breban MI, Duvallet C, McElroy K, Ghaeli N, Imakaev M, Mckenzie-Bennett M, Robison K, Plocik A, Schilling R, Pierson M, Littlefield R, Spencer M, Simen BB; Yale SARS-CoV-2 Genomic Surveillance Initiative; Hanage WP, Grubaugh ND, Peccia J, Baym M. medRxiv [Preprint]. 2021 Sep 2:2021.08.31.21262938. doi: 10.1101/2021.08.31.21262938. Update in: Genome Biol. 2022 Nov 8;23(1):236. doi: 10.1186/s13059-022-02805-9. PMID: 34494031.

