[Preprint]. 2021 Sep 2:2021.08.31.21262938.
doi: 10.1101/2021.08.31.21262938.

Variant abundance estimation for SARS-CoV-2 in wastewater using RNA-Seq quantification

Jasmijn A Baaijens et al. medRxiv.

Update in

  • Lineage abundance estimation for SARS-CoV-2 in wastewater using transcriptome quantification techniques.
    Baaijens JA, Zulli A, Ott IM, Nika I, van der Lugt MJ, Petrone ME, Alpert T, Fauver JR, Kalinich CC, Vogels CBF, Breban MI, Duvallet C, McElroy KA, Ghaeli N, Imakaev M, Mckenzie-Bennett MF, Robison K, Plocik A, Schilling R, Pierson M, Littlefield R, Spencer ML, Simen BB; Yale SARS-CoV-2 Genomic Surveillance Initiative; Hanage WP, Grubaugh ND, Peccia J, Baym M. Genome Biol. 2022 Nov 8;23(1):236. doi: 10.1186/s13059-022-02805-9. PMID: 36348471. Free PMC article.

Abstract

Effectively monitoring the spread of SARS-CoV-2 variants is essential to efforts to counter the ongoing pandemic. Wastewater monitoring of SARS-CoV-2 RNA has proven an effective and efficient technique to approximate COVID-19 case rates in the population. Predicting variant abundances from wastewater, however, is technically challenging. Here we show that by sequencing SARS-CoV-2 RNA in wastewater and applying computational techniques initially used for RNA-Seq quantification, we can estimate the abundance of variants in wastewater samples. By sequencing samples from wastewater and clinical isolates in Connecticut, U.S.A., between January and April 2021, we show that the temporal dynamics of variant strains in the two sample types broadly correspond. We further show that this technique can be used with other wastewater sequencing techniques by expanding to samples taken across the United States in a similar timeframe. We find high variability in signal among individual samples, and limited ability to detect the presence of variants with clinical frequencies <10%; nevertheless, the overall trends match what we observed from sequencing clinical samples. Thus, while clinical sequencing remains a more sensitive technique for population surveillance, wastewater sequencing can be used to monitor trends in variant prevalence in situations where clinical sequencing is unavailable or impractical.

Conflict of interest statement

N.D.G. is an infectious diseases consultant for Tempus Labs. W.P.H. is a scientific advisory board member to Biobot Analytics and has received compensation for expert witness testimony on the expected course of the pandemic. N.G. is co-founder of Biobot Analytics; C.D., K.A.M., and M.I. are employees of Biobot Analytics.

Figures

Figure 1.
Computational approach to variant of concern (variant) abundance estimation. a) Computational similarity between RNA transcript quantification and variant abundance estimation. b) Key aspects of the kallisto algorithm in the context of variant abundance estimation. c) Our workflow uses multiple reference sequences per lineage to capture within-lineage variation. Applying kallisto (as in part b) results in abundance estimates per reference sequence. These abundances are filtered using a minimal abundance cutoff and subsequently summed per lineage to obtain abundance estimates per lineage. Finally, variant abundances are reported.
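The filter-and-sum step described in part c of this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the reference identifiers, the lineage labels, the percentage units, and the cutoff value of 0.1 are all assumptions made for the example, and the input numbers are invented.

```python
from collections import defaultdict


def lineage_abundances(ref_abundances, ref_to_lineage, min_abundance=0.1):
    """Aggregate per-reference abundance estimates into per-lineage estimates.

    ref_abundances: dict mapping reference sequence id -> estimated abundance
                    (assumed here to be a percentage of the sample).
    ref_to_lineage: dict mapping reference sequence id -> lineage label.
    min_abundance:  hypothetical cutoff; references below it are discarded.
    """
    totals = defaultdict(float)
    for ref_id, abundance in ref_abundances.items():
        # Filter: drop references whose estimate falls below the cutoff.
        if abundance < min_abundance:
            continue
        # Sum: add the surviving estimate to its lineage's running total.
        totals[ref_to_lineage[ref_id]] += abundance
    return dict(totals)


# Hypothetical per-reference estimates, such as a quantifier might report.
refs = {"ref_a1": 40.0, "ref_a2": 10.0, "ref_b1": 49.95, "ref_c1": 0.05}
mapping = {
    "ref_a1": "lineage_A",
    "ref_a2": "lineage_A",
    "ref_b1": "lineage_B",
    "ref_c1": "lineage_C",
}
result = lineage_abundances(refs, mapping)
```

In this example the two references for `lineage_A` are summed, while `lineage_C` is dropped entirely because its only reference falls below the assumed cutoff. Whether the remaining abundances are renormalized afterwards is not stated in the caption, so the sketch leaves them as they are.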
Figure 2.
Estimated variant abundances and relative prediction errors. Relative prediction errors are defined as the absolute difference between true and estimated frequency, relative to the true frequency.
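The relative prediction error defined in this caption can be written as a short function. This is a sketch under assumptions: the function name and the choice to reject a non-positive true frequency are illustrative and not taken from the source, and the example values are invented.

```python
def relative_prediction_error(true_freq, estimated_freq):
    """Absolute difference between true and estimated frequency,
    relative to the true frequency."""
    if true_freq <= 0:
        # The ratio is undefined when the true frequency is zero;
        # raising here is a choice made for this sketch.
        raise ValueError("true frequency must be positive")
    return abs(true_freq - estimated_freq) / true_freq


# Hypothetical example: a true frequency of 10% estimated as 8%.
error = relative_prediction_error(10.0, 8.0)
```

Because the error is scaled by the true frequency, the same absolute deviation weighs more heavily for rare variants than for common ones.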
Figure 3.
a) RNA levels in wastewater (copies/ml sludge, displayed on left vertical axis) follow the same trend as COVID-19 case rates (cases per 100K people, displayed on right vertical axis). b) Percent genome with >20x coverage versus sludge Ct values. c) Impact of genome coverage on predicted B.1.1.7 abundance for random subsamples of a sludge sample with full genome coverage. The horizontal dotted line indicates the predicted B.1.1.7 abundance for the full sample (99% genome coverage).
Figure 4.
Wastewater versus clinical abundance estimates for B.1.1.7 and B.1.526 in New Haven from early January 2021 to late April 2021. Dates of clinical sampling correspond to the date of specimen collection.
Figure 5.
Wastewater versus GISAID abundance estimates for B.1.1.7, B.1.427, B.1.429 and B.1.526 at 16 locations across 8 states of the US. Samples were collected between late December 2020 and late January 2021; sampling date and location are indicated on the horizontal axis. Samples are sorted by location, with different locations separated by a dotted line and different states separated by a solid line.
