Genome Biol. 2022 Nov 8;23(1):236.
doi: 10.1186/s13059-022-02805-9.

Lineage abundance estimation for SARS-CoV-2 in wastewater using transcriptome quantification techniques

Jasmijn A Baaijens et al. Genome Biol.

Abstract

Effectively monitoring the spread of SARS-CoV-2 mutants is essential to efforts to counter the ongoing pandemic. Predicting lineage abundance from wastewater, however, is technically challenging. We show that by sequencing SARS-CoV-2 RNA in wastewater and applying algorithms initially used for transcriptome quantification, we can estimate lineage abundance in wastewater samples. We find high variability in signal among individual samples, but the overall trends match those observed from sequencing clinical samples. Thus, while clinical sequencing remains a more sensitive technique for population surveillance, wastewater sequencing can be used to monitor trends in mutant prevalence in situations where clinical sequencing is unavailable.

Conflict of interest statement

N.D.G. is an infectious diseases consultant for Tempus Labs. W.P.H. is a scientific advisory board member to Biobot Analytics and has received compensation for expert witness testimony on the expected course of the pandemic. N.G. is a co-founder of Biobot Analytics; C.D., K.A.M., and M.I. are employees of Biobot Analytics.

Figures

Fig. 1
VLQ, a computational approach to lineage abundance estimation from wastewater sequencing data. a Computational similarity between RNA transcript quantification and lineage abundance estimation. b Key aspects of the kallisto algorithm in the context of lineage abundance estimation. c Our workflow uses multiple reference sequences per lineage to capture within-lineage variation. Applying kallisto (as in part b) results in abundance estimates per reference sequence. These abundances are filtered using a minimal abundance cutoff and subsequently summed per lineage to obtain abundance estimates per lineage. Finally, lineage abundances are reported
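The filter-and-sum step described for panel c can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the VLQ implementation: the function name `summarize_lineages`, the input format (pairs of lineage label and per-reference abundance), and the default cutoff of 0.1 are all hypothetical choices made for the example.

```python
from collections import defaultdict


def summarize_lineages(ref_abundances, min_abundance=0.1):
    """Sum per-reference abundance estimates into per-lineage estimates.

    ref_abundances: iterable of (lineage, abundance) pairs, one pair per
        reference sequence (hypothetical input format).
    min_abundance: per-reference cutoff (hypothetical default); references
        whose abundance falls below it are discarded before summing.
    """
    totals = defaultdict(float)
    for lineage, abundance in ref_abundances:
        # Filter first, so low-abundance references do not contribute.
        if abundance >= min_abundance:
            totals[lineage] += abundance
    return dict(totals)
```

With this sketch, two references for the same lineage that both pass the cutoff contribute a single summed value, while a lineage whose only reference falls below the cutoff is absent from the result.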
Fig. 2
Benchmarking results on simulated wastewater sequencing data of a representative mixture of background sequences and a variant lineage (VOC; here: B.1.1.7, B.1.351, B.1.427, B.1.429, P.1). a Estimated lineage abundance (VOC frequency) versus true abundance on whole genome sequencing data with a depth of 1000×. b Estimated lineage abundance (VOC frequency) versus true abundance on Spike-only sequencing data with a depth of 10,000×. c, d Relative prediction error per lineage for the estimated frequencies presented in panels a and b, respectively. Relative prediction errors are defined as the absolute difference between true and estimated frequency, relative to the true frequency. VOC, variant of concern
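The relative prediction error defined in this caption can be written as a small Python function. The function and parameter names are assumptions for illustration; raising an error when the true frequency is zero is also an assumption, since the caption does not say how that case is handled.

```python
def relative_prediction_error(true_freq, estimated_freq):
    """Absolute difference between true and estimated frequency,
    relative to the true frequency."""
    if true_freq == 0:
        # Assumption: the ratio is undefined here, so fail loudly.
        raise ValueError("relative error is undefined for a true frequency of zero")
    return abs(true_freq - estimated_freq) / true_freq
```

Both frequencies must be given in the same unit (for example, both as percentages); the result is then a unitless ratio.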
Fig. 3
a RNA levels in wastewater (copies/ml sludge, displayed on the left vertical axis) follow the same trend as COVID-19 case rates (cases per 100K people, displayed on the right vertical axis). b Percent genome with >20× coverage versus sludge Ct values. c Impact of genome coverage on predicted B.1.1.7 abundance for random subsamples of a sludge sample with full genome coverage. The horizontal dotted line indicates the predicted B.1.1.7 abundance for the full sample (99% genome coverage)
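The quantity plotted in panel b, percent genome with >20× coverage, could be computed from per-position read depths roughly as follows. This is a sketch under assumptions: the function name and the input (a list with one depth value per genome position) are hypothetical, and how the depths are obtained from the alignments is not shown.

```python
def percent_genome_covered(depths, min_depth=20):
    """Percentage of genome positions with read depth strictly above min_depth.

    depths: sequence of per-position read depths (hypothetical input format).
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for depth in depths if depth > min_depth)
    return 100.0 * covered / len(depths)
```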
Fig. 4
Wastewater versus clinical abundance estimates for B.1.1.7 and B.1.526 in New Haven from early January 2021 to late April 2021. Dates of clinical sampling correspond to the date of specimen collection. In addition, ddPCR-based abundance estimates of lineages with the H69/V70 deletion (likely B.1.1.7) are shown for wastewater samples taken every six days. Confidence intervals are computed from ddPCR confidence intervals for measured copies of the H69/V70 deletion and measured copies of wild type present (see the “Methods” section). Vertical dashes on the x-axis indicate timepoints where wastewater sequencing data was obtained that passed the filtering criteria (Ct value < 31 or Ct < 34 with at least 0.5M reads aligned)
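The filtering criteria stated at the end of this caption amount to a simple predicate. The sketch below uses hypothetical names (`passes_filter`, `ct_value`, `reads_aligned`) and assumes that "at least 0.5M reads aligned" applies only to the second condition, which is how the caption reads.

```python
def passes_filter(ct_value, reads_aligned):
    """Return True if a wastewater sample meets the stated criteria:
    Ct < 31, or Ct < 34 with at least 0.5M aligned reads."""
    return ct_value < 31 or (ct_value < 34 and reads_aligned >= 500_000)
```

Under this reading, a sample with Ct 32 is kept only when enough reads aligned, and any sample with Ct of 34 or higher is excluded regardless of read count.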
Fig. 5
Wastewater versus GISAID abundance estimates for B.1.1.7, B.1.427, B.1.429, and B.1.526 at 16 locations across 8 states of the US. Samples were collected between late December 2020 and late January 2021; the sampling date and location are indicated on the horizontal axis. Samples are sorted by location, with different locations separated by a dotted line and different states separated by a solid line
