eLife. 2021 Mar 5;10:e64206.

doi: 10.7554/eLife.64206.

Estimating SARS-CoV-2 seroprevalence and epidemiological parameters with uncertainty from serological surveys

Daniel B Larremore¹ ², Bailey K Fosdick³, Kate M Bubar⁴ ⁵, Sam Zhang⁴, Stephen M Kissler⁶, C Jessica E Metcalf⁷, Caroline O Buckee⁸ ⁹, Yonatan H Grad⁶

Affiliations

¹ Department of Computer Science, University of Colorado Boulder, Boulder, United States.
² BioFrontiers Institute, University of Colorado Boulder, Boulder, United States.
³ Department of Statistics, Colorado State University, Fort Collins, United States.
⁴ Department of Applied Mathematics, University of Colorado Boulder, Boulder, United States.
⁵ IQ Biology Program, University of Colorado Boulder, Boulder, United States.
⁶ Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, United States.
⁷ Department of Ecology and Evolutionary Biology and the Woodrow Wilson School, Princeton University, Princeton, United States.
⁸ Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, United States.
⁹ Center for Communicable Disease Dynamics, Harvard T.H. Chan School of Public Health, Boston, United States.

PMID: 33666169
PMCID: PMC7979159
DOI: 10.7554/eLife.64206

Daniel B Larremore et al. Elife. 2021.

Abstract

Establishing how many people have been infected by SARS-CoV-2 remains an urgent priority for controlling the COVID-19 pandemic. Serological tests that identify past infection can be used to estimate cumulative incidence, but the relative accuracy and robustness of various sampling strategies have been unclear. We developed a flexible framework that integrates uncertainty from test characteristics, sample size, and heterogeneity in seroprevalence across subpopulations to compare estimates from sampling schemes. Using the same framework and making the assumption that seropositivity indicates immune protection, we propagated estimates and uncertainty through dynamical models to assess uncertainty in the epidemiological parameters needed to evaluate public health interventions and found that sampling schemes informed by demographics and contact networks outperform uniform sampling. The framework can be adapted to optimize serosurvey design given test characteristics and capacity, population demography, sampling strategy, and modeling approach, and can be tailored to support decision-making around introducing or removing interventions.

Keywords: COVID-19; SARS-CoV-2; antibody; epidemiology; global health; infectious disease; microbiology; modeling; serology; uncertainty.

Conflict of interest statement

DL, BF, KB, SZ, SK, CM, CB, YG: No competing interests declared.

Figures

**Figure 1. Framework for estimating seroprevalence and epidemiological parameters and the associated uncertainty, and for designing seroprevalence studies.**

**Figure 2. Uncertainty of population seroprevalence estimates as a function of number of samples and true population rate.**
Uncertainty, represented by the width of 95% credible intervals, is presented as ± seroprevalence percentage points in (A) a contour plot and (B) for selected seroprevalence values, based on a serological test with 90% sensitivity and >99.9% specificity (Figure 2—figure supplement 1 depicts results for other sensitivity and specificity values). In total, 5000 samples are sufficient to estimate any seroprevalence to within a worst-case tolerance of ±1.4 percentage points (e.g., 20% ± 1.4% = [18.6%, 21.4%]), even with the imperfect test studied. Each point or pixel is averaged over 250 stochastic draws from the specified seroprevalence with the indicated sensitivity and specificity.
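The dependence of interval width on sample size and test accuracy can be illustrated with a small grid-based posterior. This is a minimal sketch rather than the authors' implementation: it assumes a uniform prior on seroprevalence, treats sensitivity and specificity as known constants, and uses 0.999 as a stand-in for the ">99.9%" specificity. The function names `seroprevalence_posterior` and `credible_interval` are hypothetical.

```python
import math


def seroprevalence_posterior(n_pos, n_tests, se, sp, grid_size=2001):
    """Grid posterior over true seroprevalence theta under a uniform prior.

    A single test comes back positive with probability
    q = se * theta + (1 - sp) * (1 - theta).
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    log_weights = []
    for theta in thetas:
        q = se * theta + (1.0 - sp) * (1.0 - theta)
        q = min(max(q, 1e-12), 1.0 - 1e-12)  # keep the logs finite
        log_weights.append(
            n_pos * math.log(q) + (n_tests - n_pos) * math.log(1.0 - q)
        )
    peak = max(log_weights)
    weights = [math.exp(v - peak) for v in log_weights]
    total = sum(weights)
    return thetas, [w / total for w in weights]


def credible_interval(thetas, probs, mass=0.95):
    """Equal-tailed credible interval read off the cumulative grid posterior."""
    lower_q = (1.0 - mass) / 2.0
    upper_q = 1.0 - lower_q
    lower, upper = None, thetas[-1]
    cumulative = 0.0
    for theta, p in zip(thetas, probs):
        cumulative += p
        if lower is None and cumulative >= lower_q:
            lower = theta
        if cumulative >= upper_q:
            upper = theta
            break
    return lower, upper


# Illustrative data: 904 positives out of 5000 tests (assumed, not from the paper).
thetas, probs = seroprevalence_posterior(904, 5000, se=0.90, sp=0.999)
low, high = credible_interval(thetas, probs)
print(f"95% credible interval: [{low:.3f}, {high:.3f}]")
```

Rerunning the last lines with a smaller `n_tests` (and proportionally fewer positives) shows the interval widening as the sample shrinks.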

**Figure 2—figure supplement 1. Uncertainty of population seroprevalence estimates as a function of number of samples and true population rate.**
Uncertainty, represented by the width of 95% credible intervals, is presented as ± seroprevalence percentage points in contour plots and for selected seroprevalence values, ranging from 1% to 50%, based on a serological test with (**A, B**) 96.6% sensitivity and >99.9% specificity, matching the claims of the Roche IgG test (U.S. Food and Drug Administration, 2021), (**C, D**) 95.0% sensitivity and >99.9% specificity, matching the claims of the Abbott Architect IgG test (U.S. Food and Drug Administration, 2021), and (**E, F**) 100% sensitivity and specificity, representing an ideal test, complementing the results for a test with 90% sensitivity and >99.9% specificity, matching the claims of the Euroimmun IgG test (U.S. Food and Drug Administration, 2021) shown in Figure 2. See Supplementary file 1 for details on serological test kits.

**Figure 3. Uncertainty of overall seroprevalence estimates from convenience and formal sampling strategies.**
Uncertainty, represented by the width of 95% credible intervals, is presented as ± seroprevalence percentage points, based on a serological test with 90% sensitivity and >99.9% specificity (Figure 3—figure supplement 1 depicts results for other sensitivity and specificity values). (A) Curves show the decrease in average CI widths for 15% seroprevalence, illustrating the advantages of using uniform and model and demographics informed (MDI) samples over convenience samples. (B) Contour plots show average CI widths for various total sample counts and overall seroprevalence ranging from 5% to 50%. Convenience samples derived from newborn blood spots (reflecting the age demographics of mothers) or U.S. blood donors improve with additional sampling but retain baseline uncertainty due to demographics not covered by the convenience sample. For the estimation of overall seroprevalence, uniform sampling is marginally superior to this example of the MDI sampling strategy, which was designed to optimize estimation of the effective reproductive number $R_{eff}$. Each point or pixel is averaged over 250 stochastic draws from the specified seroprevalence with the indicated sensitivity and specificity.

**Figure 3—figure supplement 1. Uncertainty of overall seroprevalence estimates from convenience and formal sampling strategies.**
Uncertainty, represented by the width of 95% credible intervals, is presented as ± seroprevalence percentage points, based on a serological test with (**A, B**) 96.6% sensitivity and >99.9% specificity, matching the claims of the Roche IgG test (U.S. Food and Drug Administration, 2021), (**C, D**) 95.0% sensitivity and >99.9% specificity, matching the claims of the Abbott Architect IgG test (U.S. Food and Drug Administration, 2021), and (**E, F**) 100% sensitivity and specificity, representing an ideal test, complementing the results for a test with 90% sensitivity and >99.9% specificity, matching the claims of the Euroimmun IgG test (U.S. Food and Drug Administration, 2021) shown in Figure 3. (**A, C, E**) Curves show the decrease in average CI widths for 15% seroprevalence, illustrating the advantages of using uniform and model and demographics informed (MDI) samples over convenience samples. (**B, D, F**) Contour plots show average CI widths for various total sample counts and overall seroprevalence. Convenience samples derived from newborn blood spots or U.S. blood donors improve with additional sampling but retain baseline uncertainty due to demographics not covered by the convenience sample. For the estimation of overall seroprevalence, uniform sampling is marginally superior to this example of the MDI sampling strategy, which was designed to optimize estimation of $R_{eff}$. Each point or pixel is averaged over 250 stochastic draws from the specified seroprevalence, ranging from 5% to 50%, with the indicated sensitivity and specificity.

**Figure 4. Uncertainty in serological data produces uncertainty in simulated epidemic peak height and timing.**
Serological test outcomes for $n = 100$ tests (A; red) and $n = 1000$ tests (B; blue) produce (**C, D**) posterior seroprevalence estimates with quantified uncertainty with posterior means of 15.2% and 14.6%, respectively; estimates uncorrected for assay performance bias: 13.0% and 13.0%. (**E, F**) Samples from the seroprevalence posterior produce a distribution of simulated epidemic curves for scenarios of 25% and 50% social distancing (see Materials and methods), leading to uncertainty in (G) epidemic peak and (H) timing, which is mitigated in the $n = 1000$ sample scenario. Boxplot whiskers span 1.5× IQR, boxes span central quartile, lines indicate medians, and outliers were suppressed. se, sensitivity; sp, specificity.
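The propagation step described in this caption can be sketched by drawing seroprevalence values from an approximate posterior and feeding each draw into a simple epidemic model. Everything below is an illustrative assumption rather than the paper's model: a single-population SEIR integrated with Euler steps, a basic reproductive number of 2.5, hypothetical latent and infectious periods, an assumed test with 90% sensitivity and 99.9% specificity, and a Beta approximation for the positive-test probability that is then corrected for test error. The counts 13/100 and 130/1000 are chosen only to match the 13.0% uncorrected rate quoted above.

```python
import random
import statistics


def seir_peak(immune_frac, r0=2.5, distancing=0.25, latent_days=3.0,
              infectious_days=5.0, days=600, dt=0.25, seed_frac=1e-4):
    """Return (peak infectious fraction, day of peak) for a toy SEIR model."""
    beta = r0 * (1.0 - distancing) / infectious_days
    sigma, gamma = 1.0 / latent_days, 1.0 / infectious_days
    s = max(0.0, 1.0 - immune_frac - seed_frac)
    e, i = 0.0, seed_frac
    peak, peak_day = i, 0.0
    for step in range(1, int(days / dt) + 1):
        new_exposed = beta * s * i * dt
        new_infectious = sigma * e * dt
        new_removed = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        if i > peak:
            peak, peak_day = i, step * dt
    return peak, peak_day


def draw_seroprevalence(n_pos, n_tests, se, sp, rng):
    """Approximate posterior draw: Beta draw for the positive-test probability,
    then correction for sensitivity and specificity, clipped to [0, 1]."""
    q = rng.betavariate(1 + n_pos, 1 + n_tests - n_pos)
    theta = (q - (1.0 - sp)) / (se + sp - 1.0)
    return min(max(theta, 0.0), 1.0)


def peak_height_sd(n_pos, n_tests, se=0.90, sp=0.999, draws=200, seed=0):
    """Spread of simulated peak heights induced by seroprevalence uncertainty."""
    rng = random.Random(seed)
    peaks = [
        seir_peak(draw_seroprevalence(n_pos, n_tests, se, sp, rng))[0]
        for _ in range(draws)
    ]
    return statistics.pstdev(peaks)


print("peak-height spread, n = 100: ", peak_height_sd(13, 100))
print("peak-height spread, n = 1000:", peak_height_sd(130, 1000))
```

In this sketch the larger survey narrows the posterior, which in turn narrows the distribution of simulated peak heights, mirroring the qualitative pattern in panels G and H.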

**Figure 4—figure supplement 1. Uncertainty in serological data produces uncertainty in estimates of epidemic peak height and timing, even when the test has perfect sensitivity and specificity.**
Serological test outcomes for (A) $n = 100$ tests and (B) $n = 1000$ tests are shown as bar graphs for four tests with sensitivity and specificity values as indicated. Serological test samples were not generated stochastically but instead according to expectation to highlight how sensitivity and specificity affect inference. Posterior seroprevalence estimates for (C) $n = 100$ and (D) $n = 1000$ scenarios reveal that Bayesian estimation places posteriors over the correct value (15%) but with uncertainty that depends on $n$ (compare C to D) and test characteristics (compare peak heights of yellow and purple to blue and orange). Samples from the seroprevalence posterior produce a distribution of epidemic curves for scenarios of 25% and 50% social distancing (see Materials and methods), leading to uncertainty in (E) height of epidemic peak and (F) timing of epidemic peak. Uncertainty is mitigated but not eliminated in the $n = 1000$ scenario, just as uncertainty is mitigated but not eliminated using a perfect serological test. Boxplots reflect 100 samples from SEIR dynamics; whiskers span 1.5× IQR, boxes span central quartile, lines indicate medians, and outliers are not shown.

**Figure 5. Convenience and formal samples provide serological and epidemiological parameter estimates.**
(**A–D**) For four sampling strategies, $n = 1000$ tests were allocated to age groups with negative tests (gray outlines) and positive tests (colors) as shown, drawn stochastically based on seroprevalence estimates reflecting SARS-CoV-2 serosurvey outcomes from Geneva, Switzerland, as of May 2020 (Stringhini et al., 2020) for a test with 90% sensitivity and >99.9% specificity. The model and demographics informed (MDI) strategy shown was designed to optimize estimation of $R_{eff}$. (**E–H**) Age-group seroprevalence estimates $\theta_i$ are shown as boxplots (boxes 90% CIs, whiskers 95% CIs); dots indicate the true values from which data were sampled (Stringhini et al., 2020). Note the decreased uncertainty for boxes with higher sampling rates. (I) Age-group seroprevalences were weighted by Swiss population demographics to produce overall seroprevalence estimates, shown as probability densities with 95% credible intervals shaded and highlighted with dashed lines. (J) Age-group seroprevalences were used to estimate the effective reproductive number ($R_{eff}$) from an age-stratified transmission model under *status quo ante* contact patterns, shown as probability densities with 95% credible intervals shaded and highlighted with dashed lines, based on a basic reproductive number in the absence of population immunity ($R_0$) of 2.5. Dashed lines indicate true values from which the data were sampled. Each distribution depicts inference outcomes from a single set of stochastically sampled data; no averaging is done. Note that although uniform and MDI sample allocations produce equivalently confident estimates of overall seroprevalence, MDI produces a more confident estimate of $R_{eff}$ because it allocates more samples to age groups most relevant to model dynamics.
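The two summaries in panels I and J can be sketched as follows: overall seroprevalence is a population-weighted average of the age-group values, and one common way to obtain an effective reproductive number from an age-structured model is the dominant eigenvalue of a next-generation matrix whose rows are scaled by each group's susceptible fraction. This is a hedged sketch, not the paper's code. The three-group matrix, the population shares, and the function names are hypothetical, and seropositivity is assumed to confer full protection.

```python
def overall_seroprevalence(theta, pop_share):
    """Population-weighted average of age-group seroprevalences."""
    return sum(t * w for t, w in zip(theta, pop_share))


def spectral_radius(matrix, iters=500):
    """Dominant eigenvalue of a non-negative matrix via power iteration."""
    n = len(matrix)
    v = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(w)  # v sums to 1, so sum(M v) converges to the eigenvalue
        if eigenvalue == 0.0:
            return 0.0
        v = [x / eigenvalue for x in w]
    return eigenvalue


def r_eff(theta, ngm, r0=2.5):
    """Effective reproductive number given age-group seroprevalences.

    ngm[i][j] is proportional to infections in group i caused by one
    infectious person in group j; it is rescaled so that a fully
    susceptible population has reproductive number r0.
    """
    n = len(ngm)
    scale = r0 / spectral_radius(ngm)
    reduced = [
        [(1.0 - theta[i]) * scale * ngm[i][j] for j in range(n)]
        for i in range(n)
    ]
    return spectral_radius(reduced)


# Hypothetical three-group example (young, middle, old).
toy_ngm = [[3.0, 1.0, 0.5],
           [1.0, 2.0, 1.0],
           [0.5, 1.0, 1.5]]
theta = [0.10, 0.20, 0.30]
pop_share = [0.5, 0.3, 0.2]

print("overall seroprevalence:", overall_seroprevalence(theta, pop_share))
print("R_eff:", r_eff(theta, toy_ngm))
```

Because the rows of the matrix are weighted unequally, immunity concentrated in a high-contact group lowers the eigenvalue more than the same immunity in a low-contact group, which is why precise estimates in those groups matter more for $R_{eff}$ than for the overall average.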

**Figure 5—figure supplement 1. Average credible interval width for overall seroprevalence estimates using four sampling strategies and four serological test kits.**
Credible intervals were calculated for data generated according to four sampling strategies (columns, colors) and four test kits (rows), with sensitivity and specificity values as indicated; see legends. Each point represents the average width of the intervals, over 250 independent trials, for the indicated overall seroprevalence value (see annotations on plots) at the specified number of serological samples $n$. Some seroprevalence values are plotted in black simply to guide the eye. The model and demographics informed strategy shown was designed to optimize estimation of $R_{eff}$. Sampling strategies that resulted in posterior credible intervals with inaccurate coverage are crossed out to indicate that estimates in these regimes should be interpreted with caution.

**Figure 5—figure supplement 2. Average credible interval width for $R_{eff}$ estimates using four sampling strategies and four serological test kits.**
Credible intervals were calculated for data generated according to four sampling strategies (columns, colors) and four test kits (rows), with sensitivity and specificity values as indicated; see legends. Each point represents the average width of the intervals, over 250 independent trials, for the indicated overall seroprevalence value (see annotations on plots) at the specified number of serological samples $n$. Some seroprevalence values are plotted in black simply to guide the eye. The model and demographics informed strategy shown was designed to optimize estimation of $R_{eff}$.

**Figure 5—figure supplement 3. Credible interval coverage for overall seroprevalence estimates using four sampling strategies and four serological test kits.**
Credible interval coverage, defined as the fraction of posterior credible intervals that covered the true parameter used to generate the data, is shown for four sampling strategies (columns, colors) and four test kits (rows), with sensitivity and specificity values as indicated; see legends. Each point represents the fraction of credible intervals, out of a total of 250 independent trials, that covered the planted value for the indicated overall seroprevalence value (see annotations on plots) at the specified number of serological samples $n$. The estimated coverage from a perfectly calibrated posterior will fall within 0.9 ± 0.037 (gray bands) 95% of the time. Some seroprevalence values are plotted in black simply to guide the eye. The model and demographics informed strategy shown was designed to optimize estimation of $R_{eff}$.
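The width of the gray band follows from binomial sampling variability in the observed coverage fraction. The sketch below shows one way to compute such a band. It assumes 90% credible intervals (the nominal level implied by the band's center), 250 independent trials as stated in the caption, and a normal approximation to the binomial; the function name `coverage_band` is hypothetical.

```python
import math


def coverage_band(nominal=0.9, trials=250, z=1.96):
    """Range in which the observed coverage fraction of a perfectly
    calibrated posterior should fall about 95% of the time
    (normal approximation to the binomial)."""
    half_width = z * math.sqrt(nominal * (1.0 - nominal) / trials)
    return nominal - half_width, nominal + half_width


low, high = coverage_band()
print(f"expected coverage band: [{low:.3f}, {high:.3f}]")
```

Observed coverage fractions falling outside this band suggest that the posterior intervals for that strategy and sample size are miscalibrated.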

**Figure 5—figure supplement 4. Credible interval coverage for $R_{eff}$ estimates using four sampling strategies and four serological test kits.**
Credible interval coverage, defined as the fraction of posterior credible intervals that covered the true parameter used to generate the data, is shown for four sampling strategies (columns, colors) and four test kits (rows), with sensitivity and specificity values as indicated; see legends. Each point represents the fraction of credible intervals, out of a total of 250 independent trials, that covered the planted value for the indicated overall seroprevalence value (see annotations on plots) at the specified number of serological samples $n$. The estimated coverage from a perfectly calibrated posterior will fall within 0.9 ± 0.037 (gray bands) 95% of the time. Some seroprevalence values are plotted in black simply to guide the eye. The model and demographics informed strategy shown was designed to optimize estimation of $R_{eff}$.

**Figure 6. Model and demographics informed (MDI) sample allocations vary by demographics and modeling needs.**
Bar charts depict recommended sample allocation for three objectives, reducing posterior uncertainty for (**A, E**) estimates of overall seroprevalence, (**B, F**) predictions from an age-structured model with *status quo ante* contact patterns, (**C, G**) predictions from an age-structured model with modified contacts representing, relative to pre-crisis levels, a 20% increase in home contact rates, closed schools, a 25% decrease in work contacts, and a 50% decrease of other contacts (Mossong et al., 2008; Prem et al., 2017), and (**D, H**) averaging the other three MDI recommendations to balance competing objectives. Data for both the U.S. (blue; **A–D**) and India (orange; **E–H**) illustrate the impact of demography and contact structure on strategic sample allocation. These sample allocation strategies assume no prior knowledge of subpopulation seroprevalences $\{\theta_i\}$.
