A two-stage strategy to accommodate general patterns of confounding in the design of observational studies

Sebastien Haneuse¹, Jonathan Schildcrout, Daniel Gillen

PMID: 22130627
PMCID: PMC3297823
DOI: 10.1093/biostatistics/kxr044

Biostatistics. 2012 Apr;13(2):274-88. doi: 10.1093/biostatistics/kxr044. Epub 2011 Nov 30.

Affiliation

¹ Department of Biostatistics, Harvard School of Public Health, Boston, MA 02116, USA. shaneuse@hsph.harvard.edu

Abstract

Accommodating general patterns of confounding in sample size/power calculations for observational studies is extremely challenging, both technically and scientifically. While employing previously implemented sample size/power tools is appealing, they typically ignore important aspects of the design/data structure. In this paper, we show that sample size/power calculations that ignore confounding can be much more unreliable than is conventionally thought; using real data from the US state of North Carolina, naive calculations yield sample size estimates that are half those obtained when confounding is appropriately acknowledged. Unfortunately, eliciting realistic design parameters for confounding mechanisms is difficult. To overcome this, we propose a novel two-stage strategy for observational study design that can accommodate arbitrary patterns of confounding. At the first stage, researchers establish bounds for power that facilitate the decision of whether or not to initiate the study. At the second stage, internal pilot data are used to estimate key scientific inputs that can be used to obtain realistic sample size/power. Our results indicate that the strategy is effective at replicating gold standard calculations based on knowing the true confounding mechanism. Finally, we show that consideration of the nature of confounding is a crucial aspect of the elicitation process; depending on whether the confounder is positively or negatively associated with the exposure of interest and outcome, naive power calculations can either under- or overestimate the required sample size. Throughout, simulation is advocated as the only general means to obtain realistic estimates of statistical power; we describe, and provide in an R package, a simple algorithm for estimating power for a case-control study.
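As a rough illustration of simulation-based power estimation for a case-control study, the sketch below repeatedly draws cases and controls under an assumed confounding mechanism, fits a logistic regression, and records how often a two-sided Wald test for the exposure rejects at the 5% level. It is not the authors' R package or their algorithm. The single binary exposure and single binary confounder, the baseline log-odds of -3, the confounder odds ratio of 2, the prevalences, and all function names are assumptions made for illustration; only the exposure odds ratio of 1.3 is taken from the figure captions.

```python
import math
import random


def cell_probs(beta0, log_or_x, log_or_z, p_z, p_x_given_z):
    """Return the (x, z) cells with P(cell | case) and P(cell | control)."""
    cells = [(x, z) for x in (0, 1) for z in (0, 1)]
    case_w, ctrl_w = [], []
    for x, z in cells:
        p_cell = (p_z if z else 1 - p_z) * (p_x_given_z[z] if x else 1 - p_x_given_z[z])
        risk = 1 / (1 + math.exp(-(beta0 + log_or_x * x + log_or_z * z)))
        case_w.append(p_cell * risk)
        ctrl_w.append(p_cell * (1 - risk))
    return cells, [w / sum(case_w) for w in case_w], [w / sum(ctrl_w) for w in ctrl_w]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) < 1e-12:
            raise ZeroDivisionError("singular information matrix")
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def wald_z(rows, cases, controls, n_iter=15):
    """Grouped logistic regression by Newton-Raphson; Wald z for column 1 (exposure)."""
    p = len(rows[0])
    beta = [0.0] * p

    def score_and_info(beta):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for row, y1, y0 in zip(rows, cases, controls):
            mu = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            for j in range(p):
                score[j] += row[j] * (y1 - (y1 + y0) * mu)
                for k in range(p):
                    info[j][k] += row[j] * row[k] * (y1 + y0) * mu * (1 - mu)
        return score, info

    for _ in range(n_iter):
        score, info = score_and_info(beta)
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    _, info = score_and_info(beta)  # information at the final estimate
    unit = [1.0 if j == 1 else 0.0 for j in range(p)]
    return beta[1] / math.sqrt(solve(info, unit)[1])


def estimate_power(n_cases, n_controls, adjusted=True, reps=1000, seed=1,
                   beta0=-3.0, log_or_x=math.log(1.3), log_or_z=math.log(2.0),
                   p_z=0.4, p_x_given_z=(0.2, 0.4)):
    """Monte Carlo power of the Wald test for the exposure log odds ratio."""
    rng = random.Random(seed)
    cells, p_case, p_ctrl = cell_probs(beta0, log_or_x, log_or_z, p_z, p_x_given_z)
    # Design rows: intercept, exposure, and (if adjusted) the confounder.
    rows = [(1.0, x, z) if adjusted else (1.0, x) for x, z in cells]
    idx = range(len(cells))
    rejections = 0
    for _ in range(reps):
        cases, controls = [0] * len(cells), [0] * len(cells)
        for i in rng.choices(idx, weights=p_case, k=n_cases):
            cases[i] += 1
        for i in rng.choices(idx, weights=p_ctrl, k=n_controls):
            controls[i] += 1
        try:
            rejections += abs(wald_z(rows, cases, controls)) > 1.959964
        except (ZeroDivisionError, OverflowError, ValueError):
            pass  # a failed fit is counted as a non-rejection
    return rejections / reps
```

Calling `estimate_power(500, 500)` gives the estimated power of the confounder-adjusted analysis for a balanced design with n = 1000, and passing `adjusted=False` fits the unadjusted model to the same simulated data. Sampling cell counts directly from the exposure-confounder distribution within cases and within controls avoids simulating a full source population, which keeps each replicate cheap.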

Figures

**Fig. 1.**
Estimated power curves for detecting θ_x = 1.3 under a balanced case–control study, as a function of the case–control sample size n = n₀ + n₁. Each curve corresponds to a model that forms the basis for the power calculation (Sections 2.2 and 3.1). Estimates were obtained using the algorithm in the supplementary material (available at *Biostatistics* online) with R = 10000.

**Fig. 2.**
Estimated bounds for power to detect θ_x = 1.3, based on a case–control design, as a function of case–control sample size n for various scenarios for confounding. Estimates were obtained using the algorithm in the supplementary material (available at *Biostatistics* online) with R = 10000.

**Fig. 3.**
Results from four independent realizations of stage II, with pilot data sample sizes of m = 250, m = 500, and m = 1000. Each subfigure includes power curves based on complete data (CD) for the unadjusted and fully adjusted models. Estimates were obtained using the algorithm in the supplementary material (available at *Biostatistics* online) with R = 10000.

Cited by

Practical strategies for operationalizing optimal allocation in stratified cluster-based outcome-dependent sampling designs.
Sauer S, Hedt-Gauthier B, Haneuse S. Stat Med. 2023 Mar 30;42(7):917-935. doi: 10.1002/sim.9650. Epub 2023 Jan 17. PMID: 36650619
A two-stage hidden Markov model design for biomarker detection, with application to microbiome research.
Zhou YH, Brooks P, Wang X. Stat Biosci. 2018 Apr;10(1):41-58. doi: 10.1007/s12561-017-9187-y. Epub 2017 Feb 10. PMID: 30174757
Power and sample size for multivariate logistic modeling of unmatched case-control studies.
Gail MH, Haneuse S. Stat Methods Med Res. 2019 Mar;28(3):822-834. doi: 10.1177/0962280217737157. Epub 2017 Nov 16. PMID: 29145780
Sample size and power determination when limited preliminary information is available.
McLaren CE, Chen WP, O'Sullivan TD, Gillen DL, Su MY, Chen JH, Tromberg BJ. BMC Med Res Methodol. 2017 Apr 26;17(1):75. doi: 10.1186/s12874-017-0329-1. PMID: 28446127
Optimal allocation in stratified cluster-based outcome-dependent sampling designs.
Sauer S, Hedt-Gauthier B, Haneuse S. Stat Med. 2021 Aug 15;40(18):4090-4107. doi: 10.1002/sim.9016. Epub 2021 Jun 2. PMID: 34076912
