Review

Stat Med. 2020 Jul 20;39(16):2197-2231.

doi: 10.1002/sim.8532. Epub 2020 Apr 3.

STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: Part 1-Basic theory and simple methods of adjustment

Ruth H Keogh¹, Pamela A Shaw², Paul Gustafson³, Raymond J Carroll⁴,⁵, Veronika Deffner⁶, Kevin W Dodd⁷, Helmut Küchenhoff⁸, Janet A Tooze⁹, Michael P Wallace¹⁰, Victor Kipnis⁷, Laurence S Freedman¹¹,¹²

Affiliations

¹ Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK.
² Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA.
³ Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada.
⁴ Department of Statistics, Texas A&M University, College Station, Texas, USA.
⁵ School of Mathematical and Physical Sciences, University of Technology Sydney, Broadway, New South Wales, Australia.
⁶ Statistical Consulting Unit StaBLab, Department of Statistics, Ludwig-Maximilians-Universität, Munich, Germany.
⁷ Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland, USA.
⁸ Department of Statistics, Statistical Consulting Unit StaBLab, Ludwig-Maximilians-Universität, Munich, Germany.
⁹ Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA.
¹⁰ Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada.
¹¹ Biostatistics and Biomathematics Unit, Gertner Institute for Epidemiology and Health Policy Research, Tel Hashomer, Israel.
¹² Information Management Services Inc., Rockville, Maryland, USA.

PMID: 32246539
PMCID: PMC7450672
DOI: 10.1002/sim.8532

Ruth H Keogh et al. Stat Med. 2020.

Abstract

Measurement error and misclassification of variables frequently occur in epidemiology and involve variables important to public health. Their presence can strongly affect the results of statistical analyses involving such variables. However, investigators commonly fail to pay attention to biases resulting from such mismeasurement. We provide, in two parts, an overview of the types of error that occur, their impacts on analytic results, and statistical methods to mitigate the biases that they cause. In this first part, we review different types of measurement error and misclassification, with emphasis on the classical, linear, and Berkson models and on the concepts of nondifferential and differential error. We describe the impacts of these types of error in covariates and in outcome variables on various analyses, including estimation and testing in regression models and estimation of distributions. We outline types of ancillary studies required to provide information about such errors and discuss the implications of covariate measurement error for study design. Methods for ascertaining sample size requirements are outlined, both for ancillary studies designed to provide information about measurement error and for main studies where the exposure of interest is measured with error. We describe two of the simpler methods, regression calibration and simulation extrapolation (SIMEX), that adjust for bias in regression coefficients caused by measurement error in continuous covariates, and illustrate their use through examples drawn from the Observing Protein and Energy (OPEN) dietary validation study. Finally, we review software available for implementing these methods. The second part of the article deals with more advanced topics.

Keywords: Berkson error; SIMEX; classical error; differential error; measurement error; misclassification; nondifferential error; regression calibration; sample size; simulation extrapolation.

Published 2020. This article is a U.S. Government work and is in the public domain in the USA.
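The regression calibration idea summarized in the abstract can be sketched in a few lines. The following is a minimal pure-Python illustration on simulated data, not the authors' implementation: it assumes classical error X* = X + U in the main study, a calibration substudy in which an unbiased reference measurement R = X + V is available (V independent of X and U), and arbitrary illustrative sample sizes and variances. The names `ols` and `regression_calibration_demo` are hypothetical.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def regression_calibration_demo(n_main=5000, n_calib=1000, beta_x=1.0,
                                var_u=1.0, var_v=0.5, seed=1):
    """Simulate a main study plus a calibration substudy (all settings assumed)."""
    rng = random.Random(seed)
    sd_u, sd_v = var_u ** 0.5, var_v ** 0.5
    # Main study: only Y and the error-prone X* = X + U are used in the analysis.
    x = [rng.gauss(0, 1) for _ in range(n_main)]
    x_star = [xi + rng.gauss(0, sd_u) for xi in x]
    y = [beta_x * xi + rng.gauss(0, 1) for xi in x]
    # Hypothetical calibration substudy: X* plus a reference measure R = X + V,
    # with V independent of X and U (for example, a recovery biomarker).
    xc = [rng.gauss(0, 1) for _ in range(n_calib)]
    xc_star = [xi + rng.gauss(0, sd_u) for xi in xc]
    r = [xi + rng.gauss(0, sd_v) for xi in xc]
    # Step 1: estimate the calibration equation E[X | X*] by regressing R on X*.
    a, b = ols(xc_star, r)
    # Step 2: replace X* by its predicted value and refit the outcome model.
    x_hat = [a + b * xs for xs in x_star]
    _, naive = ols(x_star, y)
    _, calibrated = ols(x_hat, y)
    return naive, calibrated


naive, calibrated = regression_calibration_demo()
print(f"naive slope {naive:.3f}, regression-calibrated slope {calibrated:.3f}")
```

Regressing R on X* estimates E(X | X*) because V is independent of X*, so the refitted slope should recover β_X up to sampling error, while the naive slope should sit near var(X)/(var(X) + var(U)) = 0.5 in this setup. Standard errors from the second-stage fit ignore the uncertainty in the estimated calibration equation; in practice they would be obtained by, for example, bootstrapping both stages.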

Figures

**Figure 1:**
Simulated data on 20 individuals showing the effects of classical error and Berkson error in the continuous covariate X on the fitted regression line. For both plots Y was generated from a normal distribution with mean β₀ + β_X X (using β₀ = 0, β_X = 1) and variance 1. *Classical error plot:* X was generated from a normal distribution with mean 0 and variance 1. X* was generated using X* = X + U. The difference in the slopes in this graph is due to attenuation from the measurement error in X. *Berkson error plot:* X* was generated from a normal distribution with mean 0 and variance 1, and X was generated from the normal distribution implied by the Berkson error model X = X* + U. For both error types var(U) = 3. The small difference in the slopes in this graph is due entirely to sampling error.
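To see the two patterns in Figure 1 without the sampling noise of 20 points, the same data-generating model can be rerun with a large sample. This is a minimal pure-Python sketch using the caption's settings (β₀ = 0, β_X = 1, standard normal X or X*, var(U) = 3); the sample size, seed, and the helper names `ols_slope` and `figure1_slopes` are illustrative choices, not taken from the paper.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def figure1_slopes(n=100_000, beta0=0.0, beta_x=1.0, var_u=3.0, seed=2020):
    rng = random.Random(seed)
    sd_u = var_u ** 0.5
    # Classical error: X ~ N(0, 1) and X* = X + U, with U independent of X.
    x = [rng.gauss(0, 1) for _ in range(n)]
    x_star = [xi + rng.gauss(0, sd_u) for xi in x]
    y = [beta0 + beta_x * xi + rng.gauss(0, 1) for xi in x]
    classical = ols_slope(x_star, y)
    # Berkson error: X* ~ N(0, 1) and X = X* + U, with U independent of X*.
    xb_star = [rng.gauss(0, 1) for _ in range(n)]
    xb = [xs + rng.gauss(0, sd_u) for xs in xb_star]
    yb = [beta0 + beta_x * xi + rng.gauss(0, 1) for xi in xb]
    berkson = ols_slope(xb_star, yb)
    return classical, berkson


classical, berkson = figure1_slopes()
# Theory: classical slope is about var(X) / (var(X) + var(U)) * beta_x = 0.25;
# the Berkson slope is about beta_x = 1.
print(f"classical: {classical:.3f}, Berkson: {berkson:.3f}")
```

Under classical error the naive slope should settle near 1/4, whereas under Berkson error it should stay near β_X = 1, which matches the caption's explanation of the two panels.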

**Figure 2:**
Effects of non-differential misclassification in binary X on the regression coefficient in a linear regression of continuous Y on X. β_X is the regression coefficient in a regression of Y on X and $β_{X}^{*}$ is the regression coefficient in a regression of Y on X* (misclassified X). The attenuation factor λ (equation (13)) is a function of Pr(X = 1) and the sensitivity (*Sn*) and specificity (*Sp*) of X*. The thick line is the line $β_{X}^{*} = β_{X}$.
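The attenuation factor in Figure 2 can be computed directly. The expression below is derived here from the definitions rather than copied from equation (13), so its algebraic form may differ from the paper's: with p = Pr(X = 1) and nondifferential error, β_X* = β_X cov(X, X*)/var(X*), which gives λ = p(1 - p)(Sn + Sp - 1)/{p*(1 - p*)} with p* = Pr(X* = 1) = p·Sn + (1 - p)(1 - Sp). The function names and simulation settings are hypothetical.

```python
import random


def attenuation_binary_x(p, sn, sp):
    """Attenuation factor for nondifferentially misclassified binary X.

    Derived from beta_X* = cov(Y, X*) / var(X*) with cov(Y, X*) = beta_X * cov(X, X*).
    """
    p_star = p * sn + (1 - p) * (1 - sp)  # Pr(X* = 1)
    return p * (1 - p) * (sn + sp - 1) / (p_star * (1 - p_star))


def simulated_slope(p, sn, sp, beta_x=1.0, n=200_000, seed=13):
    """Monte Carlo check: slope of Y on the misclassified X* in simulated data."""
    rng = random.Random(seed)
    x_star, y = [], []
    for _ in range(n):
        x = 1 if rng.random() < p else 0
        # Nondifferential: misclassification depends on X only, not on Y.
        if x == 1:
            xs = 1 if rng.random() < sn else 0
        else:
            xs = 0 if rng.random() < sp else 1
        x_star.append(xs)
        y.append(beta_x * x + rng.gauss(0, 1))
    mx, my = sum(x_star) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x_star, y))
    sxx = sum((a - mx) ** 2 for a in x_star)
    return sxy / sxx


lam = attenuation_binary_x(p=0.3, sn=0.8, sp=0.9)
print(f"lambda = {lam:.3f}, simulated slope = {simulated_slope(0.3, 0.8, 0.9):.3f}")
```

Because λ is the slope of a regression of X on X*, it equals Pr(X = 1 | X* = 1) - Pr(X = 1 | X* = 0), so it cannot exceed 1; it equals 1 only when classification is perfect (Sn = Sp = 1).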

**Figure 3:**
Simulated data on 20 individuals showing the effects of classical error and Berkson error in continuous Y on the fitted regression line. For both plots X was generated from a normal distribution with mean 0, variance 1 and the errors U were generated from a normal distribution with mean 0 and variance 3. *Classical error plot:* Y was generated with mean X and variance 1. Y* was generated using Y* = Y + U. The difference in slopes is due entirely to sampling error. *Berkson error plot:* Y* was generated with mean X and variance 1. The difference in slopes is due to attenuation from the measurement error in Y. Y given X was generated from the normal distribution implied by the model for Y* and the Berkson error model Y = Y* + U.
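The classical-error panel of Figure 3 can likewise be rerun at a large sample size. The sketch below assumes the caption's setup (X ~ N(0, 1), Y | X ~ N(X, 1), Y* = Y + U with var(U) = 3); the helper names and seed are illustrative. It shows that the slope is essentially unchanged and that the cost of the error is a larger residual variance, here from about 1 to about 1 + var(U) = 4.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def residual_variance(x, y, intercept, slope):
    """Mean squared residual of y around the fitted line."""
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / len(x)


def figure3_classical_y(n=100_000, var_u=3.0, seed=7):
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]                  # Y | X ~ N(X, 1)
    y_star = [yi + rng.gauss(0, var_u ** 0.5) for yi in y]  # classical error: Y* = Y + U
    a, b = ols(x, y)
    a_s, b_s = ols(x, y_star)
    return b, b_s, residual_variance(x, y, a, b), residual_variance(x, y_star, a_s, b_s)


b, b_star, rv, rv_star = figure3_classical_y()
print(f"slope on Y: {b:.3f}, slope on Y*: {b_star:.3f}")
print(f"residual variance: {rv:.2f} vs {rv_star:.2f}")
```

The inflated residual variance is what reduces precision and power when the outcome carries classical error, even though the point estimate of the slope is not biased.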

**Figure 4:**
Effects of non-differential and differential misclassification in Y on the log odds ratio. $β_{X}^{*}$ is the log odds ratio of Y* given X (equation (20)) and β_X is the log odds ratio for Y given X (equation (19)). The covariate X is binary (for simplicity) and we assume β₀ = 0 in equation (19). Sn and Sp denote the non-differential sensitivity and specificity for Y*, and *Sn(X)* and *Sp(X)* denote the differential versions for X = 0,1. The thick line is the line $β_{X}^{*} = β_{X}$.
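For Figure 4, the observed log odds ratio with a binary X can be written in closed form. The sketch assumes the logistic model logit Pr(Y = 1 | X) = β₀ + β_X X (our reading of equation (19), with β₀ = 0 as in the caption) and that Y* is a misclassified version of Y whose sensitivity and specificity may depend on X. Because X is binary, a logistic regression of Y* on X is saturated, so β_X* is simply the difference of the logits of Pr(Y* = 1 | X = x) for x = 1 and x = 0. Function and parameter names are hypothetical.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(q):
    return math.log(q / (1.0 - q))


def observed_log_or(beta_x, sn, sp, sn_x1=None, sp_x1=None, beta0=0.0):
    """Log odds ratio of Y* on binary X when the outcome Y is misclassified.

    sn, sp apply when X = 0; sn_x1, sp_x1 apply when X = 1 and default to
    sn, sp (the nondifferential case).
    """
    sn_x1 = sn if sn_x1 is None else sn_x1
    sp_x1 = sp if sp_x1 is None else sp_x1

    def pr_y_star(x, sens, spec):
        pi = expit(beta0 + beta_x * x)                # Pr(Y = 1 | X = x)
        return sens * pi + (1.0 - spec) * (1.0 - pi)  # Pr(Y* = 1 | X = x)

    return logit(pr_y_star(1, sn_x1, sp_x1)) - logit(pr_y_star(0, sn, sp))


beta_x = math.log(2)
print(observed_log_or(beta_x, 0.9, 0.9))                         # nondifferential
print(observed_log_or(beta_x, 0.8, 1.0, sn_x1=0.95, sp_x1=1.0))  # differential
```

With nondifferential Sn = Sp = 0.9 and β_X = log 2, the observed log odds ratio is pulled toward zero, whereas the differential example (higher sensitivity when X = 1) pushes it away from zero, illustrating that differential misclassification can bias the estimate in either direction.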

**Figure 5:**
SIMEX: Relationship between the measurement error variance var(U) and the regression coefficient $β_{X}^{*}$. β_X is the regression coefficient in a regression of Y on X and $β_{X}^{*}$ is the regression coefficient in a regression of Y on X*. X* is assumed to follow the classical error model X* = X + U, and $β_{X}^{*} = \frac{\operatorname{var}(X)}{\operatorname{var}(X) + \operatorname{var}(U)} β_{X}$. Here, var(X) = 1 and β_X = 1.
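Because the curve in Figure 5 has a known form, it offers a noise-free check of the extrapolation step used by SIMEX. The sketch below (pure Python; helper names hypothetical) evaluates β_X* on an assumed grid of error variances s·σ² with σ² = 0.5 and s = 1, 1.5, ..., 3, fits a quadratic in var(U) by least squares, and evaluates that quadratic at var(U) = 0. As in the figure, var(X) = 1 and β_X = 1.

```python
def attenuated_beta(var_u, var_x=1.0, beta_x=1.0):
    """Naive slope under classical error: var(X) / (var(X) + var(U)) * beta_X."""
    return var_x / (var_x + var_u) * beta_x


def solve_linear(a, b):
    """Solve the square system a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol


def fit_quadratic(xs, ys):
    """Least-squares coefficients (c0, c1, c2) of y ~ c0 + c1*x + c2*x**2."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    return solve_linear([[s[i + j] for j in range(3)] for i in range(3)], t)


sigma2 = 0.5                          # assumed error variance in the observed data
s_grid = [1.0, 1.5, 2.0, 2.5, 3.0]    # assumed multiples of sigma2
v = [s * sigma2 for s in s_grid]
curve = [attenuated_beta(vi) for vi in v]
c0, c1, c2 = fit_quadratic(v, curve)  # c0 is the quadratic's value at var(U) = 0
naive = attenuated_beta(sigma2)
print(f"naive {naive:.3f}, extrapolated {c0:.3f}, exact {attenuated_beta(0.0):.3f}")
```

The true curve is not a quadratic, so the extrapolated value should land between the naive value 2/3 and β_X = 1 without reaching 1: in this setting the quadratic extrapolant reduces, but does not remove, the attenuation, which is why the choice of extrapolation function matters in SIMEX.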

**Figure 6:**
SIMEX estimation for the association between individual heart rate as outcome variable and log-transformed individual particle number concentration (a measure of air pollution exposure) measured in number per cm³. The SIMEX estimator is assessed assuming a measurement error variance of 0.03, which is determined through comparison measurements, a quadratic extrapolation function, number of simulations B = 100 and s = (1,1.5,2,2.5,3). The analysis is based on longitudinal data of an observational study described in Peters et al (2015). The model accounts for temperature, relative humidity, time trend and time of the day. The solid curve is obtained from the fit of the extrapolation model to the pseudo-datasets, and the dotted line represents the extrapolated part.
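The SIMEX steps behind Figure 6 (adding extra simulated error at several levels s, averaging the naive estimates over B pseudo-datasets, fitting a quadratic in s, and extrapolating back) can be sketched for a simple linear regression. This is an illustration under assumptions, not the analysis shown in the figure: it uses simulated data with a known var(U), interprets s as the multiple of var(U) carried by each pseudo-dataset (so s = 1 is the observed data and s = 0 corresponds to no error), and uses B = 20 rather than 100 to keep pure Python fast. All names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def solve_linear(a, b):
    """Solve the square system a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol


def fit_quadratic(xs, ys):
    """Least-squares coefficients (c0, c1, c2) of y ~ c0 + c1*x + c2*x**2."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    return solve_linear([[s[i + j] for j in range(3)] for i in range(3)], t)


def simex_slope(x_star, y, var_u, s_grid=(1.0, 1.5, 2.0, 2.5, 3.0), B=20, seed=0):
    """SIMEX for a simple linear regression slope with quadratic extrapolation."""
    rng = random.Random(seed)
    mean_slopes = []
    for s in s_grid:
        # Add noise with variance (s - 1) * var_u so the total error variance is s * var_u.
        sd_extra = ((s - 1.0) * var_u) ** 0.5
        slopes = [ols_slope([xs + rng.gauss(0, sd_extra) for xs in x_star], y)
                  for _ in range(B)]
        mean_slopes.append(sum(slopes) / B)
    c0, _, _ = fit_quadratic(list(s_grid), mean_slopes)
    return c0  # quadratic evaluated at s = 0, i.e. no measurement error


# Simulated data (assumed): var(X) = 1, var(U) = 0.5, beta_X = 1.
rng = random.Random(42)
n, var_u = 5000, 0.5
x = [rng.gauss(0, 1) for _ in range(n)]
x_star = [xi + rng.gauss(0, var_u ** 0.5) for xi in x]
y = [xi + rng.gauss(0, 1) for xi in x]
naive = ols_slope(x_star, y)
simex = simex_slope(x_star, y, var_u)
print(f"naive {naive:.3f}, SIMEX {simex:.3f}, true 1.0")
```

In this setup the naive slope should be near 2/3 and the SIMEX estimate noticeably closer to β_X = 1, though a quadratic extrapolant need not remove all of the attenuation, since the true relationship in Figure 5 is not quadratic.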

Cited by

Dietary Factors Associated with Asthma Development: A Narrative Review and Summary of Current Guidelines and Recommendations.
Takkinsatian P, Mairiang D, Sangkanjanavanich S, Chiewchalermsri C, Tripipitsiriwat A, Sompornrattanaphan M. J Asthma Allergy. 2022 Aug 24;15:1125-1141. doi: 10.2147/JAA.S364964. eCollection 2022. PMID: 36046721. Review.
Estimating the Effect of Adhering to the Recommendations of the 2019 Canada's Food Guide on Health Outcomes in Older Adults: Protocol for a Target Trial Emulation.
Brassard D, Presse N, Chevalier S. JMIR Res Protoc. 2025 Jan 23;14:e65182. doi: 10.2196/65182. PMID: 39847422.
Should regression calibration or multiple imputation be used when calibrating different devices in a longitudinal study?
Loop MS, Lotspeich SC, Garcia TP, Meyer ML. Am J Epidemiol. 2025 Jan 8;194(1):295-301. doi: 10.1093/aje/kwae169. PMID: 38957970.
Survey of practices of handling exposure measurement errors in modern epidemiology: are the best practices in statistics being adopted by epidemiologists?
Russell AJ, Hunter MK, Maldonado G, Burstyn I. BMC Med Res Methodol. 2025 Aug 25;25(1):198. doi: 10.1186/s12874-025-02651-w. PMID: 40855410.
Split and combine simulation extrapolation algorithm to correct geocoding coarsening of built environment exposures.
Won JY, Sanchez-Vaznaugh EV, Zhai Y, Sánchez BN. Stat Med. 2022 May 20;41(11):1932-1949. doi: 10.1002/sim.9338. Epub 2022 Jan 31. PMID: 35098584.

Grants and funding

R37 AI131771/AI/NIAID NIH HHS/United States
