Review
Stat Med. 2020 Jul 20;39(16):2197-2231.
doi: 10.1002/sim.8532. Epub 2020 Apr 3.

STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: Part 1-Basic theory and simple methods of adjustment

Ruth H Keogh et al. Stat Med.

Abstract

Measurement error and misclassification of variables frequently occur in epidemiology and involve variables important to public health. Their presence can impact strongly on results of statistical analyses involving such variables. However, investigators commonly fail to pay attention to biases resulting from such mismeasurement. We provide, in two parts, an overview of the types of error that occur, their impacts on analytic results, and statistical methods to mitigate the biases that they cause. In this first part, we review different types of measurement error and misclassification, emphasizing the classical, linear, and Berkson models and the concepts of nondifferential and differential error. We describe the impacts of these types of error in covariates and in outcome variables on various analyses, including estimation and testing in regression models and estimating distributions. We outline types of ancillary studies required to provide information about such errors and discuss the implications of covariate measurement error for study design. Methods for ascertaining sample size requirements are outlined, both for ancillary studies designed to provide information about measurement error and for main studies where the exposure of interest is measured with error. We describe two of the simpler methods, regression calibration and simulation extrapolation (SIMEX), that adjust for bias in regression coefficients caused by measurement error in continuous covariates, and illustrate their use through examples drawn from the Observing Protein and Energy (OPEN) dietary validation study. Finally, we review software available for implementing these methods. The second part of the article deals with more advanced topics.

Keywords: Berkson error; SIMEX; classical error; differential error; measurement error; misclassification; nondifferential error; regression calibration; sample size; simulation extrapolation.

Figures

Figure 1:
Simulated data on 20 individuals showing the effects of classical error and Berkson error in the continuous covariate X on the fitted regression line. For both plots Y was generated from a normal distribution with mean β0 + βX X (using β0 = 0, βX = 1) and variance 1. Classical error plot: X was generated from a normal distribution with mean 0 and variance 1. X* was generated using X* = X + U. The difference in the slopes in this graph is due to attenuation from the measurement error in X. Berkson error plot: X* was generated from a normal distribution with mean 0 and variance 1 and X was generated from the normal distribution implied by the Berkson error model X = X* + U. For both error types var(U) = 3. The small difference in the slopes in this graph is due entirely to sampling error.
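The data-generating steps in this caption can be sketched in a few lines. The following is a minimal illustration under the caption's stated settings (β0 = 0, βX = 1, var(U) = 3), not the authors' code; the function names `ols_slope` and `simulate_slopes` and the seeds are assumptions made here. Running it with a large sample as well as n = 20 helps separate systematic attenuation from sampling error.

```python
import random


def ols_slope(x, y):
    """Least-squares slope from a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slopes(error_type, n, var_u=3.0, beta0=0.0, beta_x=1.0, seed=0):
    """Return (slope of Y on X, slope of Y on X*) for one simulated dataset."""
    rng = random.Random(seed)
    sd_u = var_u ** 0.5
    x, x_star, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, sd_u)
        if error_type == "classical":
            xi = rng.gauss(0.0, 1.0)   # true X ~ N(0, 1)
            xsi = xi + u               # classical error: X* = X + U
        elif error_type == "berkson":
            xsi = rng.gauss(0.0, 1.0)  # observed X* ~ N(0, 1)
            xi = xsi + u               # Berkson error: X = X* + U
        else:
            raise ValueError("error_type must be 'classical' or 'berkson'")
        x.append(xi)
        x_star.append(xsi)
        # Y given X is normal with mean beta0 + beta_x * X and variance 1
        y.append(rng.gauss(beta0 + beta_x * xi, 1.0))
    return ols_slope(x, y), ols_slope(x_star, y)


if __name__ == "__main__":
    for kind in ("classical", "berkson"):
        for n in (20, 50_000):
            true_slope, observed_slope = simulate_slopes(kind, n)
            print(f"{kind:9s} n={n:6d}  Y~X: {true_slope:.3f}  Y~X*: {observed_slope:.3f}")
```

With a large n, the slope of Y on X* under classical error should settle near var(X)/(var(X) + var(U)) = 1/4 of the true slope, while under Berkson error it should stay near the true slope of 1.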
Figure 2:
Effects of non-differential misclassification in binary X on the regression coefficient in a linear regression of continuous Y on X. βX is the regression coefficient in a regression of Y on X and βX* is the regression coefficient in a regression of Y on X* (misclassified X). The attenuation factor λ (equation (13)) is a function of Pr(X = 1) and the sensitivity (Sn) and specificity (Sp) of X*. The thick line is the line βX* = βX.
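Equation (13) is not reproduced on this page. As a hedged illustration, the sketch below uses the standard result for non-differential misclassification of a binary covariate in linear regression, λ = cov(X, X*)/var(X*) = (Sn + Sp − 1) p(1 − p) / [p*(1 − p*)], where p = Pr(X = 1) and p* = Pr(X* = 1) = Sn·p + (1 − Sp)(1 − p). That this matches the paper's equation (13) is an assumption, and the function name is hypothetical.

```python
def attenuation_factor(p, sn, sp):
    """Attenuation factor lambda such that beta_X* = lambda * beta_X.

    p  : Pr(X = 1), prevalence of the true binary covariate
    sn : sensitivity, Pr(X* = 1 | X = 1)
    sp : specificity, Pr(X* = 0 | X = 0)
    """
    # Prevalence of the misclassified covariate X*
    p_star = sn * p + (1.0 - sp) * (1.0 - p)
    if not 0.0 < p_star < 1.0:
        raise ValueError("X* is degenerate (always 0 or always 1)")
    # cov(X, X*) = (sn + sp - 1) * p * (1 - p); var(X*) = p_star * (1 - p_star)
    return (sn + sp - 1.0) * p * (1.0 - p) / (p_star * (1.0 - p_star))


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        for sn, sp in ((0.9, 0.9), (0.8, 0.95), (0.6, 0.99)):
            print(f"p={p:.1f} Sn={sn:.2f} Sp={sp:.2f} lambda={attenuation_factor(p, sn, sp):.3f}")
```

Under this formula λ equals 1 with perfect classification and falls to 0 when Sn + Sp = 1, that is, when X* carries no information about X.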
Figure 3:
Simulated data on 20 individuals showing the effects of classical error and Berkson error in continuous Y on the fitted regression line. For both plots X was generated from a normal distribution with mean 0, variance 1 and the errors U were generated from a normal distribution with mean 0 and variance 3. Classical error plot: Y was generated with mean X and variance 1. Y* was generated using Y* = Y + U. The difference in slopes is due entirely to sampling error. Berkson error plot: Y* was generated with mean X and variance 1. The difference in slopes is due to attenuation from the measurement error in Y. Y given X was generated from the normal distribution implied by the model for Y* and the Berkson error model Y = Y* + U.
Figure 4:
Effects of non-differential and differential misclassification in Y on the log odds ratio. βX* is the log odds ratio of Y* given X (equation (20)) and βX is the log odds ratio for Y given X (equation (19)). The covariate X is binary (for simplicity) and we assume β0 = 0 in equation (19). Sn and Sp denote the non-differential sensitivity and specificity for Y*, and Sn(X) and Sp(X) denote the differential versions for X = 0,1. The thick line is the line βX* = βX.
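Equations (19) and (20) are not shown on this page. The sketch below assumes equation (19) is the logistic model logit Pr(Y = 1 | X) = β0 + βX X, and obtains the observed log odds ratio from Pr(Y* = 1 | X) = Sn(X) Pr(Y = 1 | X) + (1 − Sp(X)) (1 − Pr(Y = 1 | X)). The function names and the example sensitivity and specificity values are hypothetical.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log odds."""
    return math.log(p / (1.0 - p))


def observed_log_odds_ratio(beta_x, sn=(1.0, 1.0), sp=(1.0, 1.0), beta0=0.0):
    """Log odds ratio of misclassified Y* given binary X.

    sn[x], sp[x] are the sensitivity and specificity of Y* when X = x;
    equal entries give non-differential misclassification.
    """
    p_star = []
    for x in (0, 1):
        pi = expit(beta0 + beta_x * x)  # Pr(Y = 1 | X = x), assumed logistic
        p_star.append(sn[x] * pi + (1.0 - sp[x]) * (1.0 - pi))
    return logit(p_star[1]) - logit(p_star[0])


if __name__ == "__main__":
    nondiff = observed_log_odds_ratio(1.0, sn=(0.9, 0.9), sp=(0.9, 0.9))
    diff = observed_log_odds_ratio(0.0, sn=(0.8, 1.0), sp=(1.0, 1.0))
    print(f"non-differential, beta_X = 1: beta_X* = {nondiff:.3f}")
    print(f"differential,     beta_X = 0: beta_X* = {diff:.3f}")
```

In these two illustrative settings, non-differential error pulls the log odds ratio toward zero, whereas differential error produces a non-zero log odds ratio even when the true one is zero.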
Figure 5:
SIMEX: Relationship between the measurement error variance var(U) and the regression coefficient βX*. βX is the regression coefficient in a regression of Y on X and βX* is the regression coefficient in a regression of Y on X*. X* is assumed to follow the classical error model X* = X + U, and βX* = [var(X) / (var(X) + var(U))] βX. Here, var(X) = 1 and βX = 1.
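The attenuation relationship in this caption is simple to tabulate. The short sketch below evaluates βX* = var(X) / (var(X) + var(U)) · βX on a grid of error variances, using the caption's values var(X) = 1 and βX = 1 as defaults; the function name and the grid are illustrative choices.

```python
def attenuated_slope(var_u, var_x=1.0, beta_x=1.0):
    """Slope of Y on X* under classical error X* = X + U."""
    return var_x / (var_x + var_u) * beta_x


if __name__ == "__main__":
    for var_u in (0.0, 0.5, 1.0, 2.0, 3.0):
        print(f"var(U) = {var_u:.1f}  beta_X* = {attenuated_slope(var_u):.3f}")
```

The curve is non-linear in var(U), which is why SIMEX has to choose an extrapolation function when projecting back to var(U) = 0.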
Figure 6:
SIMEX estimation for the association between individual heart rate as outcome variable and log-transformed individual particle number concentration (a measure of air pollution exposure) measured in number per cm³. The SIMEX estimator is assessed assuming a measurement error variance of 0.03, which is determined through comparison measurements, a quadratic extrapolation function, number of simulations B = 100 and s = (1,1.5,2,2.5,3). The analysis is based on longitudinal data of an observational study described in Peters et al. (2015). The model accounts for temperature, relative humidity, time trend and time of the day. The solid curve is obtained from the fit of the extrapolation model to the pseudo-datasets, and the dotted line represents the extrapolated part.
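The SIMEX steps named in this caption (simulate extra error, refit, extrapolate with a quadratic) can be sketched for a simple linear regression. This is a minimal illustration on simulated data, not the analysis behind the figure: it reads s as the multiplier of the total error variance (s = 1 is the observed data, s = 0 is error-free), which is an assumption, and it uses a hypothetical error variance of 0.25 and ordinary least squares rather than the longitudinal model described above. All function names are made up for this sketch.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


def solve_linear(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_quadratic(s, b):
    """Least-squares coefficients (c0, c1, c2) of b = c0 + c1*s + c2*s^2."""
    rows = [[1.0, v, v * v] for v in s]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, b)) for i in range(3)]
    return solve_linear(xtx, xty)


def simex_slope(x_star, y, var_u, s_grid=(1.0, 1.5, 2.0, 2.5, 3.0), n_sim=100, seed=0):
    """Return (SIMEX slope estimate, mean naive slope at each s)."""
    rng = random.Random(seed)
    means = []
    for s in s_grid:
        extra_sd = ((s - 1.0) * var_u) ** 0.5  # extra error so total variance is s*var_u
        if extra_sd == 0.0:
            means.append(ols_slope(x_star, y))
            continue
        slopes = []
        for _ in range(n_sim):  # simulation step: B pseudo-datasets per s
            x_sim = [v + rng.gauss(0.0, extra_sd) for v in x_star]
            slopes.append(ols_slope(x_sim, y))
        means.append(sum(slopes) / n_sim)
    c0, _, _ = fit_quadratic(s_grid, means)  # extrapolation step: value at s = 0
    return c0, means


def simulate_classical(n, var_u, beta_x=1.0, seed=0):
    """Simulated (X*, Y) with X ~ N(0, 1), X* = X + U, Y ~ N(beta_x * X, 1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x_star = [v + rng.gauss(0.0, var_u ** 0.5) for v in x]
    y = [rng.gauss(beta_x * v, 1.0) for v in x]
    return x_star, y


if __name__ == "__main__":
    x_star, y = simulate_classical(5000, var_u=0.25, seed=1)
    estimate, means = simex_slope(x_star, y, var_u=0.25)
    print("naive slope:", round(means[0], 3), " SIMEX slope:", round(estimate, 3))
```

Because the true attenuation curve is not exactly quadratic, the quadratic extrapolant typically removes most but not all of the bias, so the SIMEX estimate here should lie between the naive slope and the true value of 1.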
