Modeling and variable selection in epidemiologic analysis
- PMID: 2916724
- PMCID: PMC1349563
- DOI: 10.2105/ajph.79.3.340
Abstract
This paper provides an overview of problems in multivariate modeling of epidemiologic data, and examines some proposed solutions. Special attention is given to the task of model selection, which involves selection of the model form, selection of the variables to enter the model, and selection of the form of these variables in the model. Several conclusions are drawn, among them: a) model and variable forms should be selected based on regression diagnostic procedures, in addition to goodness-of-fit tests; b) variable-selection algorithms in current packaged programs, such as conventional stepwise regression, can easily lead to invalid estimates and tests of effect; and c) variable selection is better approached by direct estimation of the degree of confounding produced by each variable than by significance-testing algorithms. As a general rule, before using a model to estimate effects, one should evaluate the assumptions implied by the model against both the data and prior information.
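One way to read conclusion c) is as a change-in-estimate check: estimate the exposure effect with and without adjustment for a candidate variable, and judge the variable by how far the estimate moves rather than by a p-value. The sketch below is only an illustration of that idea, not the paper's own procedure. It compares a crude odds ratio with a Mantel-Haenszel odds ratio stratified on one binary covariate. The counts are invented, the function names are hypothetical, and the 10% cutoff is a common convention that the abstract does not state.

```python
# Each stratum is one 2x2 table for a level of the candidate covariate:
# (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases).
# These counts are invented for illustration only.
STRATA = [
    (80, 20, 40, 10),  # covariate present
    (10, 40, 20, 80),  # covariate absent
]


def crude_odds_ratio(strata):
    """Ignore the covariate: collapse all strata into one 2x2 table."""
    a = sum(s[0] for s in strata)
    b = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    d = sum(s[3] for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Adjust for the covariate with the Mantel-Haenszel summary odds ratio."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def relative_change(strata):
    """Relative difference between the crude and the adjusted odds ratio."""
    crude = crude_odds_ratio(strata)
    adjusted = mantel_haenszel_odds_ratio(strata)
    return abs(crude / adjusted - 1.0)


def retain_covariate(strata, threshold=0.10):
    """Assumed rule: keep the covariate if adjustment shifts the estimate
    by more than `threshold` (10% here, a convention, not from the paper)."""
    return relative_change(strata) > threshold


if __name__ == "__main__":
    print("crude OR:   ", crude_odds_ratio(STRATA))            # 2.25
    print("adjusted OR:", mantel_haenszel_odds_ratio(STRATA))  # 1.0
    print("retain covariate:", retain_covariate(STRATA))       # True
```

In this invented example the odds ratio is 1.0 within each stratum, yet the crude odds ratio is 2.25, so the whole apparent effect is confounding by the covariate. A significance test on the covariate's coefficient would not report that quantity directly.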