Problems due to small samples and sparse data in conditional logistic regression analysis
- PMID: 10707923
- DOI: 10.1093/oxfordjournals.aje.a010240
Abstract
Conditional logistic regression was developed to avoid "sparse-data" biases that can arise in ordinary logistic regression analysis. Nonetheless, it is a large-sample method that can exhibit considerable bias when certain types of matched sets are infrequent or when the model contains too many parameters. Sparse-data bias can cause misleading inferences about confounding, effect modification, dose response, and induction periods, and can interact with other biases. In this paper, the authors describe these problems in the context of matched case-control analysis and provide examples from a study of electrical wiring and childhood leukemia and a study of diet and glioma. The same problems can arise in any likelihood-based analysis, including ordinary logistic regression. The problems can be detected by careful inspection of data and by examining the sensitivity of estimates to category boundaries, variables in the model, and transformations of those variables. One can also apply various bias corrections or turn to methods less sensitive to sparse data than conditional likelihood, such as Bayesian and empirical-Bayes (hierarchical regression) methods.
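As a rough illustration of the kind of sensitivity check the abstract mentions, the sketch below uses the simplest special case: 1:1 matched pairs with a single binary exposure, where the conditional maximum-likelihood odds ratio reduces to the ratio of the two discordant-pair counts. This is not the authors' analysis. The function names, the optional 0.5 added to each count, and the Wald interval are assumptions chosen for illustration only.

```python
import math


def matched_pair_or(b, c, add=0.0):
    """Conditional ML odds ratio for 1:1 matched pairs, binary exposure.

    b: discordant pairs in which only the case is exposed
    c: discordant pairs in which only the control is exposed
    add: constant added to both counts (0.5 is an ad hoc choice, assumed here)
    Returns (odds_ratio, (lower, upper)) with an approximate 95% Wald interval.
    """
    b_adj, c_adj = b + add, c + add
    if b_adj == 0 or c_adj == 0:
        # The estimate is 0 or infinite when a discordant count is zero.
        raise ValueError("degenerate estimate: a discordant count is zero")
    log_or = math.log(b_adj / c_adj)
    se = math.sqrt(1.0 / b_adj + 1.0 / c_adj)
    return math.exp(log_or), (math.exp(log_or - 1.96 * se),
                              math.exp(log_or + 1.96 * se))


def prob_degenerate(n_discordant, true_or):
    """Probability that all discordant pairs fall in one cell.

    Assumes a constant true odds ratio, so each discordant pair has the case
    exposed with probability true_or / (1 + true_or).
    """
    p = true_or / (1.0 + true_or)
    return p ** n_discordant + (1.0 - p) ** n_discordant


if __name__ == "__main__":
    print(matched_pair_or(6, 2))            # uncorrected estimate
    print(matched_pair_or(6, 2, add=0.5))   # same data, 0.5 added to each cell
    print(prob_degenerate(5, 2.0), prob_degenerate(50, 2.0))
```

With 6 and 2 discordant pairs the uncorrected estimate is 3.0, while adding 0.5 to each count gives 2.6; a shift of that size from so small a change is one sign that the data are too sparse for the large-sample approximation to be trusted. Under the same assumptions, with only 5 discordant pairs and a true odds ratio of 2 the estimate is degenerate with probability 33/243, about 0.14.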
Comment in
- Re: "Problems due to small samples and sparse data in conditional logistic regression analysis". Am J Epidemiol. 2000 Oct 1;152(7):688-9. doi: 10.1093/aje/152.7.688. PMID: 11032165.
