Malays J Med Sci. 2018 Jul;25(4):122-130.
doi: 10.21315/mjms2018.25.4.12. Epub 2018 Aug 30.

Sample Size Guidelines for Logistic Regression from Observational Studies with Large Population: Emphasis on the Accuracy Between Statistics and Parameters Based on Real Life Clinical Data

Mohamad Adam Bujang et al. Malays J Med Sci. 2018 Jul.

Abstract

Background: Different study designs and population sizes may require different sample sizes for logistic regression. This study aims to propose sample size guidelines for logistic regression based on observational studies with large populations.

Methods: We estimated the minimum sample size required by evaluating real clinical data to assess how closely the statistics derived from samples matched the actual parameters. The Nagelkerke r-squared and the coefficients derived from the samples were compared with their respective parameters.
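
As a reading aid, the Nagelkerke r-squared mentioned above can be computed from the log-likelihoods of a null (intercept-only) model and a fitted model. The sketch below uses the standard textbook definition (the Cox and Snell r-squared rescaled by its maximum attainable value); the function name and arguments are illustrative assumptions, not taken from the paper.

```python
import math


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke r-squared from two log-likelihoods.

    ll_null  -- log-likelihood of the intercept-only model
    ll_model -- log-likelihood of the fitted logistic model
    n        -- number of observations
    (Hypothetical helper; names are illustrative.)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Cox and Snell r-squared: 1 - (L_null / L_model)^(2/n)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Largest value Cox and Snell can reach: 1 - L_null^(2/n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


# Usage: a model no better than the null model scores 0,
# and a perfectly fitting model (log-likelihood 0) scores 1.
ll0 = -100 * math.log(2)  # null log-likelihood for 100 balanced binary outcomes
print(nagelkerke_r2(ll0, ll0, 100))
print(nagelkerke_r2(ll0, 0.0, 100))
```

Comparing this value between a sample and the full population gives the kind of difference reported in the results.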

Results: With a minimum sample size of 500, the differences between the sample estimates and the population parameters were sufficiently small. Based on an audit of a medium-sized population, the differences were within ± 0.5 for coefficients and ± 0.02 for Nagelkerke r-squared. For a large population, the differences were within ± 1.0 for coefficients and ± 0.02 for Nagelkerke r-squared.

Conclusions: For observational studies with a large population that use logistic regression in the analysis, a minimum sample size of 500 is necessary to derive statistics that represent the parameters. The other recommended rules of thumb are an EPV (events per variable) of 50 and the formula n = 100 + 50i, where i refers to the number of independent variables in the final model.
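
The three rules of thumb above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming EPV is computed as the number of outcome events divided by the number of independent variables; the function names and the idea of reporting the three checks side by side are illustrative assumptions, not the authors' procedure.

```python
def sample_size_by_formula(num_predictors: int) -> int:
    """Rule of thumb n = 100 + 50i, with i independent variables in the final model."""
    if num_predictors < 0:
        raise ValueError("num_predictors must be non-negative")
    return 100 + 50 * num_predictors


def events_per_variable(num_events: int, num_predictors: int) -> float:
    """EPV, assumed here to be outcome events divided by independent variables."""
    if num_predictors <= 0:
        raise ValueError("num_predictors must be positive")
    return num_events / num_predictors


def check_rules(n: int, num_events: int, num_predictors: int) -> dict:
    """Report each rule of thumb separately (hypothetical helper)."""
    return {
        "n_at_least_500": n >= 500,
        "epv_at_least_50": events_per_variable(num_events, num_predictors) >= 50,
        "n_meets_formula": n >= sample_size_by_formula(num_predictors),
    }


# Usage: a study with 600 subjects, 180 events and 4 independent variables.
print(sample_size_by_formula(4))  # 100 + 50 * 4 = 300
print(check_rules(n=600, num_events=180, num_predictors=4))
```

In this example the sample passes the minimum of 500 and the formula, but its EPV of 45 falls short of 50, which shows that the rules can disagree for the same data set.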

Keywords: logistic regression; observational studies; sample size.

Conflict of interest statement

All authors declare no conflict of interest.

Figures

Figure 1. The comparison of differences of coefficients between results derived from parameters and statistics based on various sample sizes

Figure 2. The comparison of differences of Nagelkerke r-squared between results derived from parameters and statistics based on various sample sizes

Figure 3. The comparison of differences of coefficients between results derived from parameters and statistics based on various sample sizes tested with larger sample
