A solution to minimum sample size for regressions

PLoS One. 2020 Feb 21;15(2):e0229345. doi: 10.1371/journal.pone.0229345. eCollection 2020.

David G Jenkins¹, Pedro F Quintana-Ascencio¹

PMID: 32084211
PMCID: PMC7034864
DOI: 10.1371/journal.pone.0229345

David G Jenkins et al. PLoS One. 2020.

Affiliation

¹ Department of Biology, University of Central Florida, Orlando, Florida, United States of America.

Abstract

Regressions and meta-regressions are widely used to estimate patterns and effect sizes in various disciplines. However, many biological and medical analyses use relatively low sample sizes (N), contributing to concerns about reproducibility. What is the minimum N needed to identify the most plausible data pattern using regressions? Statistical power analysis is often used to answer that question, but it has its own problems and logically should follow model selection, which first identifies the most plausible model. Here we generate null, simple linear, and quadratic data with different variances and effect sizes. We then sample those data and use information-theoretic model selection to evaluate the minimum N for regression models. We also evaluate the use of the coefficient of determination (R²) for this purpose; it is widely used but not recommended. With very low variance, both false positives and false negatives occurred at N < 8, but data shape was always clearly identified at N ≥ 8. With high variance, accurate inference was stable at N ≥ 25. Those outcomes were consistent across effect sizes. Akaike Information Criterion weights (AICc w_i) were essential to clearly identify patterns (e.g., simple linear vs. null); R² and adjusted R² values were not useful. We conclude that a minimum N = 8 is informative given very little variance, but a minimum N ≥ 25 is required with more variance. Alternative models are better compared using information-theoretic indices such as AIC than using R² or adjusted R². Insufficient N and R²-based model selection apparently contribute to confusion and low reproducibility in various disciplines. To avoid those problems, we recommend that research based on regressions or meta-regressions use N ≥ 25.
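The workflow the abstract describes (generate data from a known model, sample N points, fit null, straight-line and quadratic models, and compare them by AICc weights) can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: it assumes the standard least-squares form of AICc with Gaussian errors, and the function names, the x range of 0 to 10, the intercept, the slope and σ values, and the number of replicates are all hypothetical choices that do not come from the paper.

```python
import numpy as np


def aicc_least_squares(x, y, degree):
    """AICc of a polynomial of the given degree fitted by ordinary least squares.

    Assumes Gaussian errors, so AIC = n * ln(RSS / n) + 2k, where k counts the
    polynomial coefficients plus the estimated error variance.
    """
    n = len(y)
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
    k = degree + 2  # (degree + 1) coefficients + 1 error variance
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = n * np.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction


def akaike_weights(aicc_values):
    """Convert AICc values into Akaike weights w_i that sum to 1."""
    values = np.asarray(aicc_values, dtype=float)
    delta = values - values.min()
    rel_likelihood = np.exp(-0.5 * delta)
    return rel_likelihood / rel_likelihood.sum()


def mean_weights(n, slope, sigma, n_reps=200, seed=0):
    """Mean Akaike weights (null, linear, quadratic) for straight-line data.

    All simulation settings here are hypothetical, chosen only to illustrate.
    """
    rng = np.random.default_rng(seed)
    totals = np.zeros(3)
    for _ in range(n_reps):
        x = rng.uniform(0.0, 10.0, size=n)
        y = 1.0 + slope * x + rng.normal(0.0, sigma, size=n)
        scores = [aicc_least_squares(x, y, d) for d in (0, 1, 2)]
        totals += akaike_weights(scores)
    return totals / n_reps


if __name__ == "__main__":
    for n in (8, 15, 25, 50):
        w_null, w_lin, w_quad = mean_weights(n, slope=0.5, sigma=3.0)
        print(f"N={n:2d}  null={w_null:.2f}  linear={w_lin:.2f}  quadratic={w_quad:.2f}")
```

Running the script prints the mean weight of each candidate model at several sample sizes, which gives a rough sense of how the leading model can shift as N grows. With this parameter count the quadratic model needs at least six points for AICc to be defined, so the sketch does not cover the smallest sample sizes examined in the paper.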

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Histograms of N in research.**
(a) economic meta-analyses & meta-regressions; (b) medical / epidemiological meta-analyses & meta-regressions; (c) ecological analyses of disturbance [8]; and (d) biogeographical analyses of species-area relationships [9]. Please see S1 Appendix for a description of literature search methods, data, and references for (a) and (b).

**Fig 2. Data made with a null model (1st column) and results of analyses using null (2nd column), straight-line (3rd column) and quadratic (4th column) models.**
Data with (a) high variance and (b) low variance were each analyzed at N = 4–50. Results are presented with maximum N = 30 for visual clarity; all results stabilized at N > 30. Circles are means; error bars are 95% confidence intervals. “Traffic signal” colors on sample size (N) axes for the null model indicate ranges where N is too small (red = stop), or sufficient (green = go) to correctly infer the pattern. Note the quadratic model outcomes at N = 4 (red circles).

**Fig 3. Data made with a straight-line model (1st column) and results of analyses using null (2nd column), straight-line (3rd column) and quadratic (4th column) models.**
The four combinations (a-d) of low/high variance (σ) and effect size (α) represent approximate graphical extremes. Grey lines represent transitions in leading w_i between two models. “Traffic signal” colors on sample size (N) axes for the straight-line model indicate ranges where N is too small (red = stop), about equivalent to the null (yellow = caution), or sufficient (green = go) to correctly infer the pattern. Note the null and quadratic model outcomes at low N (red circles or ellipses).

**Fig 4. Data made with a quadratic model and with high variance (σ; 1st column) and results of analyses using null (2nd column), straight-line (3rd column) and quadratic (4th column) models.**
All else as in Figs 2 & 3.

**Fig 5. Data made with a quadratic model and with low variance (σ; 1st column) and results of analyses using null (2nd column), straight-line (3rd column) and quadratic (4th column) models.**
All else as in Figs 2–4.

Cited by

Eike GSH, Aadland E, Blom EE, Riiser A. Validation of a Modified Submaximal Balke Protocol to Assess Cardiorespiratory Fitness in Individuals at High Risk of or With Chronic Health Conditions-A Pilot Study. Front Sports Act Living. 2021 Apr 22;3:642538. doi: 10.3389/fspor.2021.642538. eCollection 2021. PMID: 33969294.
Compagnoni A, Levin S, Childs DZ, Harpole S, Paniw M, Römer G, Burns JH, Che-Castaldo J, Rüger N, Kunstler G, Bennett JM, Archer CR, Jones OR, Salguero-Gómez R, Knight TM. Herbaceous perennial plants with short generation time have stronger responses to climate anomalies than those with longer generation time. Nat Commun. 2021 Mar 23;12(1):1824. doi: 10.1038/s41467-021-21977-9. PMID: 33758189.
Oldrati V, Minghetti S, Zanotta N, Bardoni A, Zucca C. Etiology and duration of the disease in the assessment of intellectual functioning of pediatric patients with epilepsy: An observational study. Heliyon. 2023 Feb 25;9(3):e14085. doi: 10.1016/j.heliyon.2023.e14085. eCollection 2023 Mar. PMID: 36915569.
Singh K, Rathore U, Rai MK, Behera MR, Jain N, Ora M, Bhadauria D, Sharma S, Pande G, Gambhir S, Nath A, Kumar S, Sharma A, Agarwal V, Misra DP. Novel Th17 Lymphocyte Populations, Th17.1 and PD1+Th17, are Increased in Takayasu Arteritis, and Both Th17 and Th17.1 Sub-Populations Associate with Active Disease. J Inflamm Res. 2022 Mar 1;15:1521-1541. doi: 10.2147/JIR.S355881. eCollection 2022. PMID: 35256852.
Pavlovič O, Fiala V, Kleisner K. Congruence in European and Asian perception of Vietnamese facial attractiveness, averageness, symmetry and sexual dimorphism. Sci Rep. 2023 Aug 16;13(1):13320. doi: 10.1038/s41598-023-40458-1. PMID: 37587194.

References

1. Lau J, Ioannidis JP and Schmid CH. Quantitative synthesis in systematic reviews. Annals Internal Med. 1997; 127: 820–826.
2. Baker WL, White C, Cappelleri JC, Kluger J, Coleman CI, from the Health Outcomes, Policy, and Economics (HOPE) Collaborative Group. Understanding heterogeneity in meta-analysis: the role of meta-regression. International Journal of Clinical Practice. 2009; 63: 1426–1434. doi: 10.1111/j.1742-1241.2009.02168.x
3. Koricheva J, Gurevitch J, Mengersen K, editors. Handbook of meta-analysis in ecology and evolution. Princeton University Press. 2013.
4. Gurevitch J, Koricheva J, Nakagawa S, Stewart G. Meta-analysis and the science of research synthesis. Nature. 2018; 555: 175–182. doi: 10.1038/nature25753
5. Stanley TD, Jarrell SB. Meta-regression analysis: a quantitative method of literature surveys. J. Economic Surveys. 2005; 19: 299–308.
