Review
BMC Med Res Methodol. 2020 May 27;20(1):132.
doi: 10.1186/s12874-020-00994-0.

A review of the use of propensity score diagnostics in papers published in high-ranking medical journals

Emily Granger et al. BMC Med Res Methodol. 2020.

Abstract

Background: Propensity scores are widely used to deal with confounding bias in medical research. An incorrectly specified propensity score model may lead to residual confounding bias; therefore it is essential to use diagnostics to assess propensity scores in a propensity score analysis. The current use of propensity score diagnostics in the medical literature is unknown. The objectives of this study are to (1) assess the use of propensity score diagnostics in medical studies published in high-ranking journals, and (2) assess whether the use of propensity score diagnostics differs between studies (a) in different research areas and (b) using different propensity score methods.

Methods: A PubMed search identified studies published in high-impact journals between 1 January 2014 and 31 December 2016 that used propensity scores to answer an applied medical question. From each study we extracted information on how propensity scores were assessed and which propensity score method was used. Research area was defined using the journal categories from the Journal Citation Reports.

Results: A total of 894 papers were included in the review. Of these, 187 (20.9%) failed to report whether the propensity score had been assessed. Commonly reported diagnostics were p-values from hypothesis tests (36.6%) and the standardised mean difference (34.6%). Statistical tests provided marginally stronger evidence for a difference in diagnostic use between studies in different research areas (p = 0.033) than studies using different propensity score methods (p = 0.061).

Conclusions: The use of diagnostics in the propensity score medical literature is far from optimal, with different diagnostics preferred in different areas of medicine. The propensity score literature may improve with focused efforts to change practice in areas where suboptimal practice is most common.

Keywords: Confounding; Covariate balance; Diagnostics; Epidemiology; Propensity scores.
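
As an illustration of one diagnostic named in the Results, the standardised mean difference for a continuous covariate can be sketched as follows. This is a minimal sketch, not the review's own code: the function name, the pooled-standard-deviation form of the denominator, and the example ages are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def standardised_mean_difference(treated, control):
    """Difference in group means divided by a pooled standard deviation.

    Assumption: the pooled SD is the square root of the average of the
    two sample variances, one common convention for continuous covariates.
    """
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical covariate values (ages) in each group; illustrative only.
treated_age = [50, 60, 70]
control_age = [40, 50, 60]

smd = standardised_mean_difference(treated_age, control_age)
print(f"Standardised mean difference for age: {smd:.2f}")
```

Values close to zero suggest the covariate is balanced between groups. Unlike a p-value from a hypothesis test, the statistic is expressed in standard deviation units rather than as a test of significance.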

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Flowchart of study selection
Fig. 2
The proportion (95% Confidence Interval) of studies which did not report use of a propensity score diagnostic, by research area. ‘Assessment not reported’ refers to papers which did not specify whether propensity scores were assessed; ‘Diagnostic not reported’ refers to papers which reported that assessment took place, but not how
Fig. 3
The proportion (95% Confidence Interval) of studies using each diagnostic, by research area. ‘Other’ includes: absolute differences, graphical approaches, post-matching c-statistic, regression, standardised bias, and variance ratios
Fig. 4
The proportion (95% Confidence Interval) of studies using each diagnostic, by propensity score method. ‘Assessment not reported’ refers to papers which did not specify whether propensity scores were assessed; ‘Diagnostic not reported’ refers to papers which reported that assessment took place, but not how; ‘Other’ includes: absolute differences, graphical approaches, post-matching c-statistic, regression, standardised bias, and variance ratios

