J Clin Psychopharmacol. 2025 Jan-Feb;45(1):28-31. doi: 10.1097/JCP.0000000000001936. Epub 2024 Nov 21.

Quality Assurance of Depression Ratings in Psychiatric Clinical Trials

Michael T Sapko et al. J Clin Psychopharmacol. 2025 Jan-Feb.

Abstract

Background: Extensive experience with antidepressant clinical trials indicates that interrater reliability (IRR) must be maintained to achieve reliable clinical trial results. Contract research organizations (CROs) have generally accepted 6 points of rating disparity between study site raters and central "master raters" as concordant, in part because of personnel turnover and variability within many CROs. We developed and tested an "insourced" model using a small, dedicated team of rater program managers (RPMs) to determine whether 3 points of disparity could be demonstrated as a feasible standard for rating concordance.

Methods: Site raters recorded and scored all Montgomery-Åsberg Depression Rating Scale (MADRS) interviews. Audio files were independently reviewed and scored by RPMs within 24 to 48 hours. Concordance was defined as the absolute difference in MADRS total score of 3 points or less. A MADRS total score that differed by 4 or more points triggered a discussion with the site rater and additional training, as needed.
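The concordance rule described above reduces to a simple threshold check. The following is a hypothetical illustration, not the study's actual scoring software (MADRS total scores range from 0 to 60):

```python
def is_concordant(site_score: int, rpm_score: int, threshold: int = 3) -> bool:
    """A rating pair is concordant if the absolute difference in
    MADRS total score is at most the threshold (3 points per the study)."""
    return abs(site_score - rpm_score) <= threshold

# Hypothetical rating pairs
print(is_concordant(24, 26))  # 2-point disparity: concordant
print(is_concordant(24, 29))  # 5-point disparity: triggers discussion and retraining
```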

Results: In a sample of 236 ratings (58 patients), IRR between site ratings and blinded independent RPM ratings was 94.49% (223/236). The lowest concordance, 87.93%, occurred at visit 2, the baseline visit in the clinical trial. Concordance rates at visits 3, 4, 5, and 6 were 93.75%, 96.08%, 97.30%, and 100.00%, respectively. The mean absolute difference in MADRS rating pairs was 1.77 points (95% confidence interval: 1.58-1.95). The intraclass correlation was 0.984, with η² = 0.992 (F = 124.35, P < 0.0001).
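The headline concordance rate follows directly from the counts reported above (223 concordant pairs out of 236 total):

```python
concordant, total = 223, 236
rate_pct = round(100 * concordant / total, 2)
print(rate_pct)  # 94.49
```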

Conclusions: Rigorous rater training together with real-time monitoring of site raters by RPMs can achieve a high degree of IRR on the MADRS.

Trial registration: ClinicalTrials.gov NCT03395392.
