JAMA Netw Open. 2024 Nov 4;7(11):e2429630.
doi: 10.1001/jamanetworkopen.2024.29630.

Data-Driven Cutoff Selection for the Patient Health Questionnaire-9 Depression Screening Tool

Brooke Levis, Parash Mani Bhandari, Dipika Neupane, Suiqiong Fan, Ying Sun, Chen He, Yin Wu, Ankur Krishnan, Zelalem Negeri, Mahrukh Imran, Danielle B Rice, Kira E Riehm, Marleine Azar, Alexander W Levis, Jill Boruff, Pim Cuijpers, Simon Gilbody, John P A Ioannidis, Lorie A Kloda, Scott B Patten, Roy C Ziegelstein, Daphna Harel, Yemisi Takwoingi, Sarah Markham, Sultan H Alamri, Dagmar Amtmann, Bruce Arroll, Liat Ayalon, Hamid R Baradaran, Anna Beraldi, Charles N Bernstein, Arvin Bhana, Charles H Bombardier, Ryna Imma Buji, Peter Butterworth, Gregory Carter, Marcos H Chagas, Juliana C N Chan, Lai Fong Chan, Dixon Chibanda, Kerrie Clover, Aaron Conway, Yeates Conwell, Federico M Daray, Janneke M de Man-van Ginkel, Jesse R Fann, Felix H Fischer, Sally Field, Jane R W Fisher, Daniel S S Fung, Bizu Gelaye, Leila Gholizadeh, Felicity Goodyear-Smith, Eric P Green, Catherine G Greeno, Brian J Hall, Liisa Hantsoo, Martin Härter, Leanne Hides, Stevan E Hobfoll, Simone Honikman, Thomas Hyphantis, Masatoshi Inagaki, Maria Iglesias-Gonzalez, Hong Jin Jeon, Nathalie Jetté, Mohammad E Khamseh, Kim M Kiely, Brandon A Kohrt, Yunxin Kwan, Maria Asunción Lara, Holly F Levin-Aspenson, Shen-Ing Liu, Manote Lotrakul, Sonia R Loureiro, Bernd Löwe, Nagendra P Luitel, Crick Lund, Ruth Ann Marrie, Laura Marsh, Brian P Marx, Anthony McGuire, Sherina Mohd Sidik, Tiago N Munhoz, Kumiko Muramatsu, Juliet E M Nakku, Laura Navarrete, Flávia L Osório, Brian W Pence, Philippe Persoons, Inge Petersen, Angelo Picardi, Stephanie L Pugh, Terence J Quinn, Elmars Rancans, Sujit D Rathod, Katrin Reuter, Alasdair G Rooney, Iná S Santos, Miranda T Schram, Juwita Shaaban, Eileen H Shinn, Abbey Sidebottom, Adam Simning, Lena Spangenberg, Lesley Stafford, Sharon C Sung, Keiko Suzuki, Pei Lin Lynnette Tan, Martin Taylor-Rowan, Thach D Tran, Alyna Turner, Christina M van der Feltz-Cornelis, Thandi van Heyningen, Paul A Vöhringer, Lynne I Wagner, Jian Li Wang, David Watson, Jennifer White, Mary A Whooley, Kirsty Winkley, Karen Wynter, Mitsuhiko Yamada, Qing Zhi Zeng, Yuying Zhang, Brett D Thombs, Andrea Benedetti; Depression Screening Data (DEPRESSD) PHQ Group

Brooke Levis et al. JAMA Netw Open.

Abstract

Importance: Test accuracy studies often use small datasets to simultaneously select an optimal cutoff score that maximizes test accuracy and generate accuracy estimates.

Objective: To evaluate the degree to which using data-driven methods to simultaneously select an optimal Patient Health Questionnaire-9 (PHQ-9) cutoff score and estimate accuracy yields (1) optimal cutoff scores that differ from the population-level optimal cutoff score and (2) biased accuracy estimates.

Design, setting, and participants: This study used cross-sectional data from an existing individual participant data meta-analysis (IPDMA) database on PHQ-9 screening accuracy to represent a hypothetical population. Studies in the IPDMA database compared participant PHQ-9 scores with a major depression classification. From the IPDMA population, 1000 studies of 100, 200, 500, and 1000 participants each were resampled.
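The resampling design described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `resample_studies`, the toy population, and the choice to sample with replacement are all assumptions, since the abstract says only that studies were "resampled" from the IPDMA population.

```python
import random

def resample_studies(population, sample_size, n_studies=1000, seed=0):
    """Draw n_studies simulated studies of sample_size participants each.

    population: list of (phq9_score, has_major_depression) tuples.
    Sampling WITH replacement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return [rng.choices(population, k=sample_size) for _ in range(n_studies)]

# Hypothetical toy population for illustration only; NOT the IPDMA data.
toy_rng = random.Random(42)
toy_population = []
for _ in range(5000):
    is_case = toy_rng.random() < 0.10          # roughly 10% cases (assumed)
    mean = 14 if is_case else 4                # assumed score distributions
    score = min(27, max(0, round(toy_rng.gauss(mean, 4))))  # PHQ-9 range 0-27
    toy_population.append((score, is_case))

# 1000 simulated studies for each of the four sample sizes in the abstract.
studies_by_size = {
    n: resample_studies(toy_population, n) for n in (100, 200, 500, 1000)
}
```

Each entry of `studies_by_size[n]` is one simulated study of `n` participants, ready to have a cutoff selected and accuracy estimated within it.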

Main outcomes and measures: For the full IPDMA population and each simulated study, an optimal cutoff score was selected by maximizing the Youden index. Accuracy estimates for optimal cutoff scores in simulated studies were compared with accuracy in the full population.
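Selecting a cutoff by maximizing the Youden index (sensitivity + specificity - 1) can be illustrated as follows. This is a hedged sketch: the function names and toy data are hypothetical, a score at or above the cutoff counts as a positive screen, and ties are broken by keeping the lowest cutoff, which is an assumption because the abstract does not state a tie-breaking rule.

```python
def sensitivity_specificity(scores, is_case, cutoff):
    """Sensitivity and specificity when score >= cutoff is a positive screen.

    Assumes the sample contains at least one case and one non-case.
    """
    tp = sum(1 for s, c in zip(scores, is_case) if c and s >= cutoff)
    fn = sum(1 for s, c in zip(scores, is_case) if c and s < cutoff)
    tn = sum(1 for s, c in zip(scores, is_case) if not c and s < cutoff)
    fp = sum(1 for s, c in zip(scores, is_case) if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def youden_optimal_cutoff(scores, is_case, cutoffs=range(1, 28)):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    PHQ-9 total scores range from 0 to 27, so cutoffs 1..27 are tried.
    On ties the lowest cutoff is kept (an assumption of this sketch).
    """
    best_cutoff, best_j = None, float("-inf")
    for cutoff in cutoffs:
        sens, spec = sensitivity_specificity(scores, is_case, cutoff)
        j = sens + spec - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical toy sample of eight participants (not study data).
toy_scores = [2, 3, 5, 7, 9, 12, 15, 20]
toy_is_case = [False, False, False, False, True, False, True, True]
cutoff, j = youden_optimal_cutoff(toy_scores, toy_is_case)
print(cutoff, round(j, 3))  # cutoff 8, J approximately 0.8
```

In this toy sample, cutoffs 8 and 9 give the same Youden index and the lower one is returned, which shows how sensitive the selected cutoff can be to a handful of observations in a small sample.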

Results: The IPDMA database included 100 primary studies with 44 503 participants (4541 [10%] cases of major depression). The population-level optimal cutoff score was 8 or higher. Optimal cutoff scores in simulated studies ranged from 2 or higher to 21 or higher in samples of 100 participants and 5 or higher to 11 or higher in samples of 1000 participants. The percentage of simulated studies that identified the true optimal cutoff score of 8 or higher was 17% for samples of 100 participants and 33% for samples of 1000 participants. Compared with estimates for a cutoff score of 8 or higher in the population, sensitivity was overestimated by 6.4 (95% CI, 5.7-7.1) percentage points in samples of 100 participants, 4.9 (95% CI, 4.3-5.5) percentage points in samples of 200 participants, 2.2 (95% CI, 1.8-2.6) percentage points in samples of 500 participants, and 1.8 (95% CI, 1.5-2.1) percentage points in samples of 1000 participants. Specificity was within 1 percentage point across sample sizes.
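The reported overestimates can be read as a difference between the average accuracy at each simulated study's own selected cutoff and the population accuracy at the true optimal cutoff. The notation below is hypothetical, and treating the reported figure as a mean difference over the 1000 simulated studies is an assumption, since the abstract does not give the formula. The population values (sensitivity 80.4%, specificity 82.0% at a cutoff of 8 or higher) are taken from the figure legends.

```latex
% Youden index at the population-level optimal cutoff (values from the figure legends)
J(8) = \mathrm{Se}_{\mathrm{pop}}(8) + \mathrm{Sp}_{\mathrm{pop}}(8) - 1
     = 0.804 + 0.820 - 1 = 0.624

% Assumed form of the sensitivity overestimate for sample size n,
% where \hat{c}_k is the cutoff selected in simulated study k
\widehat{\mathrm{bias}}_n
  = \frac{1}{1000} \sum_{k=1}^{1000} \widehat{\mathrm{Se}}_k(\hat{c}_k)
    - \mathrm{Se}_{\mathrm{pop}}(8)
```

Under that reading, an overestimate of 6.4 percentage points in samples of 100 participants would correspond to an average estimated sensitivity of about 80.4 + 6.4 = 86.8%.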

Conclusions and relevance: This study of cross-sectional data found that optimal cutoff scores and accuracy estimates differed substantially from population values when data-driven methods were used to simultaneously identify an optimal cutoff score and estimate accuracy. Users of diagnostic accuracy evidence should evaluate studies of accuracy with caution and ensure that cutoff score recommendations are based on adequately powered research or well-conducted meta-analyses.

Conflict of interest statement

Conflict of Interest Disclosures: Dr Ayalon reported receiving grants from Lundbeck during the conduct of the study. Dr Bernstein reported receiving grants from AbbVie, Eli Lilly, Fresenius Kabi, Janssen, Pfizer, Takeda, Boston Scientific, JAMP Pharma, Organon, and Sandoz and personal fees from AbbVie, Amgen, Bristol Myers Squibb, Eli Lilly, Fresenius Kabi, Janssen, Pfizer, and Takeda outside the submitted work. Dr Butterworth reported receiving grants from Safe Work Australia and Australian Research Council during the conduct of the study. Dr J.C.N. Chan reported receiving grants from the European Foundation for Study of Diabetes during the conduct of the study. Dr L.F. Chan reported receiving nonfinancial support from Otsuka and Lundbeck and personal fees from Johnson & Johnson during the conduct of the study and nonfinancial support from Ortho-McNeil-Janssen and Menarini outside the submitted work. Dr Inagaki reported receiving personal fees from Meiji, Mochida, Takeda, Novartis, Yoshitomi, Pfizer, Eisai, Otsuka, MSD, Sumitomo Dainippon, Janssen, and Eli Lilly outside the submitted work. Dr Rancans reported receiving grants from Gedeon Richter; personal fees and nonfinancial support from Gedeon Richter, Lundbeck, Servier, and Janssen Cilag; and personal fees from Zentiva and AbbVie outside the submitted work. Dr Shinn reported receiving grants from the National Cancer Institute (NCI) during the conduct of the study. Dr Simning reported receiving grants from the Agency for Healthcare Research and Quality, National Center for Research Resources, and National Institute of General Medical Sciences during the conduct of the study. Dr Stafford reported receiving a PhD scholarship from The University of Melbourne during the conduct of the study. Dr Wagner reported receiving grants from the State of Pennsylvania tobacco settlement fund and NCI during the conduct of the study and personal fees from Celgene/Bristol Myers Squibb outside the submitted work. 
Dr Benedetti reported receiving grants from the Canadian Institutes of Health Research (CIHR) during the conduct of the study. No other disclosures were reported.

Figures

Figure 1. Variability of Data-Driven Optimal Cutoff Scores in 1000 Resampled Studies of 100, 200, 500, and 1000 Participants

Figure 2. Variability in Accuracy Estimates of the Optimal Cutoff Scores in 1000 Resampled Studies of 100, 200, 500, and 1000 Participants vs Accuracy Values for a Cutoff of 8 or Higher in the Population
Edges of boxes represent the 25th and 75th percentiles; horizontal line inside boxes represents the median; dashed horizontal line represents the accuracy of the true population-level optimal cutoff score in the full Patient Health Questionnaire-9 individual participant data meta-analysis dataset (cutoff score ≥8; sensitivity = 80.4%, specificity = 82.0%); and dots represent outliers.

Figure 3. Variability in Accuracy Estimates of a Cutoff Score of 8 or Higher in 1000 Resampled Studies of 100, 200, 500, and 1000 Participants vs Accuracy Values for a Cutoff of 8 or Higher in the Population
Edges of boxes represent the 25th and 75th percentiles; horizontal line inside boxes represents the median; dashed horizontal line represents the accuracy of the true population-level optimal cutoff score in the full Patient Health Questionnaire-9 individual participant data meta-analysis dataset (cutoff score ≥8; sensitivity = 80.4%, specificity = 82.0%); and dots represent outliers.
