Prediction of remission in obsessive compulsive disorder using a novel machine learning strategy

Kathleen D Askland¹, Sarah Garnaat¹, Nicholas J Sibrava², Christina L Boisseau¹, David Strong³, Maria Mancebo¹, Benjamin Greenberg¹, Steve Rasmussen¹, Jane Eisen¹

Affiliations

¹ Department of Psychiatry and Human Behavior, Butler Hospital/Warren Alpert School of Medicine, Brown University, Providence, RI, USA.
² Department of Psychology, Baruch College - The City University of New York, New York, USA.
³ Department of Family and Preventive Medicine, University of California, San Diego, CA, USA.

PMID: 25994109
PMCID: PMC5466447
DOI: 10.1002/mpr.1463

Kathleen D Askland et al. Int J Methods Psychiatr Res. 2015 Jun;24(2):156-69. doi: 10.1002/mpr.1463. Epub 2015 May 21.

Abstract

The study objective was to apply machine learning methodologies to identify predictors of remission in a longitudinal sample of 296 adults with a primary diagnosis of obsessive compulsive disorder (OCD). Random Forests is an ensemble machine learning algorithm that has been successfully applied to large-scale data analysis across a wide range of biomedical disciplines, though rarely in psychiatric research or to longitudinal data. When provided with 795 raw and composite scores primarily from baseline measures, Random Forest regression prediction explained 50.8% (5000-run average, 95% bootstrap confidence interval [CI]: 50.3-51.3%) of the variance in proportion of time spent remitted. Machine performance improved when only the most predictive 24 items were used in a reduced analysis. Consistently high-ranked predictors of longitudinal remission included Yale-Brown Obsessive Compulsive Scale (Y-BOCS) items, NEO items and subscale scores, Y-BOCS symptom checklist cleaning/washing compulsion score, and several self-report items from social adjustment scales. Random Forest classification was able to distinguish participants according to binary remission outcomes with an error rate of 24.6% (95% bootstrap CI: 22.9-26.2%). Our results suggest that clinically useful prediction of remission may not require an extensive battery of measures. Rather, a small set of assessment items may efficiently distinguish high- and lower-risk patients and inform clinical decision-making.

Keywords: obsessive compulsive disorder; risk factors; statistics.
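
The abstract reports a "percent of variance explained" averaged over many runs, with a 95% bootstrap confidence interval. The sketch below illustrates one common way such numbers can be computed; it is not the authors' code. The pseudo-R² definition (1 − MSE/Var), the percentile bootstrap, the function names, and the synthetic per-run scores are all assumptions made for illustration and are not taken from the study.

```python
import random
import statistics


def percent_variance_explained(y_true, y_pred):
    """Return 100 * (1 - MSE / Var(y)), a pseudo-R^2 expressed as a percentage."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    variance = statistics.pvariance(y_true)  # population variance (divides by n)
    return 100.0 * (1.0 - mse / variance)


def bootstrap_ci_of_mean(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic stand-in for per-run scores (NOT data from the study):
# 200 hypothetical runs scattered around an arbitrary value.
_rng = random.Random(1)
run_scores = [_rng.gauss(40.0, 2.0) for _ in range(200)]

average = statistics.fmean(run_scores)
ci_low, ci_high = bootstrap_ci_of_mean(run_scores)
print(f"mean = {average:.1f}%, 95% bootstrap CI: {ci_low:.1f}-{ci_high:.1f}%")
```

Under these assumptions, a perfect predictor scores 100% and a predictor that always outputs the sample mean scores 0%, which is why values near 50% indicate that roughly half of the outcome variance is captured.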

Figures

**Figure 1**
Multidimensional scaling (MDS) plot: predicting *Percent Time Remitted*. Full (p = 795) Feature Set (points colored by binary outcome, *Ever Remit*). Sample MDS plot derived from a single random forest (RF) run under full feature analysis predicting the continuous outcome, *Percent Time Remitted*. For visualization purposes, the points (each of which corresponds to a single subject) are colored according to the binary outcome, *Ever Remit*.

**Figure 2**
Multidimensional scaling (MDS) plot: predicting *Percent Time Remitted*. Points colored by Neuroticism Subscale Score (NEO) and Degree of Interference due to Compulsions (Y‐BOCS). Sample MDS plot derived from a single random forest (RF) run under full feature analysis predicting the continuous outcome, *Percent Time Remitted*. This plot contains the same points as Figure 1. Here, however, the points are colored according to the subject's scores on two high‐ranked predictor items: a binary partition of the neuroticism subscale score (“lower neuroticism” corresponds to a score ≤ 50; “higher neuroticism” to a score > 50) and a binary partition of Y‐BOCS item #7, Interference due to compulsive behaviors (“Mild interference” corresponds to a score ≤ 1; “Mod‐Severe interference” to a score > 1).

**Figure 3**
“Representative Tree”: predicting *Percent Time Remitted* using 24 best predictors. This representative tree models the continuous outcome, *Percent Time Remitted*, and the 24 high‐priority features and was extracted from a single random forest (RF) run (ntree = 5000) using the R “reprtree” (Dasgupta, 2014) package. This package implements the concept of representative trees from ensembles of tree‐based machines on the basis of several tree distance metrics (Banerjee *et al*., 2012). Each node contains the variable selected for splitting at that node and the value on which it was split represented by a mathematical condition. The cases split to the left daughter node are those for which the condition was met; those in the right node are those for which the condition was not met. The numeric values displayed at each terminal node are the mean values of the outcome variable for the subjects residing in that terminal node.
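
The Figure 3 caption describes how a regression tree is read: each internal node tests a condition, cases meeting it go to the left daughter node, and each terminal node reports the mean outcome of the subjects it contains. A minimal sketch of that traversal follows. The feature names, the `<=` form of the condition, the thresholds, and the outcome values are hypothetical placeholders chosen for illustration; they are not the splits or values shown in the published tree.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Dict, List, Union


@dataclass
class Leaf:
    mean_outcome: float  # mean outcome of the training subjects in this node


@dataclass
class Split:
    feature: str                    # variable selected for splitting
    threshold: float                # value on which it was split
    left: Union["Split", Leaf]      # cases where feature <= threshold (condition met)
    right: Union["Split", Leaf]     # cases where the condition was not met


def leaf_from(outcomes: List[float]) -> Leaf:
    """Build a terminal node whose value is the mean of its subjects' outcomes."""
    return Leaf(mean_outcome=fmean(outcomes))


def predict(node: Union[Split, Leaf], subject: Dict[str, float]) -> float:
    """Walk from the root to a terminal node and return its mean outcome."""
    while isinstance(node, Split):
        met = subject[node.feature] <= node.threshold
        node = node.left if met else node.right
    return node.mean_outcome


# Hypothetical two-level tree (all names and numbers are invented).
tree = Split(
    feature="compulsion_interference",
    threshold=1,
    left=Split(
        feature="neuroticism",
        threshold=50,
        left=leaf_from([0.5, 0.75, 1.0]),
        right=leaf_from([0.25, 0.75]),
    ),
    right=leaf_from([0.0, 0.25]),
)

subject = {"compulsion_interference": 1, "neuroticism": 42}
print(predict(tree, subject))
```

In a forest, a prediction for a continuous outcome is typically the average of such per-tree values; a representative tree is a single member chosen to summarize the ensemble.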

References

1. Arnold S.E., Xie S.X., Leung Y.Y., Wang L.S., Kling M.A., Han X., Kim E.J., Wolk D.A., Bennett D.A., Chen‐Plotkin A., Grossman M., Hu W., Lee V.M., Mackin R.S., Trojanowski J.Q., Wilson R.S., Shaw L.M. (2012) Plasma biomarkers of depressive symptoms in older adults. Translational Psychiatry, 2(1), e65. DOI: 10.1038/tp.2011.63
2. Banerjee M., Ding Y., Noone A.M. (2012) Identifying representative trees from ensembles. Statistics in Medicine, 31(15), 1601–1616. DOI: 10.1002/sim.4492
3. Biau G. (2012) Analysis of a Random Forests model. Journal of Machine Learning Research, 13(1), 1063–1095.
4. Biau G., Devroye L., Lugosi G. (2008) Consistency of Random Forests and other averaging classifiers. Journal of Machine Learning Research, 9, 2015–2033.
5. Biener L., Abrams D.B. (1991) The contemplation ladder: Validation of a measure of readiness to consider smoking cessation. Health Psychology, 10(5), 360–365.

Grants and funding

MH085810-05/MH/NIMH NIH HHS/United States
