A support vector machine model provides an accurate transcript-level-based diagnostic for major depressive disorder

J S Yu¹, A Y Xue¹, E E Redei², N Bagheri¹

Affiliations

¹ Chemical and Biological Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL, USA.
² Department of Psychiatry and Behavioral Sciences, Feinberg School of Medicine, Northwestern University, Evanston, IL, USA.

PMID: 27779627
PMCID: PMC5290347
DOI: 10.1038/tp.2016.198

J S Yu et al. Transl Psychiatry. 2016 Oct 25;6(10):e931. doi: 10.1038/tp.2016.198.

Abstract

Major depressive disorder (MDD) is a critical cause of morbidity and disability with an economic cost of hundreds of billions of dollars each year, necessitating more effective treatment strategies and novel approaches to translational research. A notable barrier in addressing this public health threat is reliable identification of the disorder, as many affected individuals remain undiagnosed or misdiagnosed. An objective blood-based diagnostic test using transcript levels of a panel of markers would provide an invaluable tool for MDD, as the infrastructure for similar tests (including equipment, trained personnel, billing, and governmental approval) is well established in clinics worldwide. Here we present a supervised classification model utilizing support vector machines (SVMs) for the analysis of transcriptomic data readily obtained from a peripheral blood specimen. The model was trained on data from subjects with MDD (n=32) and age- and gender-matched controls (n=32). This SVM model provides a cross-validated sensitivity and specificity of 90.6% for the diagnosis of MDD using a panel of 10 transcripts. We applied a logistic equation to the SVM model and quantified a likelihood of depression score. This score gives the probability of an MDD diagnosis and allows the tuning of specificity and sensitivity for individual patients, bringing personalized medicine closer in psychiatry.
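As a quick check of what the reported figures mean: with 32 subjects per group, a sensitivity and specificity of 90.6% is consistent with 29 of 32 subjects being classified correctly in each group (29/32 ≈ 0.906). The abstract does not report the confusion counts, so the counts below are an assumption inferred from the percentages, and the helper name is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of MDD subjects correctly flagged
    specificity = tn / (tn + fp)  # fraction of controls correctly cleared
    return sensitivity, specificity


# Assumed counts: 29 of 32 correct in each group (inferred, not reported).
sens, spec = sensitivity_specificity(tp=29, fn=3, tn=29, fp=3)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```

With groups of this size, each additional misclassified subject moves the corresponding metric by about 3.1 percentage points (1/32).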

Figures

**Figure 1**
A linear boundary SVM and logistic regression outperform random forests in identifying subjects with MDD. Three supervised machine-learning methods were applied to discriminate MDD subjects from control subjects: (left) logistic regression, (center) random forests, and (right) support vector machines. To improve model prediction and identify an optimal transcript set, backward selection was performed. Backward selection removes transcripts from the explanatory variables in the classification model individually; for each iteration, we recalculate model accuracy, sensitivity, and specificity. The transcript associated with the lowest accuracy is permanently removed from the set of predictive variables and the process is repeated. Random forests were less accurate than logistic regression or SVMs, suggesting that nonlinear contributions of the explanatory variables did not provide additional accuracy to the model. Logistic regression and SVMs with a linear boundary both had high accuracy, 92.2% and 90.6%, respectively. MDD, major depressive disorder; SVM, support vector machine.
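The backward selection loop described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the usual greedy reading, in which the transcript whose removal leaves the highest accuracy is dropped at each step, and it uses a made-up scoring function (`toy_score`) in place of a cross-validated classifier. All names here are hypothetical.

```python
def backward_selection(features, score):
    """Greedy backward elimination.

    features: list of feature names.
    score: callable mapping a tuple of feature names to an accuracy in [0, 1]
           (in practice, a cross-validated classifier accuracy).
    Returns a list of (feature_subset, accuracy), one entry per subset size.
    """
    current = list(features)
    history = [(tuple(current), score(tuple(current)))]
    while len(current) > 1:
        # Try dropping each remaining feature in turn and re-score the model.
        trials = []
        for f in current:
            subset = tuple(x for x in current if x != f)
            trials.append((score(subset), f, subset))
        # Assumption: permanently drop the feature whose removal leaves
        # the highest accuracy (ties go to the earliest feature).
        best_acc, dropped, best_subset = max(trials, key=lambda t: t[0])
        current = list(best_subset)
        history.append((best_subset, best_acc))
    return history


def toy_score(subset):
    """Made-up stand-in for cross-validated accuracy: two informative
    features add accuracy, every extra noise feature costs a little."""
    informative = {"g1", "g2"}
    hits = len(informative & set(subset))
    noise = len(subset) - hits
    return 0.5 + 0.2 * hits - 0.02 * noise


history = backward_selection(["g1", "g2", "n1", "n2"], toy_score)
best_subset, best_acc = max(history, key=lambda h: h[1])
print(best_subset, round(best_acc, 2))
```

Keeping the full history, rather than only the final subset, is what lets one read off the subset size at which accuracy peaks.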

**Figure 2**
Combinations of RNA measurements have high predictive power, though individual measurements can be non-predictive. The heat maps contain the cross-validated accuracy, specificity, and sensitivity of pairwise combinations of the top five predictive transcripts from backward selection with linear SVMs. Even though *AMFR* had high predictive value for classifying MDD in conjunction with other variables, it had no predictive power on its own, as demonstrated by a specificity of 0%. Combinations of transcripts can inform a useful SVM boundary even if single transcripts have no ability in isolation, suggesting that depression is associated with combinations of genes. Note that backward selection does not comprehensively explore all transcript combinations and the complex relationships among transcripts suggest that a more predictive combination may still remain. MDD, major depressive disorder; SVM, support vector machine.
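A toy example can make the point about combinations concrete. The data below are invented, and the pairwise rule is a hand-picked linear boundary standing in for a fitted SVM; the sketch only shows that two measurements that each separate the groups poorly can separate them perfectly when combined.

```python
# Toy data (made up): (transcript_a, transcript_b) levels per subject.
controls = [(1, 2), (2, 3), (3, 4), (4, 5)]
cases = [(2, 1), (3, 2), (4, 3), (5, 4)]


def accuracy(rule):
    """Fraction of subjects for which rule(subject) matches the true label."""
    correct = sum(rule(s) for s in cases) + sum(not rule(s) for s in controls)
    return correct / (len(cases) + len(controls))


def best_single_threshold(index):
    """Best accuracy of any one-feature threshold rule, in either direction."""
    values = sorted({s[index] for s in cases + controls})
    cuts = [v - 0.5 for v in values] + [values[-1] + 0.5]
    best = 0.0
    for c in cuts:
        above = accuracy(lambda s: s[index] > c)
        best = max(best, above, 1 - above)  # 1 - above: flipped direction
    return best


single_a = best_single_threshold(0)
single_b = best_single_threshold(1)
# A linear boundary on the pair: call it a case when a exceeds b.
pair = accuracy(lambda s: s[0] - s[1] > 0)
print(single_a, single_b, pair)
```

In this toy set the group difference lies entirely in the relationship between the two levels, which is why no threshold on either level alone does well.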

**Figure 3**
SVMs combined with a logistic equation provide a quantitative LiD score corresponding to the probability of an MDD diagnosis. A logistic equation was fit on the boundary inferred by the linear SVM from the 10 most predictive transcripts. For illustrative purposes, we show the same method on the two most predictive pairwise genes, *DGKA* and *CDR2*; the full predictive model uses 10 transcripts and would be impractical to visualize. The thick line corresponds to the logistic regression inflection point and the thin lines correspond to deciles of probability of an MDD diagnosis fitted from logistic regression. The LiD score range is shown for each region. The overlaps in transcript measurements between MDD and ND control subjects highlight the inherent noise in MDD diagnoses as well as biological experiments. However, the probabilistic interpretation from logistic regression offers a diagnostic tool useful for clinicians. LiD, likelihood of depression; MDD, major depressive disorder; ND, no-disorder; SVM, support vector machine.
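One common way to turn a linear SVM into a probability is to pass its signed decision value through a logistic function, and the sketch below assumes that form. The weights, bias, and logistic parameters `a` and `b` are made up for illustration; the fitted values are not given here, and the function names are hypothetical.

```python
import math


def decision_value(x, w, bias):
    """Signed score of a linear SVM: w . x + bias (positive side = MDD)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + bias


def lid_score(d, a=1.0, b=0.0):
    """Logistic map from decision value d to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(a * d + b)))


def decision_value_at(p, a=1.0, b=0.0):
    """Decision value at which the LiD score equals p (a contour line)."""
    return (math.log(p / (1.0 - p)) - b) / a


# Hypothetical two-transcript model; all numbers below are made up.
w, bias = [1.5, -2.0], 0.25
a, b = 2.0, 0.0

d = decision_value([1.0, 0.5], w, bias)  # 1.5 - 1.0 + 0.25 = 0.75
print(round(lid_score(d, a, b), 3))

# Decile contours: decision values where the score is 0.1, 0.2, ..., 0.9.
deciles = [decision_value_at(k / 10, a, b) for k in range(1, 10)]
```

Under this form, each decile corresponds to a line parallel to the SVM boundary, and the inflection point is the contour where the score equals 0.5. Choosing a cutoff other than 0.5 on the score is what trades sensitivity against specificity.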

**Figure 4**
Some genes were highly predictive for diagnosing MDD, suggesting the biological processes underlying the etiology of major depression. Logistic regression and SVMs both identified genes with significant ability to predict MDD. (a) Maximum predictive power was achieved in logistic regression with 14 transcripts and in SVMs with 10 transcripts, with eight in common. These eight genes are hypothesized to have importance in explaining the biological processes underlying MDD. (b) Enrichr was used to find biological processes enriched in the eight common variables between logistic regression and SVMs. Four processes, as defined by the Gene Ontology (GO) database, were found to be significantly enriched (GO:0015929, GO:0004143, GO:0003951, and GO:0070567). The dotted line indicates the significance cutoff (adjusted p-value of 0.05). Together, these results suggest that multiple converging pathways may have independent roles in contributing to the depressive phenotype, and that MDD may have independent causal factors. MDD, major depressive disorder; SVM, support vector machine.
