Machine learning, statistical learning and the future of biological research in psychiatry

R Iniesta¹, D Stahl², P McGuffin¹

Affiliations

¹ Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, UK.
² Department of Biostatistics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, UK.

PMID: 27406289
PMCID: PMC4988262
DOI: 10.1017/S0033291716001367

Review

R Iniesta et al. Psychol Med. 2016 Sep;46(12):2455-65. doi: 10.1017/S0033291716001367. Epub 2016 Jul 13.

Abstract

Psychiatric research has entered the age of 'Big Data'. Datasets now routinely involve thousands of heterogeneous variables, including clinical, neuroimaging, genomic, proteomic, transcriptomic and other 'omic' measures. The analysis of these datasets is challenging, especially when the number of measurements exceeds the number of individuals, and may be further complicated by missing data for some subjects and by variables that are highly correlated. Statistical learning-based models are a natural extension of classical statistical approaches but provide more effective methods to analyse very large datasets. In addition, the predictive capability of such models promises to be useful in developing decision support systems, that is, methods that can be introduced into clinical settings to guide, for example, diagnostic classification or personalized treatment. In this review, we aim to outline the potential benefits of statistical learning methods in clinical research. We first introduce the concept of Big Data in different environments. We then describe how modern statistical learning models can be used in practice on Big Datasets to extract relevant information. Finally, we discuss the strengths of using statistical learning in psychiatric studies, from both research and practical clinical points of view.

Keywords: Machine learning; outcome prediction; personalized medicine; predictive modelling; statistical learning.

Figures

**Fig. 1.**
Main steps of the learning process.

**Fig. 2.**
(a) Data simulated from a follow-up study of major depression patients. The predictor variables are age of depression onset (years) and the MADRS score at baseline, which ranges from 0 to 60 (0–6, normal; 7–19, mild depression; 20–34, moderate depression; >34, severe depression). The outcome is remission status at the end of the follow-up (YES or NO). (b) The Naive Bayes classifier is often represented as this type of graph. The direction of the arrows states that each class causes certain features, with a certain probability. (c) A hyperplane (a line, in two dimensions) is built at maximal distance from each dashed line (the margin). A new case (point) will be classified as remission or non-remission depending on its position relative to the line (the decision boundary). (d) A simple decision tree suggesting that patients with age of onset lower than 29 are more likely to reach remission. (e) Each node represents an artificial neuron and each arrow a connection from the output of one neuron to the input of another.
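
As a rough illustration of the single-split tree in panel (d), the sketch below searches for the age-of-onset threshold that best separates remitters from non-remitters. The records, the `best_stump` helper and the rule direction (predict remission below the threshold) are all assumptions made for this example; the values are invented and are not the data behind the figure.

```python
# Hypothetical toy records: (age_of_onset_years, madrs_baseline, remission).
# Invented for illustration only; not the simulated data shown in Fig. 2.
patients = [
    (18, 22, True), (21, 30, True), (24, 15, True), (27, 35, True),
    (28, 19, False), (31, 28, False), (35, 40, True), (38, 25, False),
    (42, 33, False), (47, 21, False),
]


def best_stump(rows, feature):
    """Return (threshold, errors) for the one-split rule
    'predict remission when row[feature] < threshold' that
    misclassifies the fewest rows."""
    values = sorted({row[feature] for row in rows})
    # Candidate thresholds: midpoints between consecutive observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        errors = sum((row[feature] < t) != row[2] for row in rows)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best


def predict_remission(age_of_onset, threshold):
    """Apply the fitted one-split rule to a new case."""
    return age_of_onset < threshold


threshold, errors = best_stump(patients, feature=0)
print(f"split at age of onset < {threshold} ({errors} training errors)")
```

A full decision tree repeats this search recursively within each branch and typically scores splits with an impurity measure rather than raw error counts; the one-split version is shown only because it matches the simplicity of the panel.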

**Fig. 3.**
Example of a 5-fold cross-validation. Data are randomly split into five folds of equal size. At every step, one fold is selected as the test dataset and the remaining four are used as training data. This procedure is repeated five times, selecting a different fold as test data at every step.
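
The splitting procedure in this caption can be sketched in a few lines of standard-library Python. The function name `k_fold_splits`, the sample size of 20 and the fixed seed are assumptions chosen for the example, not details from the article.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random assignment of cases to folds
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Training data are all cases from the remaining k - 1 folds.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(n_samples=20, k=5))
for step, (train, test) in enumerate(splits, start=1):
    print(f"step {step}: {len(train)} training cases, {len(test)} test cases")
```

Each case appears in the test set exactly once across the five steps, so every observation contributes to the performance estimate without ever being used to fit the model that is evaluated on it.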
