Review
Biol Psychiatry Cogn Neurosci Neuroimaging. 2016 Sep;1(5):433-447.
doi: 10.1016/j.bpsc.2016.04.002.

Beyond Lumping and Splitting: A Review of Computational Approaches for Stratifying Psychiatric Disorders

Andre F Marquand et al. Biol Psychiatry Cogn Neurosci Neuroimaging. 2016 Sep.

Abstract

Heterogeneity is a key feature of all psychiatric disorders that manifests on many levels, including symptoms, disease course, and biological underpinnings. This heterogeneity forms a substantial barrier to understanding disease mechanisms and developing effective, personalized treatments. In response, many studies have sought to stratify psychiatric disorders, aiming to find more consistent subgroups on the basis of many types of data. Such approaches have received renewed interest after recent research initiatives, such as the National Institute of Mental Health Research Domain Criteria and the European Roadmap for Mental Health Research, both of which emphasize finding stratifications that are based on biological systems and that cut across current classifications. We first introduce the basic concepts for stratifying psychiatric disorders and then provide a methodologically oriented and critical review of the existing literature. This shows that the predominant clustering approach, which aims to subdivide clinical populations into more coherent subgroups, has made a useful contribution but is heavily dependent on the type of data used; it has produced many different ways to subgroup the disorders we review, but for most disorders it has not converged on a consistent set of subgroups. We highlight problems with current approaches that are not widely recognized and discuss the importance of validation to ensure that the derived subgroups index clinically relevant variation. Finally, we review emerging techniques, such as those that estimate normative models for mappings between biology and behavior, that provide new ways to parse the heterogeneity underlying psychiatric disorders, and we evaluate all methods with respect to meeting the objectives of initiatives such as the National Institute of Mental Health Research Domain Criteria and the Roadmap for Mental Health Research.

Keywords: European Roadmap for Mental Health Research; Heterogeneity; Latent cluster analysis; Psychiatry; RDoC; ROAMER; Research Domain Criteria; Subgroup.

Figures

Figure 1.
Schematic examples of alternative approaches to clustering and finite mixture models based on supervised learning. (A) This example shows the benefit of correcting mislabeled training samples. A supervised classifier trained to separate experimental classes (black and red points) may be forced to use a complex nonlinear decision boundary (blue line) to separate classes if data points are mislabeled (circled). (B) A simpler decision boundary results if the incorrect labels are corrected, for example using a wrapper method (74). (C) In a semisupervised learning context (75), only some data points have labels (black and red points). These can correspond to samples for which a certain diagnosis can be obtained. All other data points are unlabeled but can still contribute to defining the decision boundary. (D) Hybrid methods (76, 77, 78) combine supervised classification with unsupervised clustering and use multiple linear decision boundaries to separate the healthy class (blue points) from putative disease subgroups (colored points). See text for further details.
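The semisupervised idea in panel (C) can be illustrated with a minimal sketch. It assumes a self-training scheme around a nearest-centroid classifier, which is only one simple way to let unlabeled points shape a decision boundary and is not claimed to be the method of reference (75). The function names, the class labels "A" and "B", and the toy coordinates are all hypothetical.

```python
import math

def centroids_of(points, labels):
    """Mean position of each class, ignoring points whose label is None."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        if label is None:
            continue
        acc = sums.setdefault(label, [0.0] * len(point))
        for d, value in enumerate(point):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, point):
    """Assign the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

def self_train(points, labels, max_rounds=10):
    """Self-training: given labels stay fixed, unlabeled points (None) receive
    pseudo-labels that are re-estimated until they stop changing."""
    current = list(labels)
    for _ in range(max_rounds):
        centroids = centroids_of(points, current)
        updated = [given if given is not None else classify(centroids, point)
                   for point, given in zip(points, labels)]
        if updated == current:
            break
        current = updated
    return current, centroids_of(points, current)

# Hypothetical toy data: two labeled samples, five unlabeled ones.
points = [(0.0, 0.0), (4.0, 0.0), (0.5, 0.2), (1.0, 0.0),
          (1.8, 0.0), (3.0, 0.1), (3.5, -0.2)]
labels = ["A", "B", None, None, None, None, None]

seed_centroids = centroids_of(points, labels)      # labeled points only
final_labels, final_centroids = self_train(points, labels)
```

In this toy example a query point at (2.1, 0.0) falls on the "B" side of the boundary when only the two labeled samples are used, but on the "A" side once the unlabeled points have been absorbed, which is the sense in which unlabeled data contribute to the boundary.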
Figure 2.
Schematic examples of alternative approaches to clustering and finite mixture models based on unsupervised learning. (A) Manifold learning techniques aim to find some low-dimensional manifold (right panels) that represents the data more efficiently than the original high-dimensional data (depicted by the cube on the left). Basic dimensionality reduction techniques, such as principal components analysis (PCA), find a single subspace for the data based on maximizing variance. This may not efficiently show structure in high-dimensional data. In contrast, approaches that preserve local distances, such as t-distributed stochastic neighbor embedding (t-SNE) (80), may highlight intrinsic structure more effectively. (B) Novelty detection algorithms, such as the one-class support vector machine (83), aim to find a decision boundary that encloses a set of healthy subjects (blue points), allowing disease profiles to be detected as outliers (red points). Note that this approach does not provide an estimate of the probability density at each point.
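The novelty detection idea in panel (B) can be sketched in a few lines. This is a deliberately simplified stand-in, not a one-class support vector machine: it assumes the enclosing boundary is a sphere centred on the mean of the healthy reference group, with a radius chosen to cover a given fraction of that group. The function names, the `coverage` parameter, and the toy coordinates are hypothetical.

```python
import math

def fit_enclosing_sphere(healthy, coverage=0.95):
    """Return (centre, radius) of a sphere around the healthy group's mean
    that encloses roughly `coverage` of the healthy reference points."""
    dims = len(healthy[0])
    centre = [sum(p[d] for p in healthy) / len(healthy) for d in range(dims)]
    distances = sorted(math.dist(p, centre) for p in healthy)
    index = min(len(distances) - 1, math.ceil(coverage * len(distances)) - 1)
    return centre, distances[index]

def is_outlier(boundary, point):
    """True if the point lies outside the boundary fitted on healthy subjects."""
    centre, radius = boundary
    return math.dist(point, centre) > radius

# Hypothetical healthy reference profiles in two dimensions.
healthy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0),
           (0.5, 0.5), (-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5)]
boundary = fit_enclosing_sphere(healthy, coverage=1.0)

inside = is_outlier(boundary, (0.2, 0.1))    # profile close to the reference group
outside = is_outlier(boundary, (3.0, 3.0))   # profile far from the reference group
```

As the caption notes for the one-class support vector machine, this kind of boundary gives only an inside/outside decision and no estimate of the probability density at each point.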
Figure 3.
(A) Normative modeling approaches (22, 85, 86) aim to link a set of clinically relevant predictor variables with a set of quantitative biological response variables while quantifying the variation across this mapping. This is achieved by estimating a nonlinear regression model that provides probabilistic measures of predictive confidence (blue contour lines). These could be certainty estimates derived from a probabilistic model (22) or classical confidence intervals (86) and can be interpreted as centiles of variation within the cohort (blue numerals, right). Predictions for new data points (red) can then be derived that provide measures of predictive confidence to quantify the fit of the new data point to the normative model. [Adapted with permission from (22).] (B) By performing this mapping across different domains of functioning (e.g., different cognitive or clinical domains), many types of abnormal patterns can be detected, including classical disease clusters and also disease continua that describe pathology in terms of a gradual progression rather than in terms of sharply defined clusters (see Supplementary Methods for further details).
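The normative modeling logic in panel (A) can be sketched under strong simplifying assumptions. The caption describes a nonlinear regression with probabilistic predictive confidence; the sketch below swaps in an ordinary least-squares line with a constant, Gaussian residual spread, and it ignores uncertainty in the fitted coefficients. The predictor (age), the response values, and all function names are hypothetical.

```python
from statistics import NormalDist, mean

def fit_normative_line(x, y):
    """Least-squares line y ~ a + b*x for a reference cohort, plus the
    residual standard deviation used as the spread of the normative range."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Two fitted parameters, hence n - 2 degrees of freedom.
    sigma = (sum(r * r for r in residuals) / (len(x) - 2)) ** 0.5
    return a, b, sigma

def deviation(model, x_new, y_new):
    """Z-score of a new subject relative to the normative prediction, and the
    corresponding centile under the Gaussian residual assumption."""
    a, b, sigma = model
    z = (y_new - (a + b * x_new)) / sigma
    return z, 100.0 * NormalDist().cdf(z)

# Hypothetical reference cohort: a predictor and a biological response.
ages = [20, 30, 40, 50, 60, 70]
responses = [10.1, 9.4, 9.1, 8.4, 8.1, 7.4]
model = fit_normative_line(ages, responses)

z_typical, centile_typical = deviation(model, 45, 8.75)   # on the fitted line
z_atypical, centile_atypical = deviation(model, 45, 7.0)  # far below the line
```

Each new subject receives its own deviation score rather than a cluster label, which is what allows both sharply defined clusters and gradual continua to be described in the same framework, as in panel (B).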

References

    1. Kapur S., Phillips A.G., Insel T.R. Why has it taken so long for biological psychiatry to develop clinical tests and what to do about it? Mol Psychiatry. 2012;17:1174–1179.
    2. McKusick V.A. On lumpers and splitters or nosology of genetic disease. Perspect Biol Med. 1969;12:298–312.
    3. Kraepelin E. 8th ed. Krieger Publishing; Huntington, NY: 1909. Psychiatrie; p. 1971.
    4. Bleuler E. Springer-Verlag; Berlin: 1920. Lehrbuch der Psychiatrie.
    5. American Psychiatric Association. 5th ed. American Psychiatric Association; Washington, DC: 2013. Diagnostic and Statistical Manual of Mental Disorders.