Nat Biomed Eng. 2020 Aug;4(8):787-800.

doi: 10.1038/s41551-020-0593-y. Epub 2020 Aug 3.

Defining and predicting transdiagnostic categories of neurodegenerative disease

Eli J Cornblath¹ ², John L Robinson³, David J Irwin⁴, Edward B Lee⁵, Virginia M-Y Lee³, John Q Trojanowski³, Danielle S Bassett⁶ ⁷ ⁸ ⁹ ¹⁰ ¹¹

Affiliations

¹ Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
² Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA.
³ Center for Neurodegenerative Disease Research, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
⁴ Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
⁵ Translational Neuropathology Research Laboratory, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
⁶ Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA. dsb@seas.upenn.edu.
⁷ Center for Neurodegenerative Disease Research, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA. dsb@seas.upenn.edu.
⁸ Department of Physics and Astronomy, College of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA. dsb@seas.upenn.edu.
⁹ Department of Electrical and Systems Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA. dsb@seas.upenn.edu.
¹⁰ Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA. dsb@seas.upenn.edu.
¹¹ Santa Fe Institute, Santa Fe, NM, USA. dsb@seas.upenn.edu.

PMID: 32747831
PMCID: PMC7946378
DOI: 10.1038/s41551-020-0593-y

Eli J Cornblath et al. Nat Biomed Eng. 2020 Aug.

Abstract

The prevalence of concomitant proteinopathies and heterogeneous clinical symptoms in neurodegenerative diseases hinders the identification of individuals who might be candidates for a particular intervention. Here, by applying an unsupervised clustering algorithm to post-mortem histopathological data from 895 patients with degeneration in the central nervous system, we show that six non-overlapping disease clusters can simultaneously account for tau neurofibrillary tangles, α-synuclein inclusions, neuritic plaques, inclusions of the transcriptional repressor TDP-43, angiopathy, neuron loss and gliosis. We also show that membership to the six transdiagnostic disease clusters, which explains more variance in cognitive phenotypes than can be explained by individual diagnoses, can be accurately predicted from scores of the Mini-Mental Status Exam, protein levels in cerebrospinal fluid, and genotype at the APOE and MAPT loci, via cross-validated multiple logistic regression. This combination of unsupervised and supervised data-driven tools provides a framework that could be used to identify latent disease subtypes in other areas of medicine.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

**Fig. 1 |. Schematic of data processing.**
a, The burden of amyloid-β plaques, α-synuclein plaques, tau neurofibrillary tangles, TDP-43 inclusions, ubiquitin, neuritic plaques, angiopathy, gliosis and neuron loss was evaluated on a five-tier ordinal scale (0, rare, 1+, 2+ or 3+) via cerebral autopsy in 895 patients through the Integrated Neurodegenerative Disease Database. Evaluation of pathological burden was performed for all proteins for the listed regions, in addition to the substantia nigra and locus coeruleus, which are hidden for ease of visualization. Dentate gyrus and CA1–subiculum were quantified separately but are shown together here as the hippocampus for ease of visualization. Abbreviations for brain regions can be found in Supplementary Table 1. b, We computed an 895 × 895 similarity matrix in which element i,j contains a polychoric correlation (r) between pathology score vectors for patient i and patient j. Next, we used k-medoids clustering to assign each patient to a data-driven disease cluster. α-Syn, α-synuclein. c, Using linked data from CSF protein testing and genotyping, we trained statistical models to predict membership to disease clusters.
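The clustering step in panel b can be illustrated with a minimal sketch. It assumes a precomputed similarity matrix and converts it to a dissimilarity as 1 − r. The toy 6 × 6 matrix, the function name `k_medoids`, the deterministic farthest-first initialisation and the simple alternating update are all illustrative assumptions; the sketch does not compute polychoric correlations and is not necessarily the k-medoids variant the authors used.

```python
def k_medoids(dist, k, n_iter=100):
    """Alternating k-medoids on a precomputed, symmetric distance matrix.

    Initialisation (an assumption of this sketch): the first medoid is the
    most central point; each further medoid is the point farthest from its
    nearest already-chosen medoid.
    """
    n = len(dist)
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(
            max(range(n), key=lambda i: min(dist[i][m] for m in medoids))
        )

    def assign(current):
        # Each point joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)]

    for _ in range(n_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Toy similarity matrix with two obvious blocks (hypothetical values).
similarity = [
    [1.0, 0.9, 0.8, 0.1, 0.0, 0.2],
    [0.9, 1.0, 0.85, 0.2, 0.1, 0.1],
    [0.8, 0.85, 1.0, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 1.0, 0.9, 0.8],
    [0.0, 0.1, 0.2, 0.9, 1.0, 0.85],
    [0.2, 0.1, 0.1, 0.8, 0.85, 1.0],
]
distance = [[1.0 - r for r in row] for row in similarity]
medoids, labels = k_medoids(distance, k=2)
print(medoids, labels)
```

Because medoids are actual data points, each cluster centre corresponds to a real patient's pathology vector, which is one reason k-medoids pairs naturally with a pairwise similarity matrix.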

**Fig. 2 |. Unsupervised clustering of copathology groups disease entities into proteinopathy families.**
a, We computed a matrix of polychoric correlations between vectors of pathology scores for each pair of subjects across all available pathological features to quantify the similarity in pathology scores. b, The same matrix as in panel a, where rows and columns are ordered by primary histopathologic diagnosis. Black lines along the diagonal mark blocks of patients with the same diagnosis. AD, Alzheimer’s disease; PiD, Pick’s disease; oth, other; T–O, tau other. c, The same matrix as in panel a, where rows and columns are ordered by a partition detected through k-medoids clustering. Black lines along the diagonal mark blocks of patients grouped into the same cluster. d, Representative vectors of pathology scores for each cluster (cluster centroids) demonstrate distinct profiles of pathology that map to underlying molecular drivers of disease, including tau, amyloid-β, TDP-43 and α-synuclein. Thio, thioflavin-staining neuritic plaques. e, Composition of each cluster in terms of primary histopathologic diagnoses (see Methods section ‘Sample construction’). Each cluster comprises disease entities that are putatively caused by the protein most highly represented in the cluster’s centroid. Counts placed above stacked bars indicate the number of patients in each cluster. f, In a subset of patients, all of whom have a primary diagnosis of Alzheimer’s disease (high or intermediate ADNC), we show the composition of each cluster in terms of secondary histopathologic diagnosis. Counts placed above stacked bars indicate the number of patients with Alzheimer’s disease in each cluster. ADNC is identified through ABC staging. See Methods section ‘Sample construction’ for definition of ‘tau other’ and ‘other’.

**Fig. 3 |. Comparison of ADNC and Lewy body copathology clusters.**
a, Median pathology scores for patients with intermediate to high ADNC and LBD in cluster 2 (left), cluster 4 (middle), and cluster 5 (right), represented as a region × type matrix of pathological features. b, Matrix of pairwise comparisons of median pathology scores for each pathological feature, where colour axis reflects the indicated difference in pathology. *P_FDR < 0.05. FDR, false discovery rate; P_FDR, the P value after correcting for multiple comparisons by controlling FDR at <0.05; Cing, anterior cingulate cortex; SMT, superior-middle temporal cortex; MF, middle frontal gyrus; Ang, angular gyrus; CS, CA1/subiculum; EC, entorhinal cortex; Amyg, amygdala; TS, thalamus; CP, caudate-putamen; GP, globus pallidus; SN, substantia nigra; Med, medulla; CB, cerebellum; MB, midbrain. Refer to Supplementary Table 1 for a tabulation of abbreviations.

**Fig. 4 |. Comparison of MoCA scores between clusters.**
Pairwise intercluster comparisons of median MoCA subscores using the Wilcoxon rank-sum test, FDR-corrected for multiple comparisons (q < 0.05) over all pairwise tests for six subscores. Plots were constructed using code from the R package *ggpubr*. NS, P_FDR > 0.05; *P_FDR < 0.05, **P_FDR < 0.01, ***P_FDR < 0.001 and ****P_FDR < 10⁻⁶. In box plots, box edges represent the 25th and 75th percentiles, the centre line shows the median and whiskers extend from the box edges to the most extreme data point that lies within 1.5 × the interquartile range (IQR) of the box edge. Data beyond the end of the whiskers are plotted individually as dots. Precise P values can be found in Supplementary Table 3.
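The FDR correction applied across pairwise tests can be sketched as follows. The caption does not name the specific procedure, so this example assumes the widely used Benjamini-Hochberg step-up adjustment; the function name and the raw P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted values
    # are monotone non-decreasing in the rank of the raw P value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values from five pairwise comparisons.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adj]
print(adj, significant)
```

Note that two comparisons with raw P < 0.05 (0.039 and 0.041) no longer pass after adjustment, which is the behaviour the P_FDR thresholds in the caption rely on.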

**Fig. 5 |. Disease clusters capture the relationship between cognitive measures and pathology scores.**
a, Pairwise Spearman correlations between each pathology score and the MoCA visuospatial subscore (left), the MoCA orientation subscore (middle) or total MMSE score (right) thresholded at P_FDR < 0.05, corrected within each subpanel. b, Relative node purity index, a measure of feature importance (Imp.) for random forest models trained to use all pathology scores to predict MoCA visuospatial subscores (left), the MoCA orientation subscores (middle) or total MMSE scores (right). Titles indicate the average model R² in held-out data over 50 repetitions of fivefold cross-validation. c, Distributions of R² values for predicting MoCA visuospatial subscores (left), the MoCA orientation subscores (middle) or total MMSE scores (right) in held-out data over 50 repetitions of fivefold cross-validation, using different sets of predictors: all pathology, entire matrix in Supplementary Fig. 9; path type average, average score collapsed over regions for each type of pathological feature, that is, synuclein, TDP-43 and others; regional pathology average, average pathology score for each region collapsed over types of pathology; clusters, binary indicators of cluster membership; ADNC, binary indicators of level of ADNC; LBD, binary indicators of LBD distribution; PCA, first six principal components of polychoric correlations of pathology matrix (Supplementary Fig. 11); EFA, first six exploratory factors from Supplementary Fig. 10. In box plots, box edges represent the 25th and 75th percentiles, the centre line shows the median and whiskers extend from the box edges to the most extreme data point value that is at most 1.5 × IQR. Data beyond the end of the whiskers are plotted individually as dots.
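The evaluation scheme in panels b and c, held-out R² over 50 repetitions of fivefold cross-validation, can be sketched as below. To keep the example dependency-free, a one-predictor least-squares line stands in for the random forest models described in the caption; that substitution, the synthetic data and all function names are assumptions made purely for illustration.

```python
import random


def repeated_kfold(n, k=5, repeats=50, seed=0):
    """Yield (train, test) index lists for `repeats` shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield train, folds[f]


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def r_squared(y_true, y_pred):
    """Coefficient of determination relative to the mean of y_true."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic, noise-free data (hypothetical): y depends linearly on x.
x = [float(i) for i in range(20)]
y = [2.0 * v + 1.0 for v in x]

scores = []
for train, test in repeated_kfold(len(x), k=5, repeats=50):
    slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
    pred = [slope * x[i] + intercept for i in test]
    scores.append(r_squared([y[i] for i in test], pred))

mean_r2 = sum(scores) / len(scores)
print(len(scores), mean_r2)
```

Each of the 50 × 5 = 250 scores is computed only on data the model did not see during fitting, so the resulting distribution reflects out-of-sample performance rather than goodness of fit.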

**Fig. 6 |. Comparison of CSF protein levels between disease clusters.**
Pairwise intercluster comparisons of median CSF protein levels for amyloid-β₁₋₄₂, phosphorylated tau and total tau using the Wilcoxon rank-sum test, FDR-corrected for multiple comparisons (q < 0.05) over all pairwise tests for all three proteins. Plots were constructed using code from the R package *ggpubr*. NS, P_FDR > 0.05, *P_FDR < 0.05, **P_FDR < 0.01, ***P_FDR < 0.001 and ****P_FDR < 10⁻⁶. In box plots, box edges represent the 25th and 75th percentiles, the centre line shows the median and whiskers extend from the box edges to the most extreme data point value that is at most 1.5 × IQR. Data beyond the end of the whiskers are plotted individually as dots. Precise P values can be found in Supplementary Table 4.

**Fig. 7 |. Prevalence of Alzheimer’s disease risk alleles differs across disease clusters.**
a, Within each cluster, we calculated the proportion of each genotype for *APOE* (left) and *MAPT* (right). b,d, Matrix of logistic regression β-weights, whose element i,j reflects the increase in log odds ratio for membership to cluster i relative to cluster j given the presence of *MAPT*^H2 (d) or *MAPT*^H1 (b). c, Matrix of logistic regression β-weights, whose element i,j reflects the increase in log odds ratio for membership to cluster i relative to cluster j given the presence of *APOE*^ϵ2 (left), *APOE*^ϵ3 (middle) or *APOE*^ϵ4 (right). NS, P_FDR > 0.05. *P_FDR < 0.05, **P_FDR < 0.01, ***P_FDR < 0.001 and ****P_FDR < 10⁻⁶.

**Fig. 8 |. Identifying disease labels from initial testing of CSF protein.**
a,b, Characteristics of prediction of existing diagnoses (a) or disease clusters (b) in held-out testing data using multiple logistic regression to predict disease labels from CSF protein levels. Subpanels (i) and (ii) show the test-set sensitivity and specificity, respectively, using a threshold value of 0.5. Subpanel (iii) shows the area under the curve (AUC) on the test set, reflecting performance over a range of threshold values. Bar length represents mean performance, and error bars indicate 95% confidence intervals over 100 repetitions of k-fold cross-validation at k = 5; mean value and 95% confidence interval are shown in each bar. Subpanel (iv) shows representative receiver operating characteristic curves for test-set predictions of existing diagnoses (a) or disease clusters (b). Subpanel (v) shows mean standardized multiple logistic regression β weights across 100 repetitions of k-fold cross-validation at k = 5 in the prediction task. The β weights can be interpreted as the increase in log odds ratio for a one s.d. increase in the value of the predictor. TPR, true positive rate; FPR, false positive rate; total tau, total CSF tau protein; phosph. tau, total CSF phosphorylated tau; amyloid-β₁₋₄₂, total CSF amyloid-β₁₋₄₂.
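The three performance measures in subpanels (i) to (iii) can be computed from predicted probabilities as in the sketch below. The labels and probabilities are hypothetical, the function names are illustrative, and the AUC uses the standard pairwise-ranking (Mann-Whitney) formulation, which may differ from how the authors computed it.

```python
def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Sensitivity (TPR) and specificity (1 - FPR) at a probability threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_prob):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels (1 = in cluster) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

sens, spec = sensitivity_specificity(y_true, y_prob, threshold=0.5)
area = auc(y_true, y_prob)
print(sens, spec, area)
```

Sensitivity and specificity depend on the chosen threshold (0.5 here, as in the caption), whereas the AUC summarises ranking quality across all thresholds, which is why the figure reports both.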

Comment in

Reclassifying neurodegenerative diseases.
Villoslada P, Baeza-Yates R, Masdeu JC. Nat Biomed Eng. 2020 Aug;4(8):759-760. doi: 10.1038/s41551-020-0600-3. PMID: 32747833. No abstract available.

Associated data

figshare/10.6084/m9.figshare.12519488.v1
