Multi-Source Learning for Joint Analysis of Incomplete Multi-Modality Neuroimaging Data

Lei Yuan¹, Yalin Wang, Paul M Thompson, Vaibhav A Narayan, Jieping Ye

PMID: 24014189
PMCID: PMC3763848
DOI: 10.1145/2339530.2339710

Lei Yuan et al. KDD. 2012:1149-1157. doi: 10.1145/2339530.2339710.

Affiliation

¹ Center for Evolutionary Medicine and Informatics, The Biodesign Institute, ASU, Tempe, AZ ; Department of Computer Science and Engineering, ASU, Tempe, AZ.

Abstract

Incomplete data present serious problems when integrating large-scale brain imaging data sets from different imaging modalities. In the Alzheimer's Disease Neuroimaging Initiative (ADNI), for example, over half of the subjects lack cerebrospinal fluid (CSF) measurements; an independent half of the subjects do not have fluorodeoxyglucose positron emission tomography (FDG-PET) scans; many lack proteomics measurements. Traditionally, subjects with missing measures are discarded, resulting in a severe loss of available information. We address this problem by proposing two novel learning methods in which all samples with at least one available data source can be used. In the first method, we divide the samples according to the availability of data sources and learn shared sets of features with state-of-the-art sparse learning methods. The second method learns a base classifier for each data source independently and uses it to represent that source as a single column of prediction scores; we then estimate the missing prediction scores and combine them with the existing scores to build a multi-source fusion model. To illustrate the proposed approaches, we classify patients from the ADNI study into groups with Alzheimer's disease (AD), mild cognitive impairment (MCI) and normal controls, based on the multi-modality data. At baseline, ADNI's 780 participants (172 AD, 397 MCI, 211 Normal) have at least one of four data types: magnetic resonance imaging (MRI), FDG-PET, CSF and proteomics. These data are used to test our algorithms. Comprehensive experiments show that our proposed methods yield stable and promising results.

Keywords: Algorithms; Multi-source feature learning; incomplete data; multi-task learning.

Figures

**Figure 1**
Illustration of integrating multiple heterogeneous data sources for disease status prediction tasks. More details on the different data sources and prediction tasks used in this study may be found in Section 3.

**Figure 2**
Illustration of the “block-wise” pattern of missing data for the ADNI data set. In this figure, we show AD and normal control subjects only. For simplicity, we focus on those subjects with complete MRI measures.
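
The block-wise pattern can be made concrete by grouping subjects according to which sources they have. The sketch below is only an illustration: the availability mask, the source names, and the helper `partition_by_availability` are hypothetical and do not come from the ADNI data or the authors' code.

```python
import numpy as np

# Hypothetical availability mask: one row per subject, one column per source.
# Every subject has MRI, mirroring the simplification used in the figure.
SOURCES = ["MRI", "PET", "CSF"]
available = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
], dtype=bool)


def partition_by_availability(mask, names):
    """Group subject indices by the combination of sources they have."""
    blocks = {}
    for i, row in enumerate(mask):
        key = tuple(n for n, present in zip(names, row) if present)
        blocks.setdefault(key, []).append(i)
    return blocks


blocks = partition_by_availability(available, SOURCES)
for key, rows in blocks.items():
    print(key, rows)
```

Each key is one combination of available sources, so subjects that share a key form one block with no missing entries inside it.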

**Figure 3**
Illustration of the proposed multi-task feature learning framework for incomplete multi-source data fusion. In the proposed framework, we first partition the samples into multiple blocks (four blocks in this case), one for each combination of data sources available: (1) PET, MRI; (2) PET, MRI, CSF; (3) MRI, CSF; (4) MRI. We then build four models, one for each block of data, resulting in four prediction tasks. We use a joint feature learning framework that learns all models simultaneously. Specifically, all models involving a specific source are constrained to select a common set of features for that particular source.
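
One common way to make several models select a common set of features is a group-sparse penalty that ties together the weights of the same feature across tasks. The sketch below assumes that form (a sum of row-wise L2 norms) and shows its proximal step, group soft-thresholding; the caption does not state the exact penalty or solver, so the function name, weights, and threshold here are illustrative assumptions.

```python
import numpy as np


def group_soft_threshold(W, tau):
    """Row-wise group soft-thresholding.

    W holds the weights of one source: one row per feature, one column per
    task (block) that uses this source. A row whose L2 norm is at most tau
    is zeroed in every task at once; other rows are shrunk toward zero.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale


# Hypothetical weights: 3 features of one source, shared by 2 tasks.
W = np.array([
    [3.0, 4.0],   # row norm 5.0
    [0.3, 0.4],   # row norm 0.5
    [0.0, 2.0],   # row norm 2.0
])
W_new = group_soft_threshold(W, tau=1.0)
selected = np.linalg.norm(W_new, axis=1) > 0
print(selected.tolist())  # [True, False, True]
```

Because the threshold acts on whole rows, the second feature is removed from both tasks together, which is the kind of shared selection the caption describes.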

**Figure 4**
Illustration of the proposed model score completion scheme. We first train a base model on each individual data source using the available samples, and the base model is applied to produce prediction scores for this data source; thus each data source is represented by a single column of (incomplete) scores. A missing value estimation method is applied to obtain a complete set of model scores, which are treated as newly derived features to train our final classifier.
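
The three steps above can be sketched end to end on synthetic data. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are random, the base and final models are plain least-squares scorers, and missing scores are filled with the column mean as a stand-in for whichever missing value estimation method is actually used.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)  # synthetic labels

# Two synthetic sources; NaN rows mark subjects that lack that source.
sources = [rng.normal(size=(n, d)) + 0.8 * y[:, None] for d in (5, 3)]
sources[0][40:] = np.nan   # last 20 subjects lack source 0
sources[1][:15] = np.nan   # first 15 subjects lack source 1


def fit_linear(X, y):
    """Least-squares linear scorer with intercept (stand-in base model)."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def score(X, w):
    """Apply a fitted linear scorer."""
    return np.column_stack([X, np.ones(len(X))]) @ w


# Step 1: one base model per source, trained on subjects that have it.
# Each source becomes one (incomplete) column of scores.
scores = np.full((n, len(sources)), np.nan)
for s, X in enumerate(sources):
    have = ~np.isnan(X).any(axis=1)
    w = fit_linear(X[have], y[have])
    scores[have, s] = score(X[have], w)

# Step 2: estimate the missing scores (column mean, assumed for simplicity).
col_mean = np.nanmean(scores, axis=0)
completed = np.where(np.isnan(scores), col_mean, scores)

# Step 3: train the final classifier on the completed score matrix.
w_final = fit_linear(completed, y)
pred = np.sign(score(completed, w_final))
accuracy = float(np.mean(pred == y))
print(completed.shape, accuracy)
```

The accuracy printed here is measured on the training subjects and is therefore optimistic; a real evaluation would produce base-model scores and final predictions on held-out subjects.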

**Figure 5**
Illustration of the results obtained using different λ ratio values in our proposed iMSF method. In the left figure, we vary the λ ratio from 0.001 to 0.4 (x-axis) and report the accuracy obtained (y-axis). In the right figure, we report the proportion of selected features (Sparsity, y-axis) as the λ ratio increases from 0.001 to 0.4 (x-axis).

Cited by

Identification of Alzheimer's disease and mild cognitive impairment using multimodal sparse hierarchical extreme learning machine.
Kim J, Lee B. Hum Brain Mapp. 2018 Sep;39(9):3728-3741. doi: 10.1002/hbm.24207. Epub 2018 May 7. PMID: 29736986. Free PMC article.
Disentangled-Multimodal Adversarial Autoencoder: Application to Infant Age Prediction With Incomplete Multimodal Neuroimages.
Hu D, Zhang H, Wu Z, Wang F, Wang L, Smith JK, Lin W, Li G, Shen D. IEEE Trans Med Imaging. 2020 Dec;39(12):4137-4149. doi: 10.1109/TMI.2020.3013825. Epub 2020 Nov 30. PMID: 32746154. Free PMC article.
Group Guided Fused Laplacian Sparse Group Lasso for Modeling Alzheimer's Disease Progression.
Liu X, Wang J, Ren F, Kong J. Comput Math Methods Med. 2020 Feb 20;2020:4036560. doi: 10.1155/2020/4036560. eCollection 2020. PMID: 32104201. Free PMC article.
