KDD. 2012:1149-1157.
doi: 10.1145/2339530.2339710.

Multi-Source Learning for Joint Analysis of Incomplete Multi-Modality Neuroimaging Data

Lei Yuan et al. KDD. 2012.

Abstract

Incomplete data present serious problems when integrating large-scale brain imaging data sets from different imaging modalities. In the Alzheimer's Disease Neuroimaging Initiative (ADNI), for example, over half of the subjects lack cerebrospinal fluid (CSF) measurements; an independent half of the subjects do not have fluorodeoxyglucose positron emission tomography (FDG-PET) scans; many lack proteomics measurements. Traditionally, subjects with missing measures are discarded, resulting in a severe loss of available information. We address this problem by proposing two novel learning methods where all the samples (with at least one available data source) can be used. In the first method, we divide our samples according to the availability of data sources, and we learn shared sets of features with state-of-the-art sparse learning methods. Our second method learns a base classifier for each data source independently, based on which we represent each source using a single column of prediction scores; we then estimate the missing prediction scores, which, combined with the existing prediction scores, are used to build a multi-source fusion model. To illustrate the proposed approaches, we classify patients from the ADNI study into groups with Alzheimer's disease (AD), mild cognitive impairment (MCI) and normal controls, based on the multi-modality data. At baseline, ADNI's 780 participants (172 AD, 397 MCI, 211 Normal) have at least one of four data types: magnetic resonance imaging (MRI), FDG-PET, CSF and proteomics. These data are used to test our algorithms. Comprehensive experiments show that our proposed methods yield stable and promising results.

Keywords: Algorithms; Multi-source feature learning; incomplete data; multi-task learning.

Figures

Figure 1
Illustration of integrating multiple heterogeneous data sources for disease status prediction tasks. More details on the different data sources and prediction tasks used in this study may be found in Section 3.
Figure 2
Illustration of the “block-wise” pattern of missing data for the ADNI data set. In this figure, we show AD and normal control subjects only. For simplicity, we focus on those subjects with complete MRI measures.
Figure 3
Illustration of the proposed multi-task feature learning framework for incomplete multi-source data fusion. In the proposed framework, we first partition the samples into multiple blocks (four blocks in this case), one for each combination of data sources available: (1) PET, MRI; (2) PET, MRI, CSF; (3) MRI, CSF; (4) MRI. We then build four models, one for each block of data, resulting in four prediction tasks. We use a joint feature learning framework that learns all models simultaneously. Specifically, all models involving a specific source are constrained to select a common set of features for that particular source.
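
The partitioning step described in the Figure 3 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy availability matrix, and the toy weights are all hypothetical. The second helper shows one common way to encourage models to select a common set of features for a source, an l2,1-style group penalty over the per-block weight vectors; that this matches the paper's exact regularizer is an assumption.

```python
import numpy as np

def partition_by_availability(available):
    """Group sample indices by which data sources each sample has.

    available: boolean array of shape (n_samples, n_sources),
               True where the source was measured for that sample.
    Returns a dict mapping each availability pattern (tuple of bools)
    to the array of sample indices that share it.
    """
    available = np.asarray(available, dtype=bool)
    blocks = {}
    for i, row in enumerate(available):
        blocks.setdefault(tuple(row.tolist()), []).append(i)
    return {pattern: np.array(idx) for pattern, idx in blocks.items()}

def shared_feature_penalty(block_weights, source):
    """l2,1-style penalty for one source (assumed form, see lead-in).

    block_weights: dict mapping a block name to a dict
                   {source name: weight vector for that source}.
    For each feature of `source`, take the l2 norm of its weights
    across all block models that use the source, then sum the norms.
    A feature contributes zero only if every block model drops it.
    """
    W = np.stack([w[source] for w in block_weights.values() if source in w])
    return float(np.linalg.norm(W, axis=0).sum())

# Toy data: columns are (PET, MRI, CSF); every subject has MRI.
available = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
], dtype=bool)
blocks = partition_by_availability(available)

# Hypothetical weights for two block models that both use MRI.
toy_weights = {
    "PET+MRI": {"MRI": np.array([3.0, 0.0]), "PET": np.array([1.0])},
    "MRI": {"MRI": np.array([4.0, 0.0])},
}
mri_penalty = shared_feature_penalty(toy_weights, "MRI")
```

In the toy data the six subjects fall into four blocks, matching the four source combinations listed in the caption. Each block would then get its own model, with the penalty coupling the weights that different block models assign to the same source.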
Figure 4
Illustration of the proposed model score completion scheme. We first train a base model on each individual data source using the available samples, and the base model is applied to produce prediction scores for this data source; thus each data source is represented by a single column of (incomplete) scores. A missing value estimation method is applied to obtain a complete set of model scores, which are treated as newly derived features to train our final classifier.
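
The score completion scheme in the Figure 4 caption can be sketched as follows. This is a hedged illustration under stated assumptions: the least-squares linear scorer stands in for whatever base model is used, column-mean imputation stands in for the missing value estimation method (which the caption does not specify), and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear scorer with intercept (stand-in base model)."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    """Apply a fitted linear scorer to the rows of X."""
    return np.column_stack([X, np.ones(len(X))]) @ w

def score_matrix(sources, y):
    """Build one column of prediction scores per data source.

    sources: list of arrays of shape (n_samples, n_features_j), where a
             sample missing source j has NaN in that source's row.
    Scores are NaN wherever the source is missing. They are computed
    in-sample here for brevity.
    """
    S = np.full((len(y), len(sources)), np.nan)
    for j, X in enumerate(sources):
        have = ~np.isnan(X).any(axis=1)
        w = fit_linear(X[have], y[have])
        S[have, j] = predict_linear(w, X[have])
    return S

def impute_column_means(S):
    """Fill missing scores with the column mean (assumed stand-in)."""
    filled = S.copy()
    col_means = np.nanmean(S, axis=0)
    rows, cols = np.where(np.isnan(S))
    filled[rows, cols] = col_means[cols]
    return filled

# Synthetic two-source data with labels in {-1, +1}.
rng = np.random.default_rng(0)
n = 40
y = np.where(rng.random(n) < 0.5, -1.0, 1.0)
X1 = y[:, None] + rng.normal(size=(n, 3))
X2 = y[:, None] + rng.normal(size=(n, 2))
X2[:15] = np.nan  # the first 15 samples lack the second source

S = score_matrix([X1, X2], y)        # incomplete score columns
S_full = impute_column_means(S)      # completed score matrix
final_w = fit_linear(S_full, y)      # final classifier on derived features
pred = np.sign(predict_linear(final_w, S_full))
```

Because each source collapses to a single score column, the missing value problem shrinks from the full feature dimension to one value per missing source, which is what makes a simple completion step feasible.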
Figure 5
Illustration of the results obtained using different λ ratio values in our proposed iMSF method. In the left figure, we vary the λ ratio from 0.001 to 0.4 (x-axis) and report the accuracy obtained (y-axis). In the right figure, we report the proportion of selected features (Sparsity, y-axis) as the λ ratio increases from 0.001 to 0.4 (x-axis).
