Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data

Lei Yuan et al. Neuroimage. 2012 Jul 2;61(3):622-32.
doi: 10.1016/j.neuroimage.2012.03.059. Epub 2012 Mar 29.

Abstract

Analysis of incomplete data is a major challenge when integrating large-scale brain imaging datasets from different imaging modalities. In the Alzheimer's Disease Neuroimaging Initiative (ADNI), for example, over half of the subjects lack cerebrospinal fluid (CSF) measurements; an independent half of the subjects do not have fluorodeoxyglucose positron emission tomography (FDG-PET) scans; many lack proteomics measurements. Traditionally, subjects with missing measures are discarded, resulting in a severe loss of available information. In this paper, we address this problem by proposing an incomplete Multi-Source Feature (iMSF) learning method where all the samples (with at least one available data source) can be used. To illustrate the proposed approach, we classify patients from the ADNI study into groups with Alzheimer's disease (AD), mild cognitive impairment (MCI) and normal controls (NC), based on the multi-modality data. At baseline, ADNI's 780 participants (172 AD, 397 MCI, 211 NC) have at least one of four data types: magnetic resonance imaging (MRI), FDG-PET, CSF and proteomics. These data are used to test our algorithm. Depending on the problem being solved, we divide our samples according to the availability of data sources, and we learn shared sets of features with state-of-the-art sparse learning methods. To build a practical and robust system, we construct a classifier ensemble by combining our method with four other methods for missing value estimation. Comprehensive experiments with various parameters show that our proposed iMSF method and the ensemble model yield stable and promising results.

Figures

Figure 1
Overview of the proposed system. We are given a multi-source data set with incomplete sources. Instead of removing valuable subjects if they have missing data, we use structured multi-task learning to enable feature learning from incomplete data. Then, along with other methods to impute missing values, we obtain a series of plausible models to aggregate into an even more robust one.
Figure 2
Illustration of the “block-wise” pattern of missing data in the ADNI dataset. Only AD and normal control subjects are shown. For simplicity, we focus on those subjects with complete MRI measures. Note that in our entire study there are still 132 subjects who do not have MRI measures, as the UCSF group only released pre-processed baseline MRI imaging features for 648 subjects.
Figure 3
Illustration of the proposed multi-task feature learning framework for incomplete multi-source data fusion. In the proposed framework, we first partition the samples into multiple blocks (four blocks in this case), one for each combination of data sources available: (1) PET, MRI; (2) PET, MRI, CSF; (3) MRI, CSF; (4) MRI. We then build four models, one for each block of data, resulting in four prediction tasks. We use a joint feature learning framework that learns all models simultaneously. Specifically, all models involving a specific source are constrained to select a common set of features for that particular source. As shown above, all four tasks select a common subset of MRI features (the selected features for all four tasks are highlighted).
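The partitioning step in the Figure 3 caption can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the source names, the `partition_by_availability` and `tasks_sharing_source` helpers, and the toy samples are all assumptions made up for the example. It groups samples by which sources they have, then lists the tasks that involve a given source, which are the tasks that would be constrained to share that source's feature subset.

```python
from collections import defaultdict

# Hypothetical source names; the order fixes how patterns are written.
SOURCES = ("MRI", "PET", "CSF")

def partition_by_availability(samples):
    """Group sample ids by the tuple of sources they actually have.

    `samples` maps a sample id to {source_name: feature list or None}.
    Samples with no available source are skipped.
    """
    blocks = defaultdict(list)
    for sample_id, sources in samples.items():
        pattern = tuple(s for s in SOURCES if sources.get(s) is not None)
        if pattern:
            blocks[pattern].append(sample_id)
    return dict(blocks)

def tasks_sharing_source(blocks, source):
    """Return the availability patterns (one task each) that involve `source`."""
    return sorted(p for p in blocks if source in p)

# Toy data, made up for illustration (not ADNI values).
toy = {
    "s1": {"MRI": [0.1, 0.2], "PET": [1.0], "CSF": None},
    "s2": {"MRI": [0.3, 0.1], "PET": [0.8], "CSF": [2.0]},
    "s3": {"MRI": [0.2, 0.4], "PET": None, "CSF": [1.5]},
    "s4": {"MRI": [0.5, 0.6], "PET": None, "CSF": None},
    "s5": {"MRI": [0.7, 0.1], "PET": None, "CSF": None},
}

blocks = partition_by_availability(toy)
for pattern, ids in blocks.items():
    print(pattern, ids)
print("tasks using CSF:", tasks_sharing_source(blocks, "CSF"))
```

With this toy input there are four blocks, matching the four combinations named in the caption, and every block involves MRI.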
Figure 4
Illustration of the learning-based ensemble method. For a given dataset, different base models (with different parameters if necessary) are applied, and each of them gives a classification score for each sample (training or testing). These scores are then treated as a new dataset on which the final training and testing are performed.
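The idea in the Figure 4 caption, where base-model scores become the features of a second-stage learner, might look like the sketch below. Everything here is an assumption for illustration: the three base scorers are hypothetical stand-ins (not iMSF or the imputation methods), the meta-learner is a plain logistic regression without an intercept, and the data are made up.

```python
import math

# Hypothetical base models: each maps a feature vector to a signed score.
def base_first(x):
    return x[0] - 0.5

def base_second(x):
    return x[1] - 0.5

def base_mean(x):
    return sum(x) / len(x) - 0.5

BASE_MODELS = (base_first, base_second, base_mean)

def to_scores(X):
    """Replace each sample by its vector of base-model scores."""
    return [[model(x) for model in BASE_MODELS] for x in X]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_meta(Z, y, lr=0.5, epochs=200):
    """Fit a logistic regression (no intercept) on score vectors by gradient ascent."""
    w = [0.0] * len(Z[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, label in zip(Z, y):
            err = label - sigmoid(sum(wj * zj for wj, zj in zip(w, z)))
            for j, zj in enumerate(z):
                grad[j] += err * zj
        w = [wj + lr * gj / len(Z) for wj, gj in zip(w, grad)]
    return w

def predict_meta(w, Z):
    """Label a sample 1 when its weighted score sum is positive, else 0."""
    return [1 if sum(wj * zj for wj, zj in zip(w, z)) > 0 else 0 for z in Z]

# Made-up training and test samples (two features each).
X_train = [[0.0, 0.2], [0.1, 0.0], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[0.95, 0.9], [0.05, 0.1]]

Z_train = to_scores(X_train)       # the "new dataset" of scores
w = fit_meta(Z_train, y_train)     # final training on the scores
test_pred = predict_meta(w, to_scores(X_test))
print(test_pred)
```

In practice the training-set scores would usually come from held-out predictions so that the second stage does not overfit to base models that have already seen the labels; the sketch skips this because its base scorers are fixed functions.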
Figure 5
Illustration of the results obtained with different values of λ in our proposed iMSF method. The AD/NC problem is used, and leave-one-out performance is reported. In the top figure, we vary λ from 0.001 to 0.6 (x-axis) and report the accuracy obtained (y-axis). In the bottom figure, we report the proportion of selected features (Sparsity, y-axis) as λ increases from 0.001 to 0.6 (x-axis).
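The link between λ and the proportion of selected features can be illustrated with a group-wise shrinkage step. The captions do not state the exact penalty, so this sketch assumes a group-lasso style regularizer in which each feature's weights across tasks form one group; the function names and the weight matrix are made up. A feature counts as selected when its group is not shrunk entirely to zero.

```python
import math

def group_soft_threshold(group, lam):
    """Shrink one feature's weights across tasks toward zero as a group."""
    norm = math.sqrt(sum(w * w for w in group))
    if norm <= lam:
        return [0.0] * len(group)   # whole feature dropped for every task
    scale = 1.0 - lam / norm
    return [scale * w for w in group]

def selected_fraction(weights, lam):
    """Proportion of features whose group is still nonzero after shrinkage."""
    shrunk = [group_soft_threshold(g, lam) for g in weights]
    kept = sum(1 for g in shrunk if any(w != 0.0 for w in g))
    return kept / len(weights)

# Made-up weights: one row per feature, one column per task.
toy_weights = [
    [0.9, 0.8, 0.7, 0.9],
    [0.3, 0.2, 0.1, 0.2],
    [0.05, 0.02, 0.01, 0.03],
    [0.0, 0.4, 0.0, 0.3],
]

for lam in (0.001, 0.1, 0.6):
    print(lam, selected_fraction(toy_weights, lam))
```

Because a group is either kept or zeroed as a whole, all tasks that use a source end up with the same selected features for it, and a larger λ removes more features.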

