Review
Neuroimage. 2014 Nov 15:102 Pt 1:220-8.
doi: 10.1016/j.neuroimage.2014.01.021. Epub 2014 Feb 12.

Sparse representation based biomarker selection for schizophrenia with integrated analysis of fMRI and SNPs

Hongbao Cao et al. Neuroimage. 2014.

Abstract

Integrative analysis of multiple data types can exploit their complementary information and may therefore provide greater power to identify potential biomarkers that would be missed when each data type is analyzed separately. Because data modalities differ in nature, integrating them is challenging. Here we address the data integration problem by developing a generalized sparse model (GSM) that uses weighting factors to integrate multi-modality data for biomarker selection. As an example, we applied the GSM to a joint analysis of two types of schizophrenia data: 759,075 SNPs and 153,594 functional magnetic resonance imaging (fMRI) voxels in 208 subjects (92 cases/116 controls). To solve this small-sample, large-variable problem, we developed a novel sparse representation based variable selection (SRVS) algorithm, with the primary aim of identifying biomarkers associated with schizophrenia. To validate the effectiveness of the selected variables, we performed multivariate classification followed by ten-fold cross-validation. We compared our proposed SRVS algorithm with an earlier sparse model based variable selection algorithm for integrated analysis. In addition, we compared it with traditional statistical methods for univariate data analysis (chi-squared test for SNP data and ANOVA for fMRI data). Results showed that our proposed SRVS method can identify novel biomarkers that show a stronger capability to distinguish schizophrenia patients from healthy controls. Moreover, better classification ratios were achieved using biomarkers from both types of data, suggesting the importance of integrative analysis.

Keywords: SNP; Schizophrenia; Sparse representations; Variable selection; fMRI.
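
As a rough illustration of how weighting factors can steer variable selection across two modalities, the sketch below scales SNP columns by α1 and fMRI columns by α2 = 1 − α1, then greedily picks variables with plain matching pursuit. This is an assumption-laden toy, not the SRVS algorithm from the paper: matching pursuit is swapped in for simplicity, and every name (`weighted_dictionary`, `matching_pursuit_select`, the toy data) is hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def weighted_dictionary(snps, voxels, alpha1):
    """Scale SNP columns by alpha1 and fMRI columns by alpha2 = 1 - alpha1.

    Each input is a list of columns; a column holds one value per subject.
    Returns tagged columns: (modality, column index, weighted values).
    """
    alpha2 = 1.0 - alpha1
    cols = [("snp", j, [alpha1 * v for v in col]) for j, col in enumerate(snps)]
    cols += [("fmri", j, [alpha2 * v for v in col]) for j, col in enumerate(voxels)]
    return cols


def matching_pursuit_select(columns, y, n_select):
    """Greedy selection (hypothetical stand-in for a sparse solver).

    At each step, pick the unselected column with the largest absolute
    inner product with the residual, then remove its contribution.
    """
    residual = list(y)
    chosen = []
    for _ in range(n_select):
        candidates = [c for c in columns
                      if (c[0], c[1]) not in chosen and dot(c[2], c[2]) > 0]
        if not candidates:
            break
        kind, idx, d = max(candidates, key=lambda c: abs(dot(c[2], residual)))
        coef = dot(d, residual) / dot(d, d)
        residual = [r - coef * di for r, di in zip(residual, d)]
        chosen.append((kind, idx))
    return chosen


# Toy data (made up): 6 subjects, 2 SNP columns, 2 voxel columns.
snps = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]]
voxels = [[0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
y = [3, 0, 2, 0, 0, 0]  # driven by SNP 0 and voxel 0

print(matching_pursuit_select(weighted_dictionary(snps, voxels, 0.5), y, 1))
print(matching_pursuit_select(weighted_dictionary(snps, voxels, 0.2), y, 1))
```

In this toy, equal weights favor the SNP column first, while lowering α1 to 0.2 makes the fMRI voxel the first pick, which mirrors the idea that the weight factor shifts how many variables are drawn from each data type.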

Figures

Fig. 1
The flowchart of our proposed variable selection for integrative analysis of two types of data.
Fig. 2
Variable selection with generalized sparse model using different models, where the number of selected fMRI voxels is in red color and the number of selected SNPs is in blue color. The ‘Weight factor’ in the plots refers to the weight factor α1 (for SNP data set), and the weight factor α2 = 1 − α1 (for fMRI data set).
Fig. 3
The newly selected variables in each trial with the decrease of the corresponding weight factor. The ‘Weight factor’ in the plots refers to the weight factor α1, and the weight factor α2 = 1 − α1.
Fig. 4
Comparison of the selected variables (SNPs/fMRI voxels) using a Venn diagram. A, B and C are the variables selected using SRVS with L1/2, L0 and L1 norm penalties, respectively.
Fig. 5
A comparison of the selected fMRI voxels between SRVS (L1/2) and Li et al.’s method (Li et al., 2009). The value of a voxel represents the frequency that it has been selected in the 16 trials.
Fig. 6
A comparison of classification results of using four sparse models. (a) gives the classification ratio of differentiating schizophrenia from healthy controls using four models with different weight factors; (b) is the box plot generated with ANOVA analysis of the classification ratios using four different models.
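
The classification ratio obtained under ten-fold cross-validation can be sketched as follows. This is a minimal illustration only: the nearest-centroid classifier, the interleaved fold assignment, and the toy features are all assumptions made here, since this page does not state which classifier or fold scheme the authors used.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k interleaved folds."""
    return [list(range(f, n_samples, k)) for f in range(k)]


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose training centroid is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, l in zip(train_x, train_y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda l: sum((a - b) ** 2 for a, b in zip(x, centroids[l])),
    )


def cross_validated_ratio(features, labels, k=10):
    """Fraction of held-out subjects classified correctly over k folds."""
    correct = 0
    n = len(labels)
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train_x = [features[i] for i in range(n) if i not in held_out]
        train_y = [labels[i] for i in range(n) if i not in held_out]
        for i in test_idx:
            if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
                correct += 1
    return correct / n


# Toy data (made up): 10 cases and 10 controls, two selected features each.
cases = [[1.0 + 0.1 * (i % 3), 0.9] for i in range(10)]
controls = [[0.1 * (i % 3), 0.1] for i in range(10)]
features = cases + controls
labels = [1] * 10 + [0] * 10

print(cross_validated_ratio(features, labels, k=10))
```

With 20 subjects and k = 10, each interleaved fold holds out one case and one control, so every subject is tested exactly once on a model that never saw it.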

References

    1. Badner JA, Gershon ES. Meta-analysis of whole-genome linkage scans of bipolar disorder and schizophrenia. Mol Psychiatry. 2002;7(4):405–411.
    2. Cai TT, Wang L. Orthogonal Matching Pursuit for Sparse Signal Recovery. IEEE Trans Inf Theory. 2011;57(7):1–26.
    3. Callicott JH, Straub RE, Pezawas L, Egan MF, Mattay VS, Hariri AR, Verchinski BA, Meyer-Lindenberg A, Balkissoon R, Kolachana B, Goldberg TE, Weinberger DR. Variation in DISC1 affects hippocampal structure and function and increases risk for schizophrenia. Proc Natl Acad Sci U S A. 2005 Jun;102(24):8627–32.
    4. Candès E, Tao T. Near optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans Inf Theory. 2006 Dec;52(12):5406–5425.
    5. Cao H, Deng H, Li M, Wang YP. Classification of Multicolor Fluorescence In-situ Hybridization (M-FISH) Images with Sparse Representation. IEEE Trans Nanobioscience. 2012a;11(2):111–118.
