BMC Bioinformatics. 2009 Sep 28:10:315.
doi: 10.1186/1471-2105-10-315.

Sparse canonical correlation analysis for identifying, connecting and completing gene-expression networks

Sandra Waaijenborg et al. BMC Bioinformatics.

Abstract

Background: We generalized penalized canonical correlation analysis to microarray gene-expression measurements in order to check the completeness of known metabolic pathways and to identify candidate genes for incorporation into a pathway. We used Wold's method to calculate the canonical variates, applying ridge penalization to the regression of pathway genes on the canonical variates of the non-pathway genes and the elastic net to the regression of non-pathway genes on the canonical variates of the pathway genes.
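The alternating scheme described in the Background can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it reads the abstract as meaning that the pathway weights are updated by a ridge regression with the non-pathway canonical variate as response, and the non-pathway weights by an elastic-net regression with the pathway canonical variate as response. The function names, the coordinate-descent solver, the penalty values and the simulated data are all hypothetical choices made for the example.

```python
import numpy as np

def standardize(M):
    """Center each column and scale it to unit variance."""
    return (M - M.mean(axis=0)) / M.std(axis=0)

def ridge_weights(X, target, lam):
    """Ridge regression of `target` on the columns of X (closed form)."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ target / n)

def elastic_net_weights(Y, target, l1, l2, n_sweeps=50):
    """Elastic-net regression of `target` on Y by cyclic coordinate descent."""
    n, q = Y.shape
    v = np.zeros(q)
    residual = target.copy()
    col_ss = (Y ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(q):
            # Covariance of column j with the partial residual (column j excluded).
            rho = Y[:, j] @ residual / n + col_ss[j] * v[j]
            # Soft-threshold for the lasso part, shrink for the ridge part.
            new_vj = np.sign(rho) * max(abs(rho) - l1, 0.0) / (col_ss[j] + l2)
            residual += Y[:, j] * (v[j] - new_vj)
            v[j] = new_vj
    return v

def penalized_cca(X, Y, ridge=0.1, l1=0.2, l2=0.1, n_iter=20):
    """Alternate penalized regressions to estimate one pair of canonical variates.

    X holds the pathway genes, Y the non-pathway genes (samples in rows).
    """
    X, Y = standardize(X), standardize(Y)
    v = np.ones(Y.shape[1]) / np.sqrt(Y.shape[1])  # arbitrary starting weights
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        omega = Y @ v
        omega /= omega.std()                  # non-pathway canonical variate
        u = ridge_weights(X, omega, ridge)    # ridge update of pathway weights
        xi = X @ u
        xi /= xi.std()                        # pathway canonical variate
        v_new = elastic_net_weights(Y, xi, l1, l2)
        if not v_new.any():                   # penalty removed every gene: stop
            break
        v = v_new
    rho = np.corrcoef(X @ u, Y @ v)[0, 1]
    return u, v, rho

def simulate(n=500, seed=0):
    """Toy data (an assumption): one latent signal shared by 5 pathway genes
    and by the first 2 of 20 non-pathway genes; the rest are pure noise."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    X = z[:, None] + 0.5 * rng.standard_normal((n, 5))
    Y = rng.standard_normal((n, 20))
    Y[:, :2] = z[:, None] + 0.5 * rng.standard_normal((n, 2))
    return X, Y

X, Y = simulate()
u, v, rho = penalized_cca(X, Y)
print("selected non-pathway genes:", np.flatnonzero(v))
print("canonical correlation:", round(float(rho), 2))
```

Because the columns and the canonical variates are standardized, the lasso penalty `l1` acts as a threshold on the correlation scale, which is why genes unrelated to the pathway variate end up with weights of exactly zero in this toy setting.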

Results: We performed a small simulation to illustrate the model's ability to identify new candidate genes for incorporation into a pathway. In these simulations, a gene appeared to be correctly identified when its correlation with the pathway genes was 0.3 or more. We then applied the method to a gene-expression microarray data set of 12,209 genes measured in 45 patients with glioblastoma and looked for genes to incorporate into the glioma pathway: we identified more than 25 genes whose correlation with the canonical variates of the pathway genes exceeded 0.9.
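The screening step reported in the Results, keeping genes whose correlation with a pathway canonical variate exceeds 0.9, could look roughly like the sketch below. The gene names and values are invented for illustration, and the use of the absolute correlation is an assumption (the sign of a canonical variate is arbitrary); the abstract does not say how the authors handled negative correlations.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    if ss_a == 0 or ss_b == 0:
        return 0.0
    return cov / sqrt(ss_a * ss_b)

def select_candidates(expression, variate, threshold=0.9):
    """Return {gene: r} for genes with |r| above the threshold.

    `expression` maps a gene name to its expression values across samples;
    `variate` is the pathway canonical variate for the same samples.
    Taking the absolute value is an assumption, not stated in the abstract.
    """
    correlations = {gene: pearson(values, variate)
                    for gene, values in expression.items()}
    return {gene: r for gene, r in correlations.items() if abs(r) > threshold}

# Hypothetical toy data: five samples, three made-up genes.
variate = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "gene_a": [2.0, 4.0, 6.0, 8.0, 10.5],  # tracks the variate closely
    "gene_b": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly related
    "gene_c": [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated
}
candidates = select_candidates(expression, variate)
print(sorted(candidates))
```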

Conclusion: We concluded that penalized canonical correlation analysis is a powerful tool to identify candidate genes in pathway analysis.

Figures

Figure 1
Detection of the discarded pathway variable versus its multiple correlation with the remaining variables in the pathway. Relation between the likelihood of correctly identifying a discarded pathway variable and the size of its multiple correlation with the remaining pathway variables.
Figure 2
Glioma pathway (from KEGG, November 2008). Graphical description of the Glioma pathway as given by the Kyoto Encyclopedia of Genes and Genomes.
Figure 3
Cross-validation criterion. Difference between the canonical correlations of the training and validation sets as a function of the ridge and lasso penalties.
Figure 4
First nine canonical correlations. Canonical correlations in the training data set, in the validation data set and after permutations.
