Semi-supervised analysis of gene expression profiles for lineage-specific development in the Caenorhabditis elegans embryo
- PMID: 16873502
- DOI: 10.1093/bioinformatics/btl256
Abstract
Motivation: Gene expression profiling is a powerful approach to identify, on a global scale, genes that may be involved in a specific biological process. For example, gene expression profiling of mutant animals that lack or contain an excess of certain cell types is a common way to identify genes that are important for the development and maintenance of those cell types. However, it is difficult for traditional computational methods, including unsupervised and supervised learning methods, to detect relevant genes from a large collection of expression profiles with high sensitivity and specificity. Unsupervised methods group similar expression profiles together while ignoring important prior biological knowledge. Supervised methods use training data derived from prior biological knowledge to classify expression profiles. However, for many biological problems, little prior knowledge is available, which limits the prediction performance of most supervised methods.
Results: We present a Bayesian semi-supervised learning method, called BGEN, that improves upon supervised and unsupervised methods by both capturing relevant expression profiles and using prior biological knowledge from literature and experimental validation. Unlike currently available semi-supervised learning methods, this new method trains a kernel classifier based on labeled and unlabeled gene expression examples. The semi-supervised trained classifier can then be used to efficiently classify the remaining genes in the dataset. Moreover, we model the confidence of microarray probes and probabilistically combine multiple probe predictions into gene predictions. We apply BGEN to identify genes involved in the development of a specific cell lineage in the C. elegans embryo, and to further identify the tissues in which these genes are enriched. Compared to K-means clustering and SVM classification, BGEN achieves higher sensitivity and specificity. We confirm certain predictions by biological experiments.
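To make the idea of learning from both labeled and unlabeled expression profiles concrete, here is a minimal sketch. It is not BGEN: it swaps in plain label propagation over an RBF kernel, a much simpler semi-supervised technique, and the function names, toy profiles, and the `gamma` parameter are all hypothetical choices made for illustration.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF similarity between two expression profiles (lists of floats)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def propagate_labels(profiles, labels, gamma=1.0, n_iter=200):
    """Label propagation sketch (an assumption, not the BGEN algorithm).

    profiles: list of expression profiles, one per gene.
    labels:   dict mapping a gene index to 0 or 1 for the few labeled genes.
    Returns a score in [0, 1] per gene; labeled genes stay clamped.
    """
    n = len(profiles)
    # Pairwise kernel weights, with no self-similarity.
    weights = [
        [rbf_kernel(profiles[i], profiles[j], gamma) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    # Unlabeled genes start at an uninformative 0.5.
    scores = [float(labels.get(i, 0.5)) for i in range(n)]
    for _ in range(n_iter):
        updated = scores[:]
        for i in range(n):
            if i in labels:
                continue  # keep known labels fixed
            total = sum(weights[i])
            if total > 0.0:
                # Kernel-weighted average of the neighbors' current scores.
                updated[i] = sum(w * s for w, s in zip(weights[i], scores)) / total
        scores = updated
    return scores


# Toy data: two well-separated groups of profiles, one labeled gene in each.
profiles = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
    [3.0, 3.1], [3.2, 2.9], [2.9, 3.3],
]
labels = {0: 1, 3: 0}
scores = propagate_labels(profiles, labels)
predicted = [1 if s > 0.5 else 0 for s in scores]
```

With only two labeled genes, the unlabeled profiles still receive labels because they sit close to a labeled neighbor in kernel space, which is the general intuition behind using unlabeled data.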
Availability: The results are available at http://www.csail.mit.edu/~alanqi/projects/BGEN.html.
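The abstract mentions modeling probe confidence and probabilistically combining multiple probe predictions into a gene prediction, without giving the formula. The sketch below shows one plausible way such a combination could work, assuming a confidence-weighted average of probe log-odds. This pooling rule, the function names, and the example numbers are illustrative assumptions, not the rule used by BGEN.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))


def combine_probe_predictions(probe_probs, probe_confidences, eps=1e-6):
    """Pool per-probe probabilities into one gene-level probability.

    Assumed rule (hypothetical): average the probe log-odds, weighting each
    probe by its normalized confidence, then map back to a probability.
    """
    if not probe_probs or len(probe_probs) != len(probe_confidences):
        raise ValueError("need one confidence per probe and at least one probe")
    total = sum(probe_confidences)
    if total <= 0.0:
        raise ValueError("confidences must sum to a positive value")
    pooled = 0.0
    for p, c in zip(probe_probs, probe_confidences):
        p = min(max(p, eps), 1.0 - eps)  # keep logit finite
        pooled += (c / total) * logit(p)
    return sigmoid(pooled)


# Two probes for one gene: a trusted probe says "yes", a noisy one says "no".
gene_prob = combine_probe_predictions([0.9, 0.2], [0.9, 0.1])
```

Under this assumed rule, a high-confidence probe dominates the gene-level call, while equally trusted probes that disagree symmetrically cancel out to a probability near 0.5.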