RaPiDS: an algorithm for rapid expression profile database search
PMID: 17503380
Abstract
In this paper we present a fast algorithm and implementation for computing the Spearman rank correlation (SRC) between a query expression profile and each expression profile in a database of profiles. The algorithm's running time is linear in the size of the profile database, with a very small constant factor, and it is designed to handle multiple profile platforms and missing values efficiently. We show that our specialized algorithm and C++ implementation achieve an approximately 100-fold speed-up over a reasonable baseline implementation using Perl hash tables. RaPiDS is designed for general similarity search rather than classification, but to assess the usefulness of SRC as a similarity measure we evaluate the program as a classifier of normal human cell types based on gene expression. Specifically, we use the k-nearest-neighbor classifier with a t statistic derived from SRC as the similarity measure for profile pairs. We estimate accuracy using a jackknife test on microarray data with manually checked cell-type annotation. Preliminary results suggest that the measure is useful for profiles measured under similar conditions (same laboratory and chip platform), with 64% accuracy on 1,685 profiles versus 17.5% for the majority-class classifier, but that it requires improvement when comparing profiles from different experimental series.
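The core search computation can be illustrated in a few lines. This is a minimal Python sketch, not the RaPiDS C++ implementation. It assumes each profile is a mapping from gene identifier to expression value, so a gene that is missing from a profile (or absent from its platform) is simply not present. SRC is then computed over the genes shared by the query and each database profile, with tied values given their average rank. The sketch scans the database once, at a cost of O(g log g) per profile for g shared genes, but it does not attempt the paper's constant-factor optimizations. The names Profile, average_ranks, spearman and search are illustrative only.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical profile representation: gene identifier -> expression value.
# Genes that are missing or not on the profile's platform are simply absent.
Profile = Dict[str, float]


def average_ranks(values: List[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(query: Profile, target: Profile) -> Tuple[float, int]:
    """SRC over the genes measured in both profiles.

    Returns (rho, n), where n is the number of shared genes; rho is nan if undefined.
    """
    shared = sorted(set(query) & set(target))
    n = len(shared)
    if n < 2:
        return float("nan"), n
    rx = average_ranks([query[g] for g in shared])
    ry = average_ranks([target[g] for g in shared])
    mean = (n + 1) / 2.0  # mean rank; average ranks preserve the sum 1 + ... + n
    sxy = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    sxx = sum((a - mean) ** 2 for a in rx)
    syy = sum((b - mean) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan"), n  # a constant profile has no defined correlation
    return sxy / math.sqrt(sxx * syy), n


def search(query: Profile, database: Dict[str, Profile]) -> List[Tuple[str, float, int]]:
    """Score every database profile against the query in a single pass."""
    hits = [(pid, *spearman(query, prof)) for pid, prof in database.items()]
    # best matches first; undefined (nan) correlations are ranked last
    return sorted(hits, key=lambda h: math.inf if math.isnan(h[1]) else -h[1])
```

Restricting each comparison to the shared genes is one simple way to mix platforms and missing values. It also means that different pairs are compared over different numbers of genes n, which is why spearman returns n alongside rho.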
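The abstract does not give the exact formula for the t statistic derived from SRC. A common choice, assumed here, is the statistic used to test a correlation coefficient over n paired observations, t = rho * sqrt((n - 2) / (1 - rho^2)). Under this statistic, a correlation computed over more shared genes carries more weight. The sketch below pairs that statistic with a leave-one-out (jackknife) k-nearest-neighbor vote over a precomputed similarity matrix, plus the majority-class baseline used for comparison. The function names and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import Sequence


def src_t_statistic(rho: float, n: int) -> float:
    """Assumed t statistic for a correlation rho computed over n shared genes."""
    if n < 3:
        return float("nan")  # too few genes for n - 2 degrees of freedom
    if abs(rho) >= 1.0:
        return math.copysign(math.inf, rho)  # perfect correlation: unbounded t
    return rho * math.sqrt((n - 2) / (1.0 - rho * rho))


def knn_jackknife_accuracy(scores: Sequence[Sequence[float]],
                           labels: Sequence[str], k: int) -> float:
    """Leave-one-out accuracy of a k-nearest-neighbor majority vote.

    scores[i][j] is the similarity of profile i to profile j (higher = more similar).
    """
    n = len(labels)
    if n < 2 or k < 1:
        raise ValueError("need at least two profiles and k >= 1")
    correct = 0
    for i in range(n):
        row = scores[i]
        # every other profile, most similar first; undefined (nan) scores go last
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.inf if math.isnan(row[j]) else -row[j])
        neighbors = others[:k]
        votes = Counter(labels[j] for j in neighbors)
        top = max(votes.values())
        # among labels tied for the most votes, prefer the nearest neighbor's label
        predicted = next(labels[j] for j in neighbors if votes[labels[j]] == top)
        correct += predicted == labels[i]
    return correct / n


def majority_class_accuracy(labels: Sequence[str]) -> float:
    """Accuracy of always predicting the most frequent label."""
    return max(Counter(labels).values()) / len(labels)
```

In use, each entry scores[i][j] would be src_t_statistic(rho, n) for the pair (i, j). Holding out one profile at a time keeps a profile from voting for its own label.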
Similar articles
- GeneMCL in microarray analysis. Comput Biol Chem. 2005 Oct;29(5):354-9. doi: 10.1016/j.compbiolchem.2005.07.002. Epub 2005 Sep 19. PMID: 16172020
- List of lists-annotated (LOLA): a database for annotation and comparison of published microarray gene lists. Gene. 2005 Oct 24;360(1):78-82. doi: 10.1016/j.gene.2005.07.008. Epub 2005 Sep 2. PMID: 16140476. Review.
- LyM: a tool to reach the best factor in gene expression comparison. In Silico Biol. 2007;7(1):101-4. PMID: 17688434
- Exploring the functional landscape of gene expression: directed search of large microarray compendia. Bioinformatics. 2007 Oct 15;23(20):2692-9. doi: 10.1093/bioinformatics/btm403. Epub 2007 Aug 27. PMID: 17724061
- Microarray platforms: introduction and application to neurobiology. Int Rev Neurobiol. 2004;60:1-23. doi: 10.1016/S0074-7742(04)60001-8. PMID: 15474585. Review. No abstract available.
Cited by
- Bayesian approach to transforming public gene expression repositories into disease diagnosis databases. Proc Natl Acad Sci U S A. 2010 Apr 13;107(15):6823-8. doi: 10.1073/pnas.0912043107. Epub 2010 Apr 1. PMID: 20360561. Free PMC article.
- Improving gene expression similarity measurement using pathway-based analytic dimension. BMC Genomics. 2009 Dec 3;10 Suppl 3(Suppl 3):S15. doi: 10.1186/1471-2164-10-S3-S15. PMID: 19958478. Free PMC article.
- Retrieving relevant time-course experiments: a study on Arabidopsis microarrays. IET Syst Biol. 2016 Jun;10(3):87-93. doi: 10.1049/iet-syb.2015.0042. PMID: 27187987. Free PMC article.