Improving missing value estimation in microarray data with gene ontology
- PMID: 16377613
- DOI: 10.1093/bioinformatics/btk019
Abstract
Motivation: Gene expression microarray experiments produce datasets with frequent missing expression values. Accurate estimation of missing values is an important prerequisite for efficient data analysis, as many statistical and machine learning techniques either require a complete dataset or produce results that depend significantly on the quality of such estimates. A limitation of existing estimation methods for microarray data is that they use no external information: the estimation is based solely on the expression data. We hypothesized that utilizing a priori information on functional similarities, available from public databases, would facilitate missing value estimation.
Results: We investigated whether semantic similarity derived from Gene Ontology (GO) annotations could improve the selection of relevant genes for missing value estimation. The relative contribution of each information source (expression data and GO annotations) was estimated automatically from the data using an adaptive weight selection procedure. Our experimental results on yeast cDNA microarray datasets indicated that considering GO information in the k-nearest neighbor algorithm can enhance its performance considerably, especially when the number of experimental conditions is small and the percentage of missing values is high. The performance increase was less evident with a more sophisticated estimation method. We conclude that even a small proportion of annotated genes can provide improvements in data quality that are significant for the eventual interpretation of microarray experiments.
Availability: Java and Matlab code is available on request from the authors.
Supplementary material: Available online at http://users.utu.fi/jotatu/GOImpute.html.
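Illustration: the abstract does not give the formulas, so the following Python sketch is only one plausible reading of the idea, not the authors' Java or Matlab code. It assumes a convex combination of an expression-based distance and a GO-based dissimilarity, controlled by a hypothetical weight `alpha`, and it picks `alpha` by hiding some observed values and measuring the reconstruction error. All function names, the distance definitions, the inverse-distance averaging, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def combined_distance(expr, go_dissim, gene, alpha):
    """Distance from `gene` to all genes: alpha * GO part + (1 - alpha) * expression part.

    Assumed inputs: `expr` is a genes x conditions array with NaN for missing
    values; `go_dissim` is a genes x genes dissimilarity matrix scaled to [0, 1].
    """
    diff = expr - expr[gene]                      # NaN where either value is missing
    shared = ~np.isnan(diff)                      # conditions observed in both genes
    n_shared = shared.sum(axis=1)
    sq = np.where(shared, diff ** 2, 0.0).sum(axis=1)
    ok = n_shared > 0
    d_expr = np.zeros(len(expr))
    d_expr[ok] = np.sqrt(sq[ok] / n_shared[ok])   # RMS difference over shared conditions
    scale = d_expr.max() or 1.0                   # bring the expression part into [0, 1]
    d = alpha * go_dissim[gene] + (1 - alpha) * d_expr / scale
    d[~ok] = np.inf                               # no shared conditions: not comparable
    d[gene] = np.inf                              # a gene is never its own neighbour
    return d

def knn_impute(expr, go_dissim, alpha, k=3):
    """Fill each missing entry with an inverse-distance weighted mean of k neighbours."""
    filled = expr.copy()
    for g, c in zip(*np.where(np.isnan(expr))):
        d = combined_distance(expr, go_dissim, g, alpha)
        d[np.isnan(expr[:, c])] = np.inf          # neighbours must have condition c observed
        nearest = np.argsort(d)[:k]
        nearest = nearest[np.isfinite(d[nearest])]
        if nearest.size:
            weights = 1.0 / (d[nearest] + 1e-9)
            filled[g, c] = np.average(expr[nearest, c], weights=weights)
    return filled

def select_alpha(expr, go_dissim, alphas=(0.0, 0.25, 0.5, 0.75, 1.0), frac=0.1, seed=0):
    """Choose alpha by hiding a fraction of observed entries and minimising RMSE."""
    rng = np.random.default_rng(seed)
    observed = np.argwhere(~np.isnan(expr))
    n_hide = max(1, int(frac * len(observed)))
    hidden = observed[rng.choice(len(observed), size=n_hide, replace=False)]
    rows, cols = hidden[:, 0], hidden[:, 1]
    masked = expr.copy()
    masked[rows, cols] = np.nan

    def rmse(alpha):
        estimate = knn_impute(masked, go_dissim, alpha)
        return np.sqrt(np.nanmean((estimate[rows, cols] - expr[rows, cols]) ** 2))

    return min(alphas, key=rmse)

# Synthetic example (not real microarray data): two groups of five genes,
# each group sharing a profile and, by assumption, identical GO annotations.
rng = np.random.default_rng(1)
base = np.array([[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]])
group = np.repeat([0, 1], 5)
truth = base[group] + rng.normal(scale=0.1, size=(10, 4))
go_dissim = (group[:, None] != group[None, :]).astype(float)

expr = truth.copy()
expr[0, 1] = expr[6, 2] = np.nan                  # introduce two missing values

alpha = select_alpha(expr, go_dissim)
completed = knn_impute(expr, go_dissim, alpha)
```

With `alpha = 0` the sketch reduces to expression-only k-nearest neighbor imputation, and with `alpha = 1` neighbors are chosen from GO information alone. In real data many genes lack annotations, so a neutral dissimilarity would have to be assigned to unannotated pairs; how the original method handles this is not stated in the abstract.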