LSimpute: accurate estimation of missing values in microarray data with least squares methods
- PMID: 14978222
- PMCID: PMC374359
- DOI: 10.1093/nar/gnh026
Abstract
Microarray experiments generate data sets with information on the expression levels of thousands of genes in a set of biological samples. Unfortunately, such experiments often produce multiple missing expression values, normally due to various experimental problems. As many algorithms for gene expression analysis require a complete data matrix as input, the missing values have to be estimated in order to analyze the available data. Alternatively, genes and arrays can be removed until no missing values remain. However, for genes or arrays with only a small number of missing values, it is desirable to impute those values. For the subsequent analysis to be as informative as possible, it is essential that the estimates for the missing gene expression values are accurate. Even a small number of badly estimated missing values in the data might be enough for clustering methods, such as hierarchical clustering or K-means clustering, to produce misleading results. Thus, accurate methods for missing value estimation are needed. We present novel methods for estimation of missing values in microarray data sets that are based on the least squares principle, and that utilize correlations between both genes and arrays. For this set of methods, we use the common reference name LSimpute. We compare the estimation accuracy of our methods with the widely used KNNimpute on three complete data matrices from public data sets by randomly knocking out data (labeling as missing). From these tests, we conclude that our LSimpute methods produce estimates that are consistently more accurate than those obtained using KNNimpute. Additionally, we examine a more classic approach to missing value estimation based on expectation maximization (EM). We refer to our EM implementations as EMimpute, and we compare the estimation errors of the EMimpute methods with those of our novel methods. The results indicate that, on average, the estimates from our best performing LSimpute method are at least as accurate as those from the best EMimpute algorithm.
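The knock-out evaluation described in the abstract can be illustrated with a minimal sketch. It is not the LSimpute algorithm: the function names (`knock_out`, `row_mean_impute`, `ls_gene_impute`, `rmse`), the choice of `k = 5` neighbour genes, the unweighted averaging of the per-neighbour regression estimates, the row-mean baseline and the synthetic data are all assumptions made for illustration. The sketch assumes every gene keeps at least two observed values.

```python
import numpy as np


def knock_out(data, fraction, rng):
    """Copy `data` and mark a random fraction of entries as missing (NaN)."""
    mask = rng.random(data.shape) < fraction
    damaged = data.copy()
    damaged[mask] = np.nan
    return damaged, mask


def row_mean_impute(damaged):
    """Baseline: fill each missing value with the mean of its gene (row)."""
    filled = damaged.copy()
    rows, cols = np.where(np.isnan(damaged))
    filled[rows, cols] = np.nanmean(damaged, axis=1)[rows]
    return filled


def ls_gene_impute(damaged, k=5):
    """Illustrative gene-based least squares imputation (not LSimpute itself)."""
    start = row_mean_impute(damaged)      # provisional complete matrix
    corr = np.corrcoef(start)             # gene-by-gene correlations
    np.fill_diagonal(corr, 0.0)           # a gene is not its own neighbour
    filled = start.copy()
    for g in np.where(np.isnan(damaged).any(axis=1))[0]:
        missing = np.isnan(damaged[g])
        observed = ~missing
        # The k genes with the largest absolute correlation to gene g.
        neighbours = np.argsort(-np.abs(corr[g]))[:k]
        estimates = []
        for n in neighbours:
            # Fit y = intercept + slope * x on arrays where gene g is observed.
            slope, intercept = np.polyfit(start[n, observed], damaged[g, observed], 1)
            estimates.append(intercept + slope * start[n, missing])
        # Assumption: plain average of the single-regression estimates.
        filled[g, missing] = np.mean(estimates, axis=0)
    return filled


def rmse(truth, estimate, mask):
    """Root mean squared error over the knocked-out entries only."""
    return float(np.sqrt(np.mean((truth[mask] - estimate[mask]) ** 2)))


# Synthetic "complete" matrix: 200 genes x 20 arrays with correlated rows.
rng = np.random.default_rng(0)
complete = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 20))
complete += 0.1 * rng.normal(size=complete.shape)

damaged, mask = knock_out(complete, 0.05, rng)
print("row mean RMSE:     ", rmse(complete, row_mean_impute(damaged), mask))
print("least squares RMSE:", rmse(complete, ls_gene_impute(damaged), mask))
```

Because the true values of the knocked-out entries are known, any imputation method can be scored the same way, which is what makes comparisons between methods on a complete matrix possible.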
Similar articles
- Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data. Bioinformatics. 2005 May 15;21(10):2417-23. doi: 10.1093/bioinformatics/bti345. Epub 2005 Feb 24. PMID: 15731210
- An Iterative Locally Auto-Weighted Least Squares Method for Microarray Missing Value Estimation. IEEE Trans Nanobioscience. 2017 Jan;16(1):21-33. doi: 10.1109/TNB.2016.2636243. Epub 2016 Dec 6. PMID: 28114029
- Towards clustering of incomplete microarray data without the use of imputation. Bioinformatics. 2007 Jan 1;23(1):107-13. doi: 10.1093/bioinformatics/btl555. Epub 2006 Oct 31. PMID: 17077099
- Missing value estimation methods for DNA microarrays. Bioinformatics. 2001 Jun;17(6):520-5. doi: 10.1093/bioinformatics/17.6.520. PMID: 11395428
- Estimation and control of multiple testing error rates for microarray studies. Brief Bioinform. 2006 Mar;7(1):25-36. doi: 10.1093/bib/bbk002. PMID: 16761362 Review.
Cited by
- Gene network inference and visualization tools for biologists: application to new human transcriptome datasets. Nucleic Acids Res. 2012 Mar;40(6):2377-98. doi: 10.1093/nar/gkr902. Epub 2011 Nov 24. PMID: 22121215 Free PMC article.
- Improving missing value imputation of microarray data by using spot quality weights. BMC Bioinformatics. 2006 Jun 16;7:306. doi: 10.1186/1471-2105-7-306. PMID: 16780582 Free PMC article.
- Quality determination and the repair of poor quality spots in array experiments. BMC Bioinformatics. 2005 Sep 26;6:234. doi: 10.1186/1471-2105-6-234. PMID: 16185360 Free PMC article.
- Modulation of transcriptional and inflammatory responses in murine macrophages by the Mycobacterium tuberculosis mammalian cell entry (Mce) 1 complex. PLoS One. 2011;6(10):e26295. doi: 10.1371/journal.pone.0026295. Epub 2011 Oct 24. PMID: 22039457 Free PMC article.
- Gene expression patterns related to vascular invasion and aggressive features in endometrial cancer. Am J Pathol. 2011 Feb;178(2):861-71. doi: 10.1016/j.ajpath.2010.10.040. PMID: 21281818 Free PMC article.