Detecting outlier samples in microarray data
- PMID: 19222380
- DOI: 10.2202/1544-6115.1426
Abstract
In this paper, we address the problem of detecting outlier samples in microarray data, that is, samples whose expression patterns differ markedly from those of the other samples. Although outliers are uncommon, they appear even in widely used benchmark data sets and can adversely affect microarray data analysis. Identifying outliers is important both for uncovering underlying experimental or biological problems and for removing erroneous data. We propose a fully automatic outlier detection method based on principal component analysis (PCA) and robust estimation of Mahalanobis distances. We demonstrate that our method identifies biologically significant outliers with high accuracy and that removing outliers improves the prediction accuracy of classifiers. Because our method is closely related to existing robust PCA methods, we compare it with a prominent robust PCA method.
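The abstract does not spell out the procedure, so the following is only a minimal Python sketch of the general idea (project samples onto a few principal components, then flag samples whose robust Mahalanobis distance in that space is large). It is not the authors' algorithm. Several choices here are assumptions: the robust location and scale are the coordinate-wise median and MAD of the PC scores, a diagonal stand-in for a full robust covariance estimator that ignores any robust correlation between components; the number of components `k`, the cutoff level `alpha`, and the Wilson-Hilferty approximation to the chi-square quantile are illustrative; and all function names are hypothetical. PCA is computed by power iteration on the samples-by-samples Gram matrix, which stays small when genes far outnumber samples.

```python
import math
import random
from statistics import NormalDist, median


def center_columns(X):
    """Subtract each gene's mean (column mean) across samples; X is samples x genes."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def top_pc_scores(Xc, k, iters=500, seed=0):
    """Scores of the n samples on the top k principal components.

    Power iteration with deflation on the n x n Gram matrix G = Xc Xc^T:
    if G u = lam u with unit u, the sample scores on that component are u * sqrt(lam).
    """
    n = len(Xc)
    G = [[sum(a * b for a, b in zip(Xc[i], Xc[j])) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    scores = [[0.0] * k for _ in range(n)]
    for c in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(g * x for g, x in zip(G[i], v)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        Gv = [sum(g * x for g, x in zip(G[i], v)) for i in range(n)]
        lam = max(sum(a * b for a, b in zip(v, Gv)), 0.0)  # Rayleigh quotient
        for i in range(n):
            scores[i][c] = v[i] * math.sqrt(lam)
        # Deflate so the next pass converges to the next-largest eigenvalue.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return scores


def robust_distances(scores):
    """Squared robust distance per sample: sum over PCs of ((s - median) / (1.4826 * MAD))^2."""
    n, k = len(scores), len(scores[0])
    d2 = [0.0] * n
    for c in range(k):
        col = [scores[i][c] for i in range(n)]
        med = median(col)
        mad = 1.4826 * median([abs(x - med) for x in col])
        if mad == 0.0:
            continue  # degenerate component: skip rather than divide by zero
        for i in range(n):
            d2[i] += ((col[i] - med) / mad) ** 2
    return d2


def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile (an assumed cutoff rule)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3


def detect_outliers(X, k=2, alpha=0.975):
    """Indices of samples whose squared robust distance exceeds the chi-square(k) cutoff."""
    d2 = robust_distances(top_pc_scores(center_columns(X), k))
    cutoff = chi2_quantile(alpha, k)
    return [i for i, d in enumerate(d2) if d > cutoff]


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(20)]  # 20 samples x 50 genes
    X[7] = [x + 3.0 for x in X[7]]  # hypothetical outlier sample
    print(detect_outliers(X))
```

In this synthetic example sample 7 is shifted by +3 on every gene, so it dominates the first component and receives a very large robust distance. Because the inliers are pure noise, an occasional inlier may also cross a 97.5% cutoff by chance; the median/MAD scaling keeps the outlier itself from inflating the scale it is measured against.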
Similar articles
- Identification of differentially expressed genes with multivariate outlier analysis. J Biopharm Stat. 2004 Aug;14(3):629-46. doi: 10.1081/BIP-200025654. PMID: 15468756
- A new classification model with simple decision rule for discovering optimal feature gene pairs. Comput Biol Med. 2007 Nov;37(11):1637-46. doi: 10.1016/j.compbiomed.2007.03.004. Epub 2007 May 7. PMID: 17482157
- Robust PCA and classification in biosciences. Bioinformatics. 2004 Jul 22;20(11):1728-36. doi: 10.1093/bioinformatics/bth158. Epub 2004 Feb 26. PMID: 14988110
- Univariate Outliers: A Conceptual Overview for the Nurse Researcher. Can J Nurs Res. 2019 Mar;51(1):31-37. doi: 10.1177/0844562118786647. Epub 2018 Jul 3. PMID: 29969044. Review.
- Multivariate Outliers: A Conceptual and Practical Overview for the Nurse and Health Researcher. Can J Nurs Res. 2021 Sep;53(3):316-321. doi: 10.1177/0844562120932054. Epub 2020 Jun 10. PMID: 32522115. Review.
Cited by
- An integrated approach for identifying wrongly labelled samples when performing classification in microarray data. PLoS One. 2012;7(10):e46700. doi: 10.1371/journal.pone.0046700. Epub 2012 Oct 17. PMID: 23082127
- Major Histocompatibility Complex Genes as Therapeutic Opportunity for Immune Cold Molecular Cancer Subtypes. J Immunol Res. 2020 Nov 17;2020:8758090. doi: 10.1155/2020/8758090. eCollection 2020. PMID: 33282963. Review.
- Thyroid hormone-regulated gene expression in juvenile mouse liver: identification of thyroid response elements using microarray profiling and in silico analyses. BMC Genomics. 2011 Dec 29;12:634. doi: 10.1186/1471-2164-12-634. PMID: 22206413
- Gene Expression Profiling of Ewing Sarcoma Tumors Reveals the Prognostic Importance of Tumor-Stromal Interactions: A Report from the Children's Oncology Group. J Pathol Clin Res. 2015 Apr;1(2):83-94. doi: 10.1002/cjp2.9. PMID: 26052443
- Transcriptomic responses in the oral cavity of F344 rats and B6C3F1 mice following exposure to Cr(VI): Implications for risk assessment. Environ Mol Mutagen. 2016 Dec;57(9):706-716. doi: 10.1002/em.22064. Epub 2016 Nov 15. PMID: 27859739