EnsMOD: A Software Program for Omics Sample Outlier Detection
- PMID: 37042708
- PMCID: PMC10282819
- DOI: 10.1089/cmb.2022.0243
Abstract
Detection of omics sample outliers is important for preventing erroneous biological conclusions, developing robust experimental protocols, and discovering rare biological states. Two recent publications describe robust algorithms for detecting transcriptomic sample outliers, but neither algorithm had been incorporated into a software tool for scientists. Here we describe Ensemble Methods for Outlier Detection (EnsMOD), which incorporates both algorithms. EnsMOD calculates how closely the quantitation variation follows a normal distribution, plots the density curve of each sample to visualize anomalies, performs hierarchical cluster analyses to calculate how closely the samples cluster with each other, and performs robust principal component analyses to statistically test whether any sample is an outlier. The probabilistic threshold parameters can be easily adjusted to tighten or loosen the outlier detection stringency. EnsMOD can be used to analyze any omics dataset with normally distributed variance. Here it was used to analyze a simulated proteomics dataset, a multiomic (proteome and transcriptome) dataset, a single-cell proteomics dataset, and a phosphoproteomics dataset. EnsMOD successfully identified all of the simulated outliers, and subsequent removal of a detected outlier improved data quality for downstream statistical analyses.
Keywords: hierarchical cluster analysis; multivariate; omics; outlier detection; proteomics; robust principal component analysis.
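As an illustration of the principal-component step described in the abstract, the following minimal Python sketch flags sample outliers in a simulated feature-by-sample matrix. It is not EnsMOD's implementation: it swaps in classical PCA (via SVD on median-centered data) followed by median/MAD robust z-scores on the component scores, which is a simplified stand-in for the robust principal component analysis algorithms the abstract refers to. The function names `simulate_dataset` and `flag_outlier_samples`, the simulated data, and the parameter `z_threshold` are all assumptions made for this example.

```python
import numpy as np


def simulate_dataset(n_features=500, n_samples=12, outlier_index=3,
                     shift=3.0, seed=0):
    """Simulate a features x samples matrix with one perturbed sample (hypothetical data)."""
    rng = np.random.default_rng(seed)
    # Normally distributed measurement noise around a common level.
    data = rng.normal(loc=20.0, scale=1.0, size=(n_features, n_samples))
    # Feature-specific baselines shared by all samples.
    data += rng.normal(loc=0.0, scale=2.0, size=(n_features, 1))
    # Perturb half of the features in a single sample to create an outlier.
    affected = rng.choice(n_features, size=n_features // 2, replace=False)
    data[affected, outlier_index] += rng.normal(shift, 1.0, size=affected.size)
    return data


def flag_outlier_samples(data, n_components=1, z_threshold=3.5):
    """Return indices of samples whose PC scores have a robust z-score above z_threshold.

    Assumption: classical PCA plus median/MAD scaling, used here as a
    simplified substitute for robust PCA.
    """
    # Put samples in rows and center each feature by its median across samples.
    x = data.T - np.median(data, axis=1)
    # PCA via singular value decomposition; scores are U * S.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    # Robust location and scale of the scores on each component.
    med = np.median(scores, axis=0)
    mad = 1.4826 * np.median(np.abs(scores - med), axis=0)
    robust_z = np.abs(scores - med) / mad
    # A sample is flagged if it is extreme on any retained component.
    return np.where((robust_z > z_threshold).any(axis=1))[0]


data = simulate_dataset()
flagged = flag_outlier_samples(data)
print("flagged sample indices:", flagged.tolist())
```

Raising `z_threshold` tightens the detection stringency and lowering it loosens it, which mirrors in a simplified way the adjustable probabilistic thresholds mentioned in the abstract.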
Conflict of interest statement
The authors declare they have no conflicting financial interests.