An Efficient Algorithm for the Detection of Outliers in Mislabeled Omics Data
- PMID: 34976114
- PMCID: PMC8716222
- DOI: 10.1155/2021/9436582
Abstract
High dimensionality and noise make it difficult to detect relevant biomarkers in omics data. Previous studies have shown that penalized maximum trimmed likelihood estimation is effective in identifying mislabeled samples in high-dimensional data with labeling errors. However, the algorithm commonly used in these studies is the concentration step (C-step), and when the C-step is applied to robust penalized regression, it does not guarantee that the criterion function improves at every iteration, because the regularization parameters change during the iteration. As a result, the C-step algorithm runs very slowly, especially on high-dimensional omics data. To address this, the AR-Cstep algorithm (C-step combined with an acceptance-rejection scheme) is proposed. In simulation experiments, the AR-Cstep algorithm converged faster (its average computation time was only 2% of that of the C-step algorithm) and was more accurate in variable selection and outlier identification than the C-step algorithm. The two algorithms were further compared on triple-negative breast cancer (TNBC) RNA-seq data. AR-Cstep resolves the nonconvergence of the C-step and ensures that each iteration moves in a direction that improves the criterion function. As an improvement on the C-step algorithm, the AR-Cstep algorithm can be extended to other robust models with regularization parameters.
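The acceptance-rejection idea can be illustrated with a minimal Python sketch. The abstract does not give the exact acceptance rule, the penalty, or how the regularization parameters are re-tuned, so everything below is an assumption made for illustration: an L1-penalized logistic model fitted by proximal gradient, a fixed penalty `lam`, a trimmed criterion averaged over the `h` best-fitting samples, and a rule that accepts a candidate only if it lowers that criterion and stops at the first rejection. The names `fit_l1_logistic`, `neg_loglik`, and `ar_cstep` are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_l1_logistic(X, y, lam, n_iter=200):
    """Hypothetical helper: L1-penalized logistic regression via proximal gradient."""
    n, p = X.shape
    beta = np.zeros(p)
    # Step size 1/L, where L = ||X||_2^2 / (4n) bounds the curvature of the mean log-loss.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)
        prob = 1.0 / (1.0 + np.exp(-eta))
        z = beta - step * (X.T @ (prob - y) / n)
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return beta

def neg_loglik(X, y, beta):
    """Per-sample negative Bernoulli log-likelihood."""
    eta = X @ beta
    return np.logaddexp(0.0, eta) - y * eta

def trimmed_criterion(X, y, beta, h, lam):
    """Mean loss over the h best-fitting samples plus the L1 penalty (assumed criterion)."""
    losses = neg_loglik(X, y, beta)
    subset = np.argsort(losses)[:h]
    return losses[subset].mean() + lam * np.abs(beta).sum(), subset

def ar_cstep(X, y, h, lam, max_iter=50, tol=1e-8, seed=0):
    """C-step with an assumed accept/reject guard on the trimmed criterion."""
    rng = np.random.default_rng(seed)
    start = rng.choice(X.shape[0], size=h, replace=False)
    beta = fit_l1_logistic(X[start], y[start], lam)
    obj, subset = trimmed_criterion(X, y, beta, h, lam)
    trace = [obj]
    for _ in range(max_iter):
        # Concentration step: refit on the current subset, then re-rank all samples.
        cand_beta = fit_l1_logistic(X[subset], y[subset], lam)
        cand_obj, cand_subset = trimmed_criterion(X, y, cand_beta, h, lam)
        if cand_obj < obj - tol:
            # Accept: the candidate improves the criterion.
            beta, subset, obj = cand_beta, cand_subset, cand_obj
            trace.append(obj)
        else:
            # Reject: keep the current solution and stop (assumption of this sketch).
            break
    return beta, np.sort(subset), trace

# Synthetic illustration (not data from the paper): flip 10 clear-cut labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
eta_true = X @ np.array([3.0, -3.0, 0.0, 0.0, 0.0])
y = (eta_true > 0).astype(float)
flipped = np.argsort(-np.abs(eta_true))[:10]
y[flipped] = 1.0 - y[flipped]

beta, kept, trace = ar_cstep(X, y, h=180, lam=0.02)
flagged = np.setdiff1d(np.arange(200), kept)  # samples trimmed as potential outliers
```

By construction, `trace` records a criterion value that can only decrease, which is the property the abstract attributes to AR-Cstep. A fuller version would re-select `lam` inside the loop, which is the situation where an unguarded C-step can lose monotonicity.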
Copyright © 2021 Hongwei Sun et al.
Conflict of interest statement
The authors declare that they have no conflicts of interest.