Comput Math Methods Med. 2021 Dec 22;2021:9436582.
doi: 10.1155/2021/9436582. eCollection 2021.

An Efficient Algorithm for the Detection of Outliers in Mislabeled Omics Data

Hongwei Sun et al. Comput Math Methods Med.

Abstract

High dimensionality and noise make it difficult to detect relevant biomarkers in omics data. Previous studies have shown that penalized maximum trimmed likelihood estimation is effective in identifying mislabeled samples in high-dimensional data. However, the algorithm commonly used in these studies is the concentration step (C-step), and when the C-step is applied to robust penalized regression it does not guarantee that the criterion function improves from one iteration to the next, because the regularization parameters change during the iterations. As a result, the C-step algorithm runs very slowly, especially on high-dimensional omics data. To address this, the AR-Cstep algorithm (C-step combined with an acceptance-rejection scheme) is proposed. In simulation experiments, the AR-Cstep algorithm converged faster (its average computation time was only 2% of that of the C-step algorithm) and was more accurate than the C-step algorithm in both variable selection and outlier identification. The two algorithms were further compared on triple-negative breast cancer (TNBC) RNA-seq data. AR-Cstep resolves the non-convergence of the C-step and ensures that each iteration moves in a direction that improves the criterion function. As an improvement of the C-step algorithm, AR-Cstep can be extended to other robust models with regularization parameters.
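The concentration step can be illustrated with a minimal NumPy sketch. It assumes an elastic-net penalized logistic model with a fixed penalty `lam` and mixing weight `alpha`, fitted by plain proximal gradient descent (a stand-in solver, not the authors' implementation); the helper names `fit_enet_logistic`, `sample_loss`, and `c_step` are hypothetical. Each iteration fits the model on the current subset `H` of `h` samples, scores every sample by its negative log-likelihood, and keeps the `h` best-fitting samples. With `lam` held fixed and exact fits, the trimmed criterion cannot increase: the new subset has the smallest losses under the old fit, and refitting on it can only lower its loss further. Re-tuning `lam` inside the loop breaks this argument, which is the problem the abstract describes.

```python
import numpy as np


def soft_threshold(z, t):
    # Proximal operator of the L1 norm.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_enet_logistic(X, y, lam, alpha=0.5, n_iter=500):
    """Hypothetical stand-in solver: elastic-net logistic regression by proximal
    gradient (ISTA). Penalty: lam * (alpha*||b||_1 + (1-alpha)/2*||b||_2^2);
    the intercept is not penalized."""
    n = X.shape[0]
    Xa = np.column_stack([np.ones(n), X])
    # Step 1/L, where L bounds the Lipschitz constant of the smooth part's gradient.
    step = 1.0 / (0.25 * np.linalg.norm(Xa, 2) ** 2 / n + lam * (1 - alpha))
    b0, beta = 0.0, np.zeros(X.shape[1])
    for _ in range(n_iter):
        resid = 1.0 / (1.0 + np.exp(-(b0 + X @ beta))) - y
        grad = X.T @ resid / n + lam * (1 - alpha) * beta
        beta = soft_threshold(beta - step * grad, step * lam * alpha)
        b0 -= step * resid.mean()
    return b0, beta


def sample_loss(X, y, b0, beta):
    # Per-sample negative log-likelihood of the logistic model, computed stably.
    eta = b0 + X @ beta
    return np.logaddexp(0.0, eta) - y * eta


def c_step(X, y, h, lam, alpha=0.5, max_iter=20, seed=0):
    """Concentration steps with a FIXED penalty lam; returns the final subset H."""
    rng = np.random.default_rng(seed)
    H = np.sort(rng.choice(len(y), size=h, replace=False))  # random initial subset
    for _ in range(max_iter):
        b0, beta = fit_enet_logistic(X[H], y[H], lam, alpha)
        # Keep the h samples that the current fit explains best.
        H_new = np.sort(np.argsort(sample_loss(X, y, b0, beta))[:h])
        if np.array_equal(H_new, H):  # subset is stable: converged
            break
        H = H_new
    else:
        # Hit max_iter: refit so the coefficients match the returned subset.
        b0, beta = fit_enet_logistic(X[H], y[H], lam, alpha)
    return H, b0, beta
```

Samples left out of the final `H` are the candidate mislabeled samples. Because the stand-in solver is only approximate, the monotone decrease holds up to optimization error.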

Conflict of interest statement

The authors declare that they have no conflicts of interest.

Figures

Figure 1
Results of MTL-EN, enetLTS, and Ensemble when n = 500 and p = 1000. Sn: Sensitivity; FPR: False Positive Rate; PSR: Positive Selection Rate; FDR: False Discovery Rate.
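The four rates in Figure 1 can be computed from sets of indices. This page does not define them, so the sketch assumes the conventional definitions: Sn and FPR score outlier identification (flagged samples versus truly mislabeled ones), while PSR and FDR score variable selection (selected genes versus the true support). The function names `outlier_metrics` and `selection_metrics` are hypothetical.

```python
def outlier_metrics(true_outliers, flagged, n):
    """Assumed definitions: Sn = TP / #true outliers, FPR = FP / #clean samples."""
    true_outliers, flagged = set(true_outliers), set(flagged)
    tp = len(true_outliers & flagged)  # correctly flagged mislabeled samples
    fp = len(flagged - true_outliers)  # clean samples wrongly flagged
    sn = tp / len(true_outliers) if true_outliers else 0.0
    n_clean = n - len(true_outliers)
    fpr = fp / n_clean if n_clean else 0.0
    return sn, fpr


def selection_metrics(true_support, selected):
    """Assumed definitions: PSR = TP / #relevant variables, FDR = FP / #selected."""
    true_support, selected = set(true_support), set(selected)
    tp = len(true_support & selected)  # relevant variables that were selected
    psr = tp / len(true_support) if true_support else 0.0
    fdr = (len(selected) - tp) / len(selected) if selected else 0.0
    return psr, fdr
```

For example, flagging samples {0, 1, 2, 10} out of 100 when {0, 1, 2, 3} are truly mislabeled gives Sn = 0.75 and FPR = 1/96; selecting variables {0, 1, 5, 7} when {0, 1, 2} are relevant gives PSR = 2/3 and FDR = 0.5.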
Algorithm 1
Description of C-step algorithm.
Algorithm 2
Description of AR-Cstep algorithm.
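Algorithm 2 is not reproduced on this page, so the following is only a sketch of the general acceptance-rejection idea under stated assumptions. At every iteration a candidate subset is produced by a concentration step, `lam` is re-tuned on that candidate (here with a hypothetical BIC-style rule over a fixed grid), and the candidate is accepted only if it lowers an assumed penalized trimmed criterion; otherwise it is rejected and the iteration stops. The criterion, the tuning rule, the behaviour on rejection, and all helper names (`fit_enet_logistic`, `criterion`, `select_lambda`, `ar_cstep`) are assumptions, and the paper's scheme may differ in detail. The elastic-net logistic solver is a simple proximal-gradient stand-in, included so the block runs on its own.

```python
import numpy as np


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_enet_logistic(X, y, lam, alpha=0.5, n_iter=500):
    # Hypothetical stand-in elastic-net logistic solver (ISTA); intercept unpenalized.
    n = X.shape[0]
    Xa = np.column_stack([np.ones(n), X])
    step = 1.0 / (0.25 * np.linalg.norm(Xa, 2) ** 2 / n + lam * (1 - alpha))
    b0, beta = 0.0, np.zeros(X.shape[1])
    for _ in range(n_iter):
        resid = 1.0 / (1.0 + np.exp(-(b0 + X @ beta))) - y
        grad = X.T @ resid / n + lam * (1 - alpha) * beta
        beta = soft_threshold(beta - step * grad, step * lam * alpha)
        b0 -= step * resid.mean()
    return b0, beta


def sample_loss(X, y, b0, beta):
    eta = b0 + X @ beta
    return np.logaddexp(0.0, eta) - y * eta  # per-sample negative log-likelihood


def criterion(X, y, b0, beta, lam, alpha=0.5):
    # Assumed criterion: mean loss on the subset plus the elastic-net penalty.
    pen = lam * (alpha * np.abs(beta).sum() + 0.5 * (1 - alpha) * beta @ beta)
    return sample_loss(X, y, b0, beta).mean() + pen


def select_lambda(X, y, lam_grid, alpha=0.5):
    # Hypothetical BIC-style tuning: 2*NLL + log(n) * (number of nonzero coefficients).
    n = X.shape[0]
    scores = []
    for lam in lam_grid:
        b0, beta = fit_enet_logistic(X, y, lam, alpha)
        nll = sample_loss(X, y, b0, beta).sum()
        scores.append(2 * nll + np.log(n) * np.count_nonzero(beta))
    return lam_grid[int(np.argmin(scores))]


def ar_cstep(X, y, h, lam_grid, alpha=0.5, max_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    H = np.sort(rng.choice(len(y), size=h, replace=False))  # random initial subset
    lam = select_lambda(X[H], y[H], lam_grid, alpha)
    b0, beta = fit_enet_logistic(X[H], y[H], lam, alpha)
    q = criterion(X[H], y[H], b0, beta, lam, alpha)
    for _ in range(max_iter):
        # Candidate: concentration step, then re-tune lam on the candidate subset.
        H_new = np.sort(np.argsort(sample_loss(X, y, b0, beta))[:h])
        lam_new = select_lambda(X[H_new], y[H_new], lam_grid, alpha)
        b0_new, beta_new = fit_enet_logistic(X[H_new], y[H_new], lam_new, alpha)
        q_new = criterion(X[H_new], y[H_new], b0_new, beta_new, lam_new, alpha)
        if q_new < q - 1e-12:  # accept only a strict improvement
            H, lam, b0, beta, q = H_new, lam_new, b0_new, beta_new, q_new
        else:  # reject the candidate and stop
            break
    return H, lam, b0, beta, q
```

Because a candidate is kept only when the criterion strictly decreases, the accepted values form a decreasing sequence; with finitely many (subset, `lam`) pairs the loop must terminate even though `lam` changes between iterations.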

References

    1. Lopes M. B., Verissimo A., Carrasquinha E., Casimiro S., Beerenwinkel N., Vinga S. Ensemble outlier detection and gene selection in triple-negative breast cancer data. BMC Bioinformatics. 2018;19(1):168. doi: 10.1186/s12859-018-2149-7.
    2. Wu C., Ma S. A selective review of robust variable selection with applications in bioinformatics. Briefings in Bioinformatics. 2015;16(5):873–883. doi: 10.1093/bib/bbu046.
    3. Ayers K. L., Cordell H. J. SNP selection in genome-wide and candidate gene studies via penalized logistic regression. Genetic Epidemiology. 2010;34(8):879–891. doi: 10.1002/gepi.20543.
    4. Sun H., Wang S. Penalized logistic regression for high-dimensional DNA methylation data with case-control studies. Bioinformatics. 2012;28(10):1368–1375. doi: 10.1093/bioinformatics/bts145.
    5. Rousseeuw P. J. Least median of squares regression. Journal of the American Statistical Association. 1984;79(388):871–880. doi: 10.1080/01621459.1984.10477105.