J Chemom. 2020 Jan;34(1):e3182.
doi: 10.1002/cem.3182. Epub 2019 Dec 2.

Cellwise outlier detection and biomarker identification in metabolomics based on pairwise log ratios


Jan Walach et al. J Chemom. 2020 Jan.

Abstract

Data outliers can carry valuable information and may be the most informative observations for interpretation. Nevertheless, they are often neglected. An algorithm called cellwise outlier diagnostics using robust pairwise log ratios (cell-rPLR) is proposed for identifying outliers in single cells of a data matrix. The algorithm is designed for metabolomic data, where, because of the size effect, the measured values are not directly comparable. Pairwise log ratios between the variable values form the elemental information for the algorithm, and aggregating appropriate outlyingness values of these log ratios yields outlyingness information for the individual cells. A further feature of cell-rPLR is that it is useful for biomarker identification, particularly in the presence of cellwise outliers. Real data examples and simulation studies underline the good performance of this algorithm in comparison with alternative methods.
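
The idea of scoring cells through pairwise log ratios can be illustrated with a minimal Python sketch. This is not the cell-rPLR algorithm itself: the abstract does not give its details, so the robust standardization (median and MAD), the absolute robust z-score used as outlyingness value, and the median aggregation are all assumptions chosen for illustration. The function name `cell_outlyingness` is hypothetical.

```python
import numpy as np

def cell_outlyingness(X):
    """Illustrative cellwise outlyingness scores from pairwise log ratios.

    X is an (n, p) array of strictly positive measurements.
    Returns an (n, p) array; larger values suggest a more outlying cell.
    All modelling choices here are assumptions, not the published method.
    """
    X = np.asarray(X, dtype=float)
    log_x = np.log(X)
    # lr[i, j, k] = log(x_ij / x_ik); a row-specific size factor cancels out.
    lr = log_x[:, :, None] - log_x[:, None, :]
    # Robust center and scale of each log ratio across the n observations.
    center = np.median(lr, axis=0)
    scale = 1.4826 * np.median(np.abs(lr - center), axis=0)
    scale[scale == 0] = np.nan  # the diagonal (j == k) carries no information
    # Assumed outlyingness value: absolute robust z-score of each log ratio.
    z = np.abs(lr - center) / scale
    # Assumed aggregation: median over all partner variables k for cell (i, j).
    return np.nanmedian(z, axis=2)
```

Because only ratios within a row enter the computation, multiplying a whole row by a constant (a size effect) leaves the scores unchanged, whereas a single deviating cell shifts all log ratios that involve it and therefore receives a large aggregated score.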

Keywords: biomarker; cellwise outliers; cell‐rPLR; log ratio; metabolomics; robust method.


Figures

Figure 1. Difference between rowwise (left) and cellwise (right) outliers of a data matrix
Figure 2. Original (left) and adjusted (right) outlyingness functions
Figure 3. Outlier diagnostics for the IMD data, using the adjusted Tukey biweight function
Figure 4. Outlier diagnostics for the IMD data, using the adjusted Hampel function
Figure 5. A, B, Receiver operating characteristic (ROC) curve for the identification of cellwise outliers of the two algorithms: detect deviating cells (DDC) (red) and cellwise outlier diagnostics using robust pairwise log ratios (cell‐rPLR) (blue)
Figure 6. A‐D, Average ranks of the methods for the identification of the four known biomarkers in the PKU data, in a simulation setting with increasing amount of contamination
Figure 7. A‐E, Average ranks of the methods for the identification of the five known biomarkers in the MTBL59 data, in a simulation setting with increasing amount of contamination
Figure 8. A, B, Performance of the methods for their ability in biomarker identification for the MTBL59 data set, for different levels of cellwise contamination

