Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2021 Feb 23:12:621317.
doi: 10.3389/fgene.2021.621317. eCollection 2021.

Joint Lp-Norm and L2,1-Norm Constrained Graph Laplacian PCA for Robust Tumor Sample Clustering and Gene Network Module Discovery

Affiliations

Joint Lp-Norm and L2,1-Norm Constrained Graph Laplacian PCA for Robust Tumor Sample Clustering and Gene Network Module Discovery

Xiang-Zhen Kong et al. Front Genet. .

Abstract

The dimensionality reduction method accompanied by different norm constraints plays an important role in mining useful information from large-scale gene expression data. In this article, a novel method named Lp-norm and L2,1-norm constrained graph Laplacian principal component analysis (PL21GPCA) based on traditional principal component analysis (PCA) is proposed for robust tumor sample clustering and gene network module discovery. Three aspects are highlighted in the PL21GPCA method. First, to degrade the high sensitivity to outliers and noise, the non-convex proximal Lp-norm (0 < p < 1)constraint is applied on the loss function. Second, to enhance the sparsity of gene expression in cancer samples, the L2,1-norm constraint is used on one of the regularization terms. Third, to retain the geometric structure of the data, we introduce the graph Laplacian regularization item to the PL21GPCA optimization model. Extensive experiments on five gene expression datasets, including one benchmark dataset, two single-cancer datasets from The Cancer Genome Atlas (TCGA), and two integrated datasets of multiple cancers from TCGA, are performed to validate the effectiveness of our method. The experimental results demonstrate that the PL21GPCA method performs better than many other methods in terms of tumor sample clustering. Additionally, this method is used to discover the gene network modules for the purpose of finding key genes that may be associated with some cancers.

Keywords: L2, 1-norm; Lp-norm; gene network modules; graph regularization; principal component analysis; sparse constraint; tumor clustering.

PubMed Disclaimer

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
FIGURE 1
The general schematic framework of the PL21GPCA methodology.
FIGURE 2
FIGURE 2
The average performance taking the essential parameter at nine different values from 0.1 to 0.9. (A) The mean value of ACC for different cancer datasets. (B) The mean value of NMI for different cancer datasets.
FIGURE 3
FIGURE 3
A visualized comparison of low-dimensional embeddings by SPCA, gLPCA, pgLPCA, and PL21GPCA on COAD and H-C-P datasets. (A–D) Are the results of the compared methods SPCA, glPCA, gpLPCA, and PL21GPCA respectively on the CRC dataset. (E–H) Are the compared results of the four methods on the H-C-P dataset.
FIGURE 4
FIGURE 4
The first three modules of the constructed network based on the CRC data. The five marked genes SPARC, ABCC12, COL6A3, LUM, and RPS3 have been confirmed to be associated with CRC and other cancers. (A) Module 1; (B) Module 2; (C) Module 3.
FIGURE 5
FIGURE 5
The first three modules of the constructed network based on the H_C_P data. The five marked genes RPL32, EEF1G, SPRR1B, COL1A2, and MMP2 have been confirmed to be associated with multiple cancers. (A) Module 1; (B) Module 2; (C) Module 3.

Similar articles

Cited by

References

    1. Arajo R. F., Lira G. A., Vilaa J. A., Guedes H. G., Leito M. C. A., Lucena H. F., et al. (2015). Prognostic and diagnostic implications of MMP-2, MMP-9, and VEGF-a expressions in colorectal cancer. Pathol. Res. Practice 211 71–77. 10.1016/j.prp.2014.09.007 - DOI - PubMed
    1. Belkin M., Niyogi P. (2002). Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv. Neural Inf. Process. Syst. 14 585–591.
    1. Belkin M., Niyogi P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15 1373–1396. 10.1162/089976603321780317 - DOI
    1. Bera T. K., Iavarone C., Kumar V., Lee S., Lee B., Pastan I. (2002). MRP9, an unusual truncated member of the ABC transporter superfamily, is highly expressed in breast cancer. Proc. Natl. Acad. Sci. U.S.A. 99 6997–7002. 10.1073/pnas.102187299 - DOI - PMC - PubMed
    1. Bertsekas D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Cambridge, MA: Academic Press.

LinkOut - more resources