Brief Bioinform. 2024 Sep 23;25(6):bbae483.
doi: 10.1093/bib/bbae483.

scEGG: an exogenous gene-guided clustering method for single-cell transcriptomic data


Dayu Hu et al. Brief Bioinform. 2024.

Abstract

Single-cell data analysis has advanced considerably in recent years, particularly in the development of clustering methods. Even so, most algorithms still analyze only the provided single-cell matrix data. In medical contexts, however, single-cell data are often accompanied by a wealth of exogenous information, such as gene networks. Overlooking this information can cause information loss and yield clustering results with limited clinical relevance. To address this limitation, we introduce a deep clustering method for single-cell data that leverages exogenous gene information to generate discriminative cell representations. Specifically, an attention-enhanced graph autoencoder captures topological signal patterns among cells. Concurrently, a random walk on an exogenous protein-protein interaction network produces gene embeddings. The gene and cell embeddings are then integrated into gene-cell cooperative embeddings and reconstructed, yielding a discriminative representation for clustering. Extensive experiments demonstrate the effectiveness of the proposed method. This research provides enhanced insight into the characteristics of cells, laying a foundation for the early diagnosis and treatment of diseases. The datasets and code are publicly available at https://github.com/DayuHuu/scEGG.

Keywords: Node2vec; clustering; deep learning; exogenous gene information; protein-protein interaction.
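The random walk on the protein-protein interaction network mentioned in the abstract, together with the Node2vec keyword, can be illustrated with a minimal sketch of a node2vec-style biased walk. This is not the authors' implementation: the toy graph, the gene labels `G1` to `G5`, the function name `node2vec_walk`, and the values chosen for the return parameter `p` and in-out parameter `q` are all assumptions made for illustration.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Generate one second-order biased random walk (node2vec style).

    graph: dict mapping each node to the set of its neighbors (undirected).
    p: return parameter; larger p makes stepping back to the previous node less likely.
    q: in-out parameter; q < 1 favors moving outward (DFS-like),
       q > 1 favors staying near the previous node (BFS-like).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])  # sorted for reproducibility
        if not neighbors:
            break  # dead end: isolated node
        if len(walk) == 1:
            # First step has no previous node, so it is unbiased.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stays at distance 1 from prev
            else:
                weights.append(1.0 / q)  # moves to distance 2 from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy interaction network; labels are placeholders, not real genes.
toy_ppi = {
    "G1": {"G2", "G3"},
    "G2": {"G1", "G3", "G4"},
    "G3": {"G1", "G2"},
    "G4": {"G2", "G5"},
    "G5": {"G4"},
}

# One walk of length 6 per gene, with q < 1 to encourage exploration.
walks = [
    node2vec_walk(toy_ppi, gene, 6, p=1.0, q=0.5, rng=random.Random(0))
    for gene in sorted(toy_ppi)
]
```

In node2vec, walks such as these are typically treated as sentences and passed to a skip-gram model to learn one embedding vector per node; that training step is omitted here.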


Figures

Figure 1
Cells and genes both exhibit associative relationships. The left illustrates the connections between cells, while the right depicts the associations among genes. The cell image originates from the SciDraw database.
Figure 2
In breadth-first search (BFS) and depth-first search (DFS) traversals, the node pointed to by the top line is considered a low-order neighbor of the source node, while the node pointed to by the bottom line is considered a higher-order neighbor.
Figure 3
The scEGG model framework is divided into two stages. In stage 1, a random walk algorithm is applied to an exogenous gene network to generate distinct embeddings for each gene. In stage 2, the derived gene embeddings and cell embeddings are mapped to the same feature space, where they are integrated through matrix multiplication to construct a gene-cell cooperative embedding.
Figure 4
The gene selection process.
Figure 5
The two-dimensional t-SNE visualizations of cell embeddings on the Bjorklund dataset, learned under various comparative models.
Figure 6
Objective function values recorded over 500 training epochs on six benchmark datasets.
Figure 7
Impact of exogenous information on clustering performance across nine models on the Sun dataset.
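Stage 2 in the Figure 3 caption integrates gene and cell embeddings through matrix multiplication once both are in the same feature space. A minimal sketch of that idea follows, assuming the two embedding matrices have already been projected to a shared dimension and that the product takes the form cells x genes. The exact formulation used by scEGG is not given in this record, so the shapes, the values, and the helper names `matmul` and `transpose` are hypothetical.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical embeddings already mapped to a shared 2-dimensional space.
cell_emb = [  # 3 cells x 2 dims
    [1.0, 0.0],
    [0.0, 2.0],
    [1.0, 1.0],
]
gene_emb = [  # 4 genes x 2 dims
    [0.5, 0.5],
    [1.0, 0.0],
    [0.0, 1.0],
    [2.0, 1.0],
]

# Assumed cooperative embedding: one row per cell, one column per gene,
# where each entry is the inner product of a cell vector and a gene vector.
coop = matmul(cell_emb, transpose(gene_emb))  # 3 x 4
```

Under this assumption, each entry of `coop` scores how strongly a cell's representation aligns with a gene's network-derived representation, which is one way such a matrix could serve as a reconstruction target.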

