PLoS Comput Biol. 2021 Jun 2;17(6):e1009064.
doi: 10.1371/journal.pcbi.1009064. eCollection 2021 Jun.

coupleCoC+: An information-theoretic co-clustering-based transfer learning framework for the integrative analysis of single-cell genomic data

Pengcheng Zeng et al. PLoS Comput Biol. 2021.

Abstract

Technological advances have enabled us to profile multiple molecular layers at unprecedented single-cell resolution, and the number of available datasets from multiple samples or domains is growing. These datasets, including scRNA-seq, scATAC-seq and sc-methylation data, usually differ in their power to identify unknown cell types through clustering, so methods that integrate multiple datasets can potentially achieve better clustering performance. Here we propose coupleCoC+ for the integrative analysis of single-cell genomic data. coupleCoC+ is a transfer learning method based on the information-theoretic co-clustering framework: it uses the information in one dataset, the source data, to facilitate the analysis of another dataset, the target data. coupleCoC+ uses the linked features in the two datasets for effective knowledge transfer, and it also uses the information in the target-data features that are not linked with the source data. In addition, coupleCoC+ matches similar cell types across the source data and the target data. By applying coupleCoC+ to the integrative clustering of mouse cortex scATAC-seq and scRNA-seq data, mouse and human scRNA-seq data, mouse cortex sc-methylation and scRNA-seq data, and human blood dendritic cell scRNA-seq data from two batches, we demonstrate that coupleCoC+ improves the overall clustering performance and matches cell subpopulations across multimodal single-cell genomic datasets. coupleCoC+ converges quickly and is computationally efficient. The software is available at https://github.com/cuhklinlab/coupleCoC_plus.
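
coupleCoC+ builds on information-theoretic co-clustering. In the generic single-dataset form of that framework (Dhillon, Mallela and Modha, 2003), a nonnegative cells × features matrix is normalized into a joint distribution p(X, Y), and the cell and feature cluster assignments are chosen to minimize the loss in mutual information I(X;Y) − I(X̂;Ŷ), where X̂ and Ŷ are the cell and feature clusters. The minimal Python sketch below only evaluates this generic objective for given assignments; it is not the full coupleCoC+ objective, which additionally couples the source and target data and matches cell clusters. The function name `mi_loss` is hypothetical.

```python
import numpy as np

def mi_loss(P, row_labels, col_labels, k_rows, k_cols):
    """Generic co-clustering loss I(X;Y) - I(X_hat;Y_hat), in nats.

    P: nonnegative cells x features matrix (normalized here to sum to 1).
    row_labels / col_labels: integer cluster labels for cells / features.
    """
    P = np.asarray(P, dtype=float)
    P = P / P.sum()

    def mutual_info(J):
        # I = sum_ij J_ij * log(J_ij / (J_i. * J_.j)), skipping zero entries
        r = J.sum(axis=1, keepdims=True)
        c = J.sum(axis=0, keepdims=True)
        mask = J > 0
        return float(np.sum(J[mask] * np.log(J[mask] / (r @ c)[mask])))

    # Indicator matrices that sum rows (cells) and columns (features) into clusters.
    R = np.zeros((k_rows, P.shape[0]))
    R[row_labels, np.arange(P.shape[0])] = 1.0
    C = np.zeros((P.shape[1], k_cols))
    C[np.arange(P.shape[1]), col_labels] = 1.0
    P_hat = R @ P @ C  # joint distribution of (cell cluster, feature cluster)
    return mutual_info(P) - mutual_info(P_hat)
```

For a block-diagonal matrix, grouping cells and features along the blocks loses no mutual information, while a grouping that mixes the blocks gives a positive loss.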

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Toy example of coupleCoC+.
(a) The source data are denoted “S”. Based on whether their features are linked with those in the source data, the target data are partitioned into two parts, “T” and “U”: the features in T are linked with S, whereas the features in U are not directly linked with S. T and U contain the same cells. Red indicates that the corresponding features are active, and yellow indicates that they are inactive. (b) Clustering results of coupleCoC+. coupleCoC+ co-clusters S, T and U simultaneously, grouping similar cells and similar features. A subset of the cell clusters is also matched between the source data and the target data, representing shared cell types. “clu” abbreviates “cluster”, “m” denotes matched clusters, and “clu t” denotes a cell cluster unique to the target data.
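
The partition of the target data in panel (a) amounts to splitting the target matrix's feature columns by whether each feature has a link to the source data. A minimal Python sketch, assuming a cells × features NumPy matrix, a list of target feature names, and a hypothetical collection `linked_features` naming the target features that have a counterpart in S (the function name `split_target` is also hypothetical):

```python
import numpy as np

def split_target(target, target_features, linked_features):
    """Split a cells x features target matrix into T (linked) and U (unlinked).

    `linked_features` is a hypothetical collection of target feature names
    that have a counterpart in the source data S.
    """
    linked = set(linked_features)
    is_linked = np.array([f in linked for f in target_features], dtype=bool)
    # Both parts keep every cell (row); only the feature columns differ.
    T = target[:, is_linked]
    U = target[:, ~is_linked]
    return T, U
```

Because only columns are selected, T and U share the same rows, matching the caption's statement that the cells in T and U are the same.
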
Fig 2. Heatmaps of the clustering results by coupleCoC+ for example 1.
“clu m” denotes a cell cluster matched between the source data and the target data; “clu s” and “clu t” denote cell clusters unique to the source data and the target data, respectively. For better visualization, each heatmap shows pseudocells, each obtained by averaging 15 randomly selected cells from the same cell cluster.
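
The pseudocell averaging used for the heatmaps can be sketched in Python as follows. The caption does not say how leftover cells are handled when a cluster's size is not a multiple of 15; this sketch assumes they form one smaller final group. The function name `make_pseudocells` and the `seed` parameter are hypothetical.

```python
import numpy as np

def make_pseudocells(X, labels, group_size=15, seed=0):
    """Average randomly chosen groups of `group_size` cells within each cluster.

    X: cells x features NumPy array; labels: NumPy array of cluster labels.
    Assumption: leftover cells in a cluster form one smaller final group.
    """
    rng = np.random.default_rng(seed)
    rows, pseudo_labels = [], []
    for k in np.unique(labels):
        # Shuffle the indices of the cells in cluster k.
        idx = rng.permutation(np.flatnonzero(labels == k))
        # Consecutive chunks of the shuffled indices become pseudocells.
        for start in range(0, len(idx), group_size):
            chunk = idx[start:start + group_size]
            rows.append(X[chunk].mean(axis=0))
            pseudo_labels.append(k)
    return np.vstack(rows), np.array(pseudo_labels)
```

Averaging within clusters only, never across them, keeps each pseudocell's cluster label unambiguous in the heatmap.
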
Fig 3. Heatmaps of the clustering results by coupleCoC+ for example 2.
“clu m” denotes a cell cluster matched between the source data and the target data; “clu t” denotes a cell cluster unique to the target data. For better visualization, each heatmap shows pseudocells, each obtained by averaging 15 randomly selected cells from the same cell cluster.
Fig 4. Heatmaps of the clustering results by coupleCoC+ for example 3.
“clu m” denotes a cell cluster matched between the source data and the target data. The centered methylation level was obtained by first centering the data matrix by row and then by column. Grey in the sc-methylation heatmap indicates missing data. For better visualization, each heatmap shows pseudocells, each obtained by averaging 15 randomly selected cells from the same cell cluster.
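
The row-then-column centering described for the methylation heatmap can be written in a few lines of NumPy. This sketch assumes that centering means subtracting the row means and then the column means, and that missing methylation values are stored as NaN and ignored when computing the means; the function name `center_rows_then_columns` is hypothetical.

```python
import numpy as np

def center_rows_then_columns(M):
    """Subtract row means, then column means, ignoring NaN (missing) entries."""
    M = np.asarray(M, dtype=float)
    M = M - np.nanmean(M, axis=1, keepdims=True)  # center each row (cell)
    M = M - np.nanmean(M, axis=0, keepdims=True)  # then center each column (feature)
    return M  # NaN entries stay NaN, i.e. remain "missing"
```

With complete data the result has zero row and column means; with missing entries only the column means are guaranteed to be zero.
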
Fig 5. Heatmaps of the clustering results by coupleCoC+ for example 4.
“clu m” denotes a cell cluster matched between the source data and the target data; “clu s” and “clu t” denote cell clusters unique to the source data and the target data, respectively. For better visualization, each heatmap shows pseudocells, each obtained by averaging 15 randomly selected cells from the same cell cluster.
