PLoS Comput Biol. 2019 Feb 19;15(2):e1006772.

doi: 10.1371/journal.pcbi.1006772. eCollection 2019 Feb.

A component overlapping attribute clustering (COAC) algorithm for single-cell RNA sequencing data analysis and potential pathobiological implications

He Peng¹, Xiangxiang Zeng¹, Yadi Zhou², Defu Zhang¹, Ruth Nussinov³ ⁴, Feixiong Cheng⁵ ⁶ ⁷

Affiliations

¹ Department of Computer Science, Xiamen University, Xiamen, Fujian, China.
² Department of Chemistry and Biochemistry, Ohio University, Athens, OH, United States of America.
³ Cancer and Inflammation Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, MD, United States of America.
⁴ Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel.
⁵ Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, United States of America.
⁶ Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, United States of America.
⁷ Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, United States of America.

PMID: 30779739
PMCID: PMC6396937
DOI: 10.1371/journal.pcbi.1006772

He Peng et al. PLoS Comput Biol. 2019.

Abstract

Recent advances in next-generation sequencing and computational technologies have enabled routine analysis of large-scale single-cell ribonucleic acid sequencing (scRNA-seq) data. However, scRNA-seq technologies suffer from several technical challenges, including low mean expression levels in most genes and higher frequencies of missing data than bulk population sequencing technologies. Identifying functional gene sets and their regulatory networks that link specific cell types to human diseases and therapeutics from scRNA-seq profiles is a daunting task. In this study, we developed a Component Overlapping Attribute Clustering (COAC) algorithm to perform localized (cell subpopulation) gene co-expression network analysis on large-scale scRNA-seq profiles. Gene subnetworks that represent specific gene co-expression patterns are inferred from the components of a decomposed matrix of scRNA-seq profiles. We showed that single-cell gene subnetworks identified by COAC from multiple time points within cell phases can be used for cell type identification with high accuracy (83%). In addition, COAC-inferred subnetworks from melanoma patients' scRNA-seq profiles are highly correlated with survival rate from The Cancer Genome Atlas (TCGA). Moreover, the localized gene subnetworks identified by COAC from individual patients' scRNA-seq data can be used as pharmacogenomics biomarkers to predict drug responses (the area under the receiver operating characteristic curve ranges from 0.728 to 0.783) in cancer cell lines from the Genomics of Drug Sensitivity in Cancer (GDSC) database. In summary, COAC offers a powerful tool to identify potential network-based diagnostic and pharmacogenomics biomarkers from large-scale scRNA-seq profiles. COAC is freely available at https://github.com/ChengF-Lab/COAC.

Conflict of interest statement

The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products or organizations imply endorsement by the US Government. The authors have declared that no competing interests exist.

Figures

**Fig 1. Diagram illustrating a Components Overlapping Attribute Clustering (COAC) algorithm for inferring gene-gene relationships from scRNA-seq data.**
(A) The whole gene co-expression network is decomposed into gene clusters (subnetworks). Each subnetwork is used to evaluate the degree of co-expression among its genes in the co-expression matrix derived from scRNA-seq data. If several genes are expressed abnormally, the value of the subnetwork that contains those genes changes significantly. (B) The scRNA-seq data were decomposed into individual gene expression profiles with specific components. After gene selection from each gene expression profile, the largest connected component was obtained as the subnetwork (see Methods).
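
The step described in panel (B), selecting genes from one component and keeping the largest connected component as the subnetwork, can be illustrated with a minimal sketch. This is not the authors' implementation (see the COAC repository for that). The function names, the `weight_cutoff` and `edge_cutoff` thresholds, and the toy gene names and scores are all assumptions made for illustration.

```python
from collections import deque


def largest_connected_component(genes, edges):
    """Return the largest connected set of genes in an undirected graph."""
    adjacency = {gene: set() for gene in genes}
    for a, b in edges:
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, best = set(), set()
    for start in genes:
        if start in seen:
            continue
        # Breadth-first search collects every gene reachable from `start`.
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        if len(component) > len(best):
            best = component
    return best


def subnetwork_from_component(loadings, coexpression, weight_cutoff, edge_cutoff):
    """Hypothetical helper: pick genes by component weight, then keep the
    largest connected group among them.

    loadings: gene -> weight of that gene in one decomposed component
    coexpression: (gene_a, gene_b) -> co-expression score
    """
    selected = [g for g, w in loadings.items() if abs(w) >= weight_cutoff]
    chosen = set(selected)
    edges = [
        (a, b)
        for (a, b), score in coexpression.items()
        if a in chosen and b in chosen and abs(score) >= edge_cutoff
    ]
    return largest_connected_component(selected, edges)


# Toy data (made up): G5 has a low weight, and the G3-G4 link is weak.
loadings = {"G1": 0.9, "G2": 0.8, "G3": 0.7, "G4": 0.6, "G5": 0.05}
coexpression = {
    ("G1", "G2"): 0.9,
    ("G2", "G3"): 0.85,
    ("G3", "G4"): 0.1,
    ("G1", "G5"): 0.95,
}
subnetwork = subnetwork_from_component(loadings, coexpression, 0.5, 0.5)
print(sorted(subnetwork))
```

In this toy case G5 is dropped at the gene selection step and G4 is left isolated by the edge cutoff, so the remaining connected group G1, G2, G3 is returned.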

**Fig 2. Batch effect elimination by COAC evaluated by a t-SNE algorithm [9].**
(A) Significant batch effect elimination (cells distribute separately in different groups) based on the COAC-inferred subnetworks. (B) A significant batch effect (cells distribute uniformly between case and control groups) was observed in the original scRNA-seq data from a previous study [37], without applying COAC.

**Fig 3. Accurate cell type identification by COAC.**
(A) The IFN-β stimulated and control groups are separated based on the subnetworks identified by COAC. (B) Cells from IFN-β stimulated and control groups are uniformly distributed in the whole space without applying COAC. (C) Five different cell types are identified with high accuracy based on gene subnetworks identified by COAC. Cell types were identified correctly for 83.05% of cells, based on well-defined cell types from experimental data. Cell types are visualized by a t-SNE algorithm [9]. Endo: endothelial cells, PT: proximal tubule cells, DCT: distal convoluted tubule cells, CD: collecting duct principal cells, lymph: lymphocyte cells.

**Fig 4. Survival analysis for COAC-inferred gene co-expression subnetworks in melanoma.**
(A and B) Survival analysis for COAC-inferred gene co-expression subnetworks from scRNA-seq data [11] by comparing malignant cells versus control cells from individual melanoma patients (see Methods). (C to F) Survival analysis for COAC-predicted gene subnetworks from scRNA-seq data by comparing T cells versus control cells extracted from individual melanoma patients [11]. The most significant selected subnetwork for each survival analysis is highlighted in each subfigure. The bulk RNA-seq data and clinical profiles for each melanoma patient were collected from the TCGA website [13]. Survival analysis was conducted for these two groups using the R survival package [36] (see Methods).

**Fig 5. Cancer pharmacogenomics validation for COAC-predicted gene subnetworks.**
(A to F) The receiver operating characteristic (ROC) curves for six selected drugs: SNX-2112 (a selective Hsp90 inhibitor), BX-912 (a PDK1 inhibitor), Bleomycin (induction of DNA strand breaks), PHA-793887 (a pan-CDK inhibitor), PI-103 (a PI3K and mTOR inhibitor), and WZ3105 (also named GSK-2126458 or Omipalisib, a PI3K inhibitor). Drug IC₅₀ values were predicted based on SVM regression models built by utilizing the COAC-inferred gene subnetworks as feature vectors (see Methods). The areas under the ROC curves (AUC) from 10-fold cross-validation are shown. In each ROC plot, the cutoff values at the corresponding curve positions are represented by the color keys. (G and H) Two COAC-inferred gene co-expression subnetworks for two selected drug targets on SNX-2112 (G) and BX-912 (H). The color key of each node indicates the weight of the genes in each subnetwork.
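
As a reminder of what the AUC values in panels A to F measure, the following minimal sketch computes the area under the ROC curve as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. It is a generic illustration, not the authors' evaluation code. The `roc_auc` function, the binary sensitive/resistant labels, and the toy scores are assumptions; in particular, how predicted IC₅₀ values were turned into class labels is not stated in this caption.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (assumed here: drug-sensitive), 0 otherwise
    scores: higher means "more likely positive" (assumed here: a sensitivity
            score such as a negated predicted IC50)
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute an AUC")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (made-up values): one positive is ranked below one negative.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported range of 0.728 to 0.783.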

Cited by

Prioritizing Autism Risk Genes using Personalized Graphical Models Estimated from Single Cell RNA-seq Data.
Liu J, Wang H, Sun W, Liu Y. J Am Stat Assoc. 2022;117(537):38-51. doi: 10.1080/01621459.2021.1933495. Epub 2021 Jul 21. PMID: 35529781 Free PMC article.
Single-Cell Techniques and Deep Learning in Predicting Drug Response.
Wu Z, Lawrence PJ, Ma A, Zhu J, Xu D, Ma Q. Trends Pharmacol Sci. 2020 Dec;41(12):1050-1065. doi: 10.1016/j.tips.2020.10.004. Epub 2020 Nov 2. PMID: 33153777 Free PMC article. Review.
Single-cell RNA-seq clustering: datasets, models, and algorithms.
Peng L, Tian X, Tian G, Xu J, Huang X, Weng Y, Yang J, Zhou L. RNA Biol. 2020 Jun;17(6):765-783. doi: 10.1080/15476286.2020.1728961. Epub 2020 Mar 1. PMID: 32116127 Free PMC article.
Pharmacogenomic Analysis of Combined Therapies against Glioblastoma Based on Cell Markers from Single-Cell Sequencing.
Liu J, Wu R, Yuan S, Kelleher R, Chen S, Chen R, Zhang T, Obaidi I, Sheridan H. Pharmaceuticals (Basel). 2023 Oct 30;16(11):1533. doi: 10.3390/ph16111533. PMID: 38004399 Free PMC article.
How can same-gene mutations promote both cancer and developmental disorders?
Nussinov R, Tsai CJ, Jang H. Sci Adv. 2022 Jan 14;8(2):eabm2059. doi: 10.1126/sciadv.abm2059. Epub 2022 Jan 14. PMID: 35030014 Free PMC article.

References

1. Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods. 2009; 6(5):377. doi: 10.1038/nmeth.1315
2. Wang Y, Navin NE. Advances and applications of single-cell sequencing technologies. Mol Cell. 2015; 58(4):598–609. doi: 10.1016/j.molcel.2015.05.005
3. Regev A, Teichmann SA, Lander ES, Amit I, Benoist C, Birney E, et al. The Human Cell Atlas. Elife. 2017; 6:e27041. doi: 10.7554/eLife.27041
4. Ståhlberg A, Rusnakova V, Kubista M. The added value of single-cell gene expression profiling. Brief Funct Genomics. 2013; 12(2):81–9. doi: 10.1093/bfgp/elt001
5. Cheng F, Liang H, Butte AJ, Eng C, Nussinov R. Personal mutanomes meet modern oncology drug discovery and precision health. Pharmacol Rev. 2019; 71(1):1–19. doi: 10.1124/pr.118.016253
