PLoS Comput Biol. 2019 Feb 19;15(2):e1006772.
doi: 10.1371/journal.pcbi.1006772. eCollection 2019 Feb.

A component overlapping attribute clustering (COAC) algorithm for single-cell RNA sequencing data analysis and potential pathobiological implications

He Peng et al. PLoS Comput Biol.

Abstract

Recent advances in next-generation sequencing and computational technologies have enabled routine analysis of large-scale single-cell ribonucleic acid sequencing (scRNA-seq) data. However, scRNA-seq technologies suffer from several technical challenges, including low mean expression levels for most genes and a higher frequency of missing data than bulk population sequencing technologies. Identifying, from scRNA-seq profiles, the functional gene sets and regulatory networks that link specific cell types to human diseases and therapeutics is a daunting task. In this study, we developed a Component Overlapping Attribute Clustering (COAC) algorithm to perform localized (cell subpopulation) gene co-expression network analysis of large-scale scRNA-seq profiles. Gene subnetworks that represent specific gene co-expression patterns are inferred from the components of a decomposed matrix of scRNA-seq profiles. We showed that single-cell gene subnetworks identified by COAC from multiple time points within cell phases can be used to identify cell types with high accuracy (83%). In addition, COAC-inferred subnetworks from melanoma patients' scRNA-seq profiles are highly correlated with survival rates from The Cancer Genome Atlas (TCGA). Moreover, the localized gene subnetworks identified by COAC from individual patients' scRNA-seq data can be used as pharmacogenomics biomarkers to predict drug responses in cancer cell lines from the Genomics of Drug Sensitivity in Cancer (GDSC) database, with areas under the receiver operating characteristic curve ranging from 0.728 to 0.783. In summary, COAC offers a powerful tool for identifying potential network-based diagnostic and pharmacogenomics biomarkers from large-scale scRNA-seq profiles. COAC is freely available at https://github.com/ChengF-Lab/COAC.
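The abstract says subnetworks are inferred from the components of a decomposed scRNA-seq matrix, and Fig 1B adds that genes are selected from each component and the largest connected component is kept as the subnetwork. The pure-Python sketch below illustrates that pipeline under loudly stated assumptions: the decomposition is stood in for by the leading eigenvector of the gene-gene Pearson correlation matrix (a PCA-style component found by power iteration), genes are selected by absolute loading, and an edge joins two selected genes when |r| >= a threshold. The function names and the parameters `top_k` and `r_min` are hypothetical, and the per-cell score (mean z-scored expression) is only an assumed stand-in for the subnetwork "value" mentioned in Fig 1A. This is not the authors' COAC code, which lives in the repository linked above.

```python
# Hypothetical sketch, NOT the published COAC implementation.
# Assumptions: the "decomposition" is the leading eigenvector of the gene-gene
# correlation matrix; genes are selected by |loading|; edges need |r| >= r_min.
import math
from collections import deque


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def leading_component(corr, iters=200):
    """Power iteration for the leading eigenvector of a symmetric matrix."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in w)) or 1.0
        v = [a / norm for a in w]
    return v


def largest_connected_component(nodes, edges):
    """Breadth-first search; returns the biggest component as a sorted list."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        if len(comp) > len(best):
            best = comp
    return sorted(best)


def extract_subnetwork(expr, genes, top_k=4, r_min=0.8):
    """expr maps gene -> expression values across cells (hypothetical API)."""
    corr = [[pearson(expr[g], expr[h]) for h in genes] for g in genes]
    loadings = leading_component(corr)
    # Keep the top_k genes with the largest absolute loading on the component.
    ranked = sorted(range(len(genes)), key=lambda i: -abs(loadings[i]))
    selected = [genes[i] for i in ranked[:top_k]]
    # Connect selected genes that are strongly co-expressed.
    edges = [(g, h) for i, g in enumerate(selected) for h in selected[i + 1:]
             if abs(pearson(expr[g], expr[h])) >= r_min]
    return largest_connected_component(selected, edges)


def subnetwork_scores(expr, subnet):
    """Assumed per-cell subnetwork value: mean z-scored expression of its genes."""
    n_cells = len(expr[subnet[0]])
    z = {}
    for g in subnet:
        m = sum(expr[g]) / n_cells
        sd = math.sqrt(sum((x - m) ** 2 for x in expr[g]) / n_cells) or 1.0
        z[g] = [(x - m) / sd for x in expr[g]]
    return [sum(z[g][c] for g in subnet) / len(subnet) for c in range(n_cells)]


if __name__ == "__main__":
    # Toy data: 5 genes x 5 cells; A, B, C are co-expressed, D is not, E is flat.
    expr = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10], "C": [1, 2, 3, 4, 6],
            "D": [5, 1, 4, 2, 3], "E": [3, 3, 3, 3, 3]}
    subnet = extract_subnetwork(expr, list(expr), top_k=3)
    print(subnet)  # ['A', 'B', 'C']
    print(subnetwork_scores(expr, subnet))
```

On the toy data the co-expressed genes A, B and C dominate the leading component and form one connected subnetwork, while D (weakly anti-correlated) and E (constant) drop out; a real pipeline would use many components and a proper matrix factorization rather than a single power-iteration eigenvector.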

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Diagram illustrating the Component Overlapping Attribute Clustering (COAC) algorithm for inferring gene-gene relationships from scRNA-seq data.
(A) The whole gene co-expression network is decomposed into gene clusters (subnetworks). Each subnetwork is used to evaluate the degree to which its genes are co-expressed in the matrix derived from scRNA-seq data: if several genes are expressed abnormally, the value of the subnetwork containing those genes changes significantly. (B) The scRNA-seq data were decomposed into individual gene expression profiles with specific components. After gene selection from each gene expression profile, the largest connected component was taken as the subnetwork (see Methods).
Fig 2. Batch effect elimination by COAC, evaluated with the t-SNE algorithm [9].
(A) Batch effects are substantially eliminated when cells are represented by the COAC-inferred subnetworks (cells are distributed separately in different groups). (B) A substantial batch effect (cells are distributed uniformly between case and control groups) is observed in the original scRNA-seq data from a previous study [37] without applying COAC.
Fig 3. Accurate cell type identification by COAC.
(A) The IFN-β-stimulated and control groups are separated based on the subnetworks identified by COAC. (B) Without COAC, cells from the IFN-β-stimulated and control groups are distributed uniformly across the whole space. (C) Five different cell types are identified with high accuracy based on gene subnetworks identified by COAC: 83.05% of cells were assigned the correct cell type relative to well-defined cell types from experimental data. Cell types are visualized with the t-SNE algorithm [9]. Endo: endothelial cells; PT: proximal tubule cells; DCT: distal convoluted tubule cells; CD: collecting duct principal cells; lymph: lymphocytes.
Fig 4. Survival analysis for COAC-inferred gene co-expression subnetworks in melanoma.
(A and B) Survival analysis for COAC-inferred gene co-expression subnetworks from scRNA-seq data [11], obtained by comparing malignant cells versus control cells from individual melanoma patients (see Methods). (C to F) Survival analysis for COAC-predicted gene subnetworks from scRNA-seq data, obtained by comparing T cells versus control cells extracted from individual melanoma patients [11]. The most significant subnetwork in each survival analysis is highlighted in each panel. Bulk RNA-seq data and clinical profiles for the melanoma patients were collected from the TCGA website [13]. Survival analysis comparing the two groups was conducted using the R survival package [36] (see Methods).
Fig 5. Cancer pharmacogenomics validation for COAC-predicted gene subnetworks.
(A to F) Receiver operating characteristic (ROC) curves for six selected drugs: SNX-2112 (a selective Hsp90 inhibitor), BX-912 (a PDK1 inhibitor), bleomycin (induces DNA strand breaks), PHA-793887 (a pan-CDK inhibitor), PI-103 (a PI3K and mTOR inhibitor), and WZ3105 (also named GSK-2126458 or omipalisib, a PI3K inhibitor). Drug IC50 values were predicted with SVM regression models that use the COAC-inferred gene subnetworks as feature vectors (see Methods). The areas under the ROC curves (AUC) from 10-fold cross-validation are shown. In each ROC plot, the color key indicates the cutoff value at each position along the curve. (G and H) COAC-inferred gene co-expression subnetworks associated with two selected drugs, SNX-2112 (G) and BX-912 (H). The color of each node indicates the weight of that gene in the subnetwork.
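Fig 5 reports ROC curves and AUC values for predicted drug responses. As a hedged illustration of how such an AUC can be computed (not the authors' pipeline, which trains SVM regression models on COAC subnetwork features), the sketch below assumes cell lines are labelled sensitive when their measured IC50 falls below a hypothetical cutoff, and ranks them by negative predicted IC50 so that a lower predicted IC50 means "more likely sensitive". It computes the AUC two ways: as the Mann-Whitney probability that a sensitive line outranks a resistant one, and as the trapezoidal area under the ROC curve obtained by sweeping the cutoff. The IC50 values are toy numbers, not GDSC data.

```python
# Hypothetical sketch: ROC/AUC for predicted drug response. The sensitivity
# cutoff and the toy IC50 values are assumptions, not GDSC data or the paper's setup.


def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_curve(labels, scores):
    """(FPR, TPR, cutoff) points, predicting 'positive' when score >= cutoff."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0, float("inf"))]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y and s >= cut)
        fp = sum(1 for y, s in zip(labels, scores) if not y and s >= cut)
        points.append((fp / n_neg, tp / n_pos, cut))
    return points


def trapezoid_auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]))


if __name__ == "__main__":
    measured_ic50 = [0.5, 1.2, 3.0, 8.0, 10.0, 0.8]   # toy values
    predicted_ic50 = [0.7, 2.5, 1.0, 6.0, 12.0, 9.0]  # toy model output
    cutoff = 2.0                                       # hypothetical sensitivity cutoff
    labels = [ic < cutoff for ic in measured_ic50]     # True = sensitive
    scores = [-p for p in predicted_ic50]              # lower predicted IC50 ranks higher
    print(round(roc_auc(labels, scores), 4))                   # 0.6667
    print(round(trapezoid_auc(roc_curve(labels, scores)), 4))  # 0.6667
```

The two estimates agree because the trapezoidal rule over a cutoff sweep is equivalent to the rank-based Mann-Whitney statistic, with tied scores contributing half a win; the per-point cutoffs returned by `roc_curve` correspond to the color keys drawn along the curves in Fig 5.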

References

    1. Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods. 2009;6(5):377. doi:10.1038/nmeth.1315
    2. Wang Y, Navin NE. Advances and applications of single-cell sequencing technologies. Mol Cell. 2015;58(4):598–609. doi:10.1016/j.molcel.2015.05.005
    3. Regev A, Teichmann SA, Lander ES, Amit I, Benoist C, Birney E, et al. The Human Cell Atlas. Elife. 2017;6:e27041. doi:10.7554/eLife.27041
    4. Ståhlberg A, Rusnakova V, Kubista M. The added value of single-cell gene expression profiling. Brief Funct Genomics. 2013;12(2):81–9. doi:10.1093/bfgp/elt001
    5. Cheng F, Liang H, Butte AJ, Eng C, Nussinov R. Personal mutanomes meet modern oncology drug discovery and precision health. Pharmacol Rev. 2019;71(1):1–19. doi:10.1124/pr.118.016253
