Nat Commun. 2021 Apr 12;12(1):2177.
doi: 10.1038/s41467-021-22495-4.

RA3 is a reference-guided approach for epigenetic characterization of single cells

Shengquan Chen et al. Nat Commun. 2021.

Abstract

The recent advancements in single-cell technologies, including single-cell chromatin accessibility sequencing (scCAS), have enabled profiling the epigenetic landscapes for thousands of individual cells. However, the characteristics of scCAS data, including high dimensionality, high degree of sparsity and high technical variation, make the computational analysis challenging. Reference-guided approaches, which utilize the information in existing datasets, may facilitate the analysis of scCAS data. Here, we present RA3 (Reference-guided Approach for the Analysis of single-cell chromatin Accessibility data), which utilizes the information in massive existing bulk chromatin accessibility and annotated scCAS data. RA3 simultaneously models (1) the shared biological variation among scCAS data and the reference data, and (2) the unique biological variation in scCAS data that identifies distinct subpopulations. We show that RA3 achieves superior performance when used on several scCAS datasets, and on references constructed using various approaches. Altogether, these analyses demonstrate the wide applicability of RA3 in analyzing scCAS data.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. The reference-guided approach for the analysis of scCAS data.
a A graphical illustration of the RA3 model. RA3 decomposes the variation in scCAS data into three components: the component that captures the shared biological variation with reference data, the component that captures the unique biological variation in single-cell data, and the component that captures other variations. b t-SNE visualization of the cells from donor BM0828 using latent features obtained from TF-IDF + PCA. c t-SNE visualization of the cells from donor BM0828 using latent features obtained from bulk projection. d We calculated the residuals after the bulk projection. PCA was performed on the residuals, followed by t-SNE visualization. e The learned second component with the spike-and-slab prior in RA3. f t-SNE visualization using the first two components learned by RA3. TF-IDF term frequency-inverse document frequency transformation, PCA principal component analysis.
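The preprocessing named in panels b to d can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the RA3 implementation: the TF-IDF variant used here (term frequency times log(1 + N / n_i)) is one common choice for scCAS data and is not specified in this caption, the function names `tf_idf` and `project_and_residual` are hypothetical, and the reference directions are assumed to be orthonormal (for example, principal components already computed from bulk samples). Centering and the PCA step itself are omitted.

```python
import math


def tf_idf(counts):
    """TF-IDF transform of a peaks x cells count matrix (list of rows).

    Assumed variant: tf = count / total counts in the cell,
    idf = log(1 + n_cells / number of cells where the peak is open).
    """
    n_cells = len(counts[0])
    cell_totals = [sum(row[c] for row in counts) for c in range(n_cells)]
    out = []
    for row in counts:
        n_open = sum(1 for x in row if x > 0)
        idf = math.log(1 + n_cells / n_open) if n_open else 0.0
        out.append([
            (x / cell_totals[c] if cell_totals[c] else 0.0) * idf
            for c, x in enumerate(row)
        ])
    return out


def project_and_residual(cell, basis):
    """Project one cell vector onto orthonormal reference directions.

    Returns the projection scores (the "bulk projection" features) and
    the residual, i.e. the part of the cell not explained by the reference.
    """
    scores = [sum(b_k * x_k for b_k, x_k in zip(b, cell)) for b in basis]
    recon = [sum(s * b[k] for s, b in zip(scores, basis))
             for k in range(len(cell))]
    residual = [x - r for x, r in zip(cell, recon)]
    return scores, residual


# Toy example: two reference directions in a three-peak space.
scores, residual = project_and_residual([2, 3, 5], [[1, 0, 0], [0, 1, 0]])
print(scores, residual)  # [2, 3] [0, 0, 5]
```

In this toy case the third peak is invisible to the reference, so all of its signal ends up in the residual. That is the kind of variation panel d examines by running PCA on the residuals.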
Fig. 2. RA3 incorporates reference data constructed from different sources.
t-SNE visualizations of the cells in the GM/HEK dataset using latent features obtained from a TF-IDF + PCA, and from RA3 using reference data constructed from different samples, including b BAM files of bulk GM12878 and HEK293T DNase-seq samples, c BAM files of all the bulk samples in OPENANNO, and d BED files of all the bulk samples in OPENANNO. e We split the cells in the mouse forebrain dataset in half: one half was used to construct a pseudo-bulk reference, and the other half was treated as single-cell data. t-SNE visualization using the latent features learned by RA3 with the complete reference is shown. f We also constructed an incomplete pseudo-bulk reference by leaving out MG and OC cells. t-SNE visualization using the latent features obtained by bulk projection with the incomplete reference is shown. We implemented RA3 with the incomplete reference: g the learned second component with the spike-and-slab prior and h t-SNE visualization of the learned latent features are shown. i t-SNE visualizations of cells in the mouse prefrontal cortex dataset, using the latent features obtained from TF-IDF + PCA and j the latent features obtained from RA3 with a pseudo-bulk reference constructed from the mouse forebrain dataset. k t-SNE visualizations of cells in the 10X PBMC dataset using the latent features obtained from RA3 with a pseudo-bulk reference constructed from another PBMC dataset. Chromatin accessibility of S100A12 (a marker gene of monocytes) and MS4A1 (a marker gene of B cells) is projected onto the visualizations, respectively. TF-IDF term frequency-inverse document frequency transformation, PCA principal component analysis.
Fig. 3. Evaluation of the visualization of scCAS data.
a The dataset of CLP/LMPP/MPP cells. b The dataset of donor BM0828. c The human bone marrow dataset. d The mouse forebrain dataset (half). e The mouse forebrain dataset (half) with 25% dropout rate. f The dataset of mouse prefrontal cortex. For all the datasets, we obtained the latent features from SCALE, Scasat, cisTopic, Cusanovich2018, SnapATAC, and RA3, and then implemented t-SNE for visualization.
Fig. 4. Assessment of the clustering results.
We implemented Louvain clustering on the low-dimensional representation provided by each method to get the cluster assignments. The cluster assignments for scABC were obtained directly from the model output. a The clustering performance of the different methods evaluated by adjusted mutual information (AMI). The center of each error bar is the AMI for the corresponding method, and the error bar denotes the standard error estimated from ten bootstrap samples. b The clustering performance of the different methods on the mouse forebrain dataset (half) at different dropout rates evaluated by AMI. c The clustering performance of the different methods on the 10X PBMC dataset evaluated by Residual Average Gini Index (RAGI) score.
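The AMI score and its bootstrap standard error in panel a can be illustrated with a short pure-Python sketch. This is a sketch under assumptions, not the authors' evaluation code: it uses the arithmetic-mean normalization of AMI, and it assumes the bootstrap resamples cells with replacement while keeping the cluster assignments fixed, which the caption does not specify. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_info(u, v):
    n = len(u)
    a, b = Counter(u), Counter(v)
    return sum((nij / n) * math.log(n * nij / (a[i] * b[j]))
               for (i, j), nij in Counter(zip(u, v)).items())


def expected_mutual_info(u, v):
    """Expected MI under random labelings with the same cluster sizes."""
    n = len(u)

    def lg(x):
        return math.lgamma(x + 1)  # log(x!)

    emi = 0.0
    for ai in Counter(u).values():
        for bj in Counter(v).values():
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                # Log of the hypergeometric probability of overlap nij.
                log_p = (lg(ai) + lg(bj) + lg(n - ai) + lg(n - bj)
                         - lg(n) - lg(nij) - lg(ai - nij) - lg(bj - nij)
                         - lg(n - ai - bj + nij))
                emi += ((nij / n) * math.log(n * nij / (ai * bj))
                        * math.exp(log_p))
    return emi


def ami(u, v):
    """Adjusted mutual information with arithmetic-mean normalization."""
    h_u, h_v = entropy(u), entropy(v)
    if h_u == 0.0 and h_v == 0.0:
        return 1.0  # both labelings are a single cluster
    emi = expected_mutual_info(u, v)
    denom = 0.5 * (h_u + h_v) - emi
    return (mutual_info(u, v) - emi) / denom if abs(denom) > 1e-12 else 0.0


def bootstrap_se(truth, pred, n_boot=10, seed=0):
    """Standard error of AMI over n_boot resamples of the cells."""
    rng = random.Random(seed)
    n = len(truth)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(ami([truth[i] for i in idx], [pred[i] for i in idx]))
    mean = sum(scores) / n_boot
    return math.sqrt(sum((s - mean) ** 2 for s in scores) / (n_boot - 1))
```

AMI does not depend on how clusters are named, so a perfect clustering with relabeled clusters still scores 1. With only ten bootstrap samples, as in the caption, the standard error is itself a rough estimate.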
Fig. 5. Trajectory inference and motif enrichment analysis.
a t-SNE visualization of the cells from donor BM0828 and the inferred trajectory with Slingshot using the output of RA3 and Louvain clustering. The hematopoietic differentiation tree is shown on the bottom left. b The top 50 most variable TF binding motifs within the cluster-specific peaks for the cells of donor BM0828. The deviations calculated by chromVAR are shown. FACS fluorescence-activated cell sorting.
