Bioinformatics. 2022 Nov 15;38(22):5126-5128.
doi: 10.1093/bioinformatics/btac644.

SCAFE: a software suite for analysis of transcribed cis-regulatory elements in single cells


Jonathan Moody et al. Bioinformatics.

Abstract

Motivation: Cell type-specific activities of cis-regulatory elements (CREs) are central to understanding gene regulation and disease predisposition. Single-cell RNA 5'-end sequencing (sc-end5-seq) captures transcription start sites (TSS), which can be used as a proxy for the activity of transcribed CREs (tCREs). However, a substantial fraction of the TSS identified from sc-end5-seq data may not be genuine because of various artifacts, which hinders the use of sc-end5-seq for de novo discovery of tCREs.

Results: We developed SCAFE (Single-Cell Analysis of Five-prime Ends), a software suite that processes sc-end5-seq data to identify TSS clusters de novo using multiple logistic regression. It annotates tCREs based on the identified TSS clusters and generates a tCRE-by-cell count matrix for downstream analyses. The suite consists of a set of flexible tools that can be run either independently or as pre-configured workflows.
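The two steps named above can be illustrated with a minimal Python sketch. It is not SCAFE's implementation (SCAFE is written in Perl and R): the feature names, the coefficients, and the 0.5 probability cutoff used here are assumptions chosen for illustration only. It shows how a multiple logistic regression turns several cluster properties into one probability, and how reads from retained clusters could be tallied into a tCRE-by-cell count matrix.

```python
import math
from collections import Counter

# Hypothetical coefficients and feature names, for illustration only.
# The abstract does not list SCAFE's predictors or fitted weights.
INTERCEPT = -4.0
WEIGHTS = {"log_read_count": 1.5, "five_prime_signal": 3.0}
CUTOFF = 0.5  # assumed probability threshold


def tss_probability(features):
    """Multiple logistic regression: sigmoid of a weighted feature sum."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def count_matrix(reads, kept_tcres):
    """Count reads per (tCRE, cell) pair, keeping only retained tCREs."""
    counts = Counter(
        (tcre, cell) for cell, tcre in reads if tcre in kept_tcres
    )
    cells = sorted({cell for cell, _ in reads})
    rows = sorted(kept_tcres)
    matrix = [[counts[(tcre, cell)] for cell in cells] for tcre in rows]
    return rows, cells, matrix


# Toy TSS clusters described by the hypothetical features.
clusters = {
    "tcre_A": {"log_read_count": 3.0, "five_prime_signal": 0.6},
    "tcre_B": {"log_read_count": 1.0, "five_prime_signal": 0.1},
}
probabilities = {name: tss_probability(f) for name, f in clusters.items()}
kept = {name for name, p in probabilities.items() if p > CUTOFF}

# Toy reads as (cell barcode, overlapping cluster) pairs.
reads = [
    ("cell1", "tcre_A"),
    ("cell1", "tcre_A"),
    ("cell2", "tcre_A"),
    ("cell2", "tcre_B"),
]
rows, cells, matrix = count_matrix(reads, kept)
```

In this toy input only `tcre_A` passes the cutoff, so the matrix has a single row with one column per cell. A real pipeline would fit the coefficients on labelled clusters and store the matrix in a sparse format.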

Availability and implementation: SCAFE is implemented in Perl and R. The source code and documentation are freely available for download under the MIT License from https://github.com/chung-lab/SCAFE. Docker images are available from https://hub.docker.com/r/cchon/scafe. The submitted software version and test data are archived at https://doi.org/10.5281/zenodo.7023163 and https://doi.org/10.5281/zenodo.7024060, respectively.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
De novo identification of genuine TSS. (a) Distribution of TSS cluster properties (left) and their classification performance measured as AUC (right). (b) Distribution of the TSS classifier probability (left) and its classification performance measured as AUC (right). (c) Performance of the metrics in (a) and (b) as TSS classifiers across sequencing depths. (d) Histone marks at TSS clusters with a probability below (left) or above (right) the 0.5 cutoff, at annotated gene TSS, in exonic or intronic regions in sense or antisense orientation, or otherwise in intergenic regions. n, number of TSS clusters; %, percentage of TSS clusters in all genomic locations regardless of probability threshold.
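The caption reports classification performance as AUC and splits clusters at a probability of 0.5. As a rough sketch of what those two quantities mean, the Python below computes AUC as the fraction of (genuine, non-genuine) pairs in which the genuine cluster receives the higher score, with ties counted as half, and then partitions clusters at the cutoff. The scores and labels are made-up toy values, not data from the paper.

```python
def auc(scores, labels):
    """AUC as the chance that a random positive outscores a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy classifier probabilities and ground-truth labels (1 = genuine TSS).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]

toy_auc = auc(scores, labels)

# Partition cluster indices at the 0.5 cutoff mentioned in the caption.
above = [i for i, s in enumerate(scores) if s > 0.5]
below = [i for i, s in enumerate(scores) if s <= 0.5]
```

The pairwise loop is quadratic in the number of clusters; for large inputs a rank-based computation gives the same value more efficiently.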
