Genome Biol. 2025 Sep 17;26(1):282.
doi: 10.1186/s13059-025-03735-y.

A hierarchical, count-based model highlights challenges in scATAC-seq data analysis and points to opportunities to extract finer-resolution information

Aaron Wing Cheung Kwok et al. Genome Biol.

Abstract

Background: Data from single-cell Assay for Transposase-Accessible Chromatin with sequencing (scATAC-seq) are highly sparse. While current computational methods apply a range of transformation procedures to extract meaningful information, major challenges remain.

Results: Here, we discuss major challenges in scATAC-seq data analysis, such as sequencing depth normalization and region-specific biases. We present a hierarchical count model motivated by the data-generating process of scATAC-seq data. Our simulations show that current scATAC-seq data, while clearly having physical single-cell resolution, are too sparse to infer chromatin accessibility states at true informational single-cell, single-region resolution.
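To make the sparsity argument concrete, here is a small back-of-the-envelope sketch. It assumes, for illustration only, that the fragment count of one region in one cell is Poisson-distributed with a given mean; this is a simplification, not the hierarchical model presented in the paper. The helper names `prob_zero` and `prob_at_least` are hypothetical.

```python
import math


def prob_zero(mean_count: float) -> float:
    """P(Y = 0) for Y ~ Poisson(mean_count); the Poisson assumption is illustrative."""
    return math.exp(-mean_count)


def prob_at_least(k: int, mean_count: float) -> float:
    """P(Y >= k) for Y ~ Poisson(mean_count), computed as 1 - P(Y <= k - 1)."""
    cdf = sum(
        math.exp(-mean_count) * mean_count**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - cdf


# Tabulate how often a cell-region entry is zero, or holds 2+ fragments,
# for a few illustrative per-cell mean counts.
for mu in (0.05, 0.1, 0.5, 2.0):
    print(f"mean={mu:<5} P(Y=0)={prob_zero(mu):.3f} P(Y>=2)={prob_at_least(2, mu):.4f}")
```

Under this assumption, a region with a mean of 0.1 fragments per cell is expected to show zero counts in roughly 90% of cells (e^(-0.1) ≈ 0.905), so most individual cell-region entries are zeros.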

Conclusions: While the broad utility of scATAC-seq at the cell type level is undeniable, describing it as fully resolving chromatin accessibility at single-cell resolution, particularly at the level of individual loci, may overstate the level of detail currently achievable. We conclude that chromatin accessibility profiling at true single-cell, single-region resolution is challenging at current data sensitivity, but that it may become achievable through promising developments in optimizing the efficiency of scATAC-seq assays.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: Davis McCarthy is an Editorial Board Member for Genome Biology but was not involved in the editorial process of this manuscript.

Figures

Fig. 1
Conceptual diagram for key challenges in typical scATAC-seq data analysis, including fragment aggregation and quantification, between-cell normalization, between-feature normalization, and interpreting chromatin accessibility at single-cell resolution
Fig. 2
a Raw counts and their term frequency (TF)-transformed values for a random region in the PBMC10k scATAC-seq dataset, plotted against the total count of each cell. Each dot is a cell; the region chr1:1273633-1274133 was chosen for demonstration. b Variance of raw counts and inverse document frequency (IDF)-transformed values plotted against the mean raw count of each region. Each dot is a region. c Mean of non-zero counts in each cell plotted against the total count, for both the scRNA-seq and scATAC-seq data from the PBMC10k dataset
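The TF and IDF transformations in panels a and b can be sketched as follows. This is one common TF-IDF variant, assumed here for illustration (tools differ in the exact IDF definition and scaling): TF divides each count by the cell's total count, IDF is the number of cells divided by the number of cells in which the region is non-zero, and the product is scaled and log1p-transformed. The function name `tf_idf` and the `scale` default are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are cells, columns are regions


def tf_idf(counts: Matrix, scale: float = 1e4) -> Matrix:
    """Hypothetical TF-IDF: log1p(TF * IDF * scale), with
    TF  = count / total count of the cell, and
    IDF = n_cells / number of cells with a non-zero count in the region."""
    n_cells = len(counts)
    n_regions = len(counts[0])
    totals = [sum(row) for row in counts]
    nonzero = [sum(1 for row in counts if row[j] > 0) for j in range(n_regions)]
    # A region observed in no cell gets IDF 0 (avoids division by zero).
    idf = [n_cells / nz if nz > 0 else 0.0 for nz in nonzero]
    out = []
    for row, total in zip(counts, totals):
        # Cells with no fragments at all are left as all-zero rows.
        out.append([
            math.log1p(x / total * idf[j] * scale) if total > 0 else 0.0
            for j, x in enumerate(row)
        ])
    return out


counts = [
    [1, 0, 9],    # shallow cell: total 10
    [1, 50, 49],  # deep cell: total 100
    [0, 0, 5],    # total 5, region 0 not observed
]
values = tf_idf(counts)
```

In this toy matrix, cells 0 and 1 both have exactly one fragment in region 0, but cell 0 has 10 fragments in total and cell 1 has 100, so the same raw count receives a tenfold larger TF value in the shallower cell. This is the kind of dependence on total count that panel a plots.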
Fig. 3
Fitted LOWESS curves of log(count + 1) as a function of GC content for: a 6 of the replicates (s: sequencing batch; d: donor) of CD8+ T cells and b 5 of the annotated cell types from donor s1d1 in the Luecken dataset. c–e Mock null comparisons between CD16+ monocytes. Peaks are sorted into 10 bins according to their GC content, and log-fold changes (LFC) between the mock groups are plotted against their respective bins. In a null setting, the LFC should be centered at zero. The blue curve represents a generalized additive model (GAM) fit
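The binning behind panels c–e can be sketched as below. Peaks are ordered by GC content, split into 10 bins of roughly equal size, and the median log2 fold change between two mock groups is reported for each bin. The pseudocount, the use of the median as a per-bin summary, and the function name `gc_binned_lfc` are assumptions for illustration; the figure additionally overlays a GAM fit, which this sketch does not reproduce.

```python
import math
import statistics
from typing import List, Sequence


def gc_binned_lfc(
    gc: Sequence[float],
    mean_a: Sequence[float],
    mean_b: Sequence[float],
    n_bins: int = 10,
    pseudocount: float = 0.1,
) -> List[float]:
    """Median log2 fold change (group A over group B) within each GC-content bin."""
    if len(gc) < n_bins:
        raise ValueError("need at least one peak per bin")
    lfc = [math.log2((a + pseudocount) / (b + pseudocount)) for a, b in zip(mean_a, mean_b)]
    # Sort peak indices by GC content and cut them into n_bins near-equal chunks.
    order = sorted(range(len(gc)), key=lambda i: gc[i])
    medians = []
    for k in range(n_bins):
        start = k * len(order) // n_bins
        stop = (k + 1) * len(order) // n_bins
        medians.append(statistics.median(lfc[i] for i in order[start:stop]))
    return medians


# Toy data: 100 peaks with evenly spread GC content.
gc = [i / 100 for i in range(100)]
null_medians = gc_binned_lfc(gc, [1.0] * 100, [1.0] * 100)
# Group A's counts rise with GC content, mimicking a GC-dependent bias.
biased_medians = gc_binned_lfc(gc, [0.5 + g for g in gc], [1.0] * 100)
```

With identical group means every bin is centered at zero, as expected under the null. When one group's counts increase with GC content, the per-bin medians drift upward, which is the kind of GC-dependent pattern a mock null comparison can expose.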
Fig. 4
a Simulated data with different combinations of background rate and signal-to-noise ratio. π_j is fixed at 0.3 for demonstration purposes. For each scenario, the simulation is repeated 30 times and the mean AUROC is calculated. b Mean AUROC against the mean of the simulated counts. c Box plot of peak means from 6 datasets with varying biology and assays. The red dotted line marks a mean count of 0.1, which corresponds to an AUROC of 0.55 in our simulations
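A toy version of the simulation in panel a can be sketched as follows. It assumes, for illustration, a two-level model for a single region: each cell's latent state is open with probability π_j, and the observed count is Poisson with the background rate when closed and the background rate multiplied by the signal-to-noise ratio when open. This parameterization, the number of cells, and all function names are hypothetical and are not taken from the paper's model. The AUROC measures how well raw counts rank open cells above closed cells, with ties counted as one half.

```python
import math
import random
import statistics
from collections import Counter
from typing import Sequence


def sample_poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def auroc(pos: Sequence[int], neg: Sequence[int]) -> float:
    """P(score_pos > score_neg) + 0.5 * P(tie), via counts of the negative scores."""
    neg_counts = Counter(neg)
    wins = 0.0
    for s in pos:
        wins += sum(c for v, c in neg_counts.items() if v < s)
        wins += 0.5 * neg_counts.get(s, 0)
    return wins / (len(pos) * len(neg))


def mean_auroc(background: float, snr: float, pi: float = 0.3,
               n_cells: int = 1000, n_reps: int = 30, seed: int = 0) -> float:
    """Mean AUROC over n_reps simulated replicates of one region (hypothetical model)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_reps):
        # Level 1: latent open/closed state per cell.
        state = [rng.random() < pi for _ in range(n_cells)]
        # Level 2: observed count given the state.
        counts = [sample_poisson(rng, background * (snr if s else 1.0)) for s in state]
        pos = [c for c, s in zip(counts, state) if s]
        neg = [c for c, s in zip(counts, state) if not s]
        if pos and neg:  # AUROC is undefined without both classes
            scores.append(auroc(pos, neg))
    return statistics.mean(scores)


for background in (0.001, 0.01, 0.1, 1.0):
    print(f"background={background}: mean AUROC={mean_auroc(background, snr=10):.3f}")
```

With a background rate of 1 and a tenfold signal, the counts separate the two states almost perfectly, whereas at a background of 0.001 nearly every count is zero and the AUROC stays close to the chance level of 0.5.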
Fig. 5
Analysis of the 10x Multiome PBMC10k dataset. a–b The first 2 principal components (PCs) derived from the model posterior and from LSI, respectively. Each dot is a cell, colored by the cell type annotation derived from the transcriptome of the same cells. c Pearson correlation of the first 10 PCs with library size. d Mean silhouette widths for each cell type (n = 8), computed from the first 30 PCs of LSI-processed data, of LSI with the first PC removed, and of the model posterior
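Panel c asks how strongly each component tracks library size. A minimal sketch of that check follows, assuming LSI's components come from an uncentered truncated SVD of the transformed matrix; the leading component is estimated here by plain power iteration rather than the randomized or Lanczos solvers that real tools typically use. The function names are hypothetical. Because the sign of a singular vector is arbitrary, the absolute correlation is what matters.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # rows are cells, columns are features


def leading_component_scores(x: Matrix, n_iter: int = 100) -> List[float]:
    """Cell scores X v on the leading right singular vector v of an uncentered X,
    estimated by power iteration on X^T X."""
    n_cells, n_features = len(x), len(x[0])
    v = [1.0] * n_features
    for _ in range(n_iter):
        u = [sum(a * b for a, b in zip(row, v)) for row in x]  # u = X v
        v = [sum(x[i][j] * u[i] for i in range(n_cells)) for j in range(n_features)]  # v = X^T u
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0.0:  # all-zero matrix: no component to find
            return [0.0] * n_cells
        v = [c / norm for c in v]
    return [sum(a * b for a, b in zip(row, v)) for row in x]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = math.sqrt(sum((p - ma) ** 2 for p in a))
    sb = math.sqrt(sum((q - mb) ** 2 for q in b))
    return cov / (sa * sb)


# Toy rank-one matrix: every cell has the same profile, scaled by its library size.
library_size = [1.0, 2.0, 3.0, 4.0, 5.0]
profile = [0.5, 1.0, 2.0]
x = [[d * p for p in profile] for d in library_size]
r = pearson(leading_component_scores(x), library_size)
print(f"|r| with library size: {abs(r):.3f}")
```

In the rank-one toy matrix, every cell's profile is identical up to its depth, so the leading component is pure library size and |r| is 1. In real data, a leading component with |r| close to 1 is a common reason for dropping it before downstream analysis, which is what the "LSI with the first PC removed" comparison in panel d evaluates.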
