Review
Genome Biol. 2020 Feb 3;21(1):22.
doi: 10.1186/s13059-020-1929-3.

From reads to insight: a hitchhiker's guide to ATAC-seq data analysis

Feng Yan et al. Genome Biol.

Abstract

Assay of Transposase Accessible Chromatin sequencing (ATAC-seq) is widely used in studying chromatin biology, but a comprehensive review of the analysis tools has not been completed yet. Here, we discuss the major steps in ATAC-seq data analysis, including pre-analysis (quality check and alignment), core analysis (peak calling), and advanced analysis (peak differential analysis and annotation, motif enrichment, footprinting, and nucleosome position analysis). We also review the reconstruction of transcriptional regulatory networks with multiomics data and highlight the current challenges of each step. Finally, we describe the potential of single-cell ATAC-seq and highlight the necessity of developing ATAC-seq specific analysis tools to obtain biologically meaningful insights.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Overview of the growth in ATAC-seq datasets and sample output from pre-analysis and advanced analysis. a The number of ATAC-seq datasets, ATAC-seq publications, DNase-seq datasets, FAIRE-seq datasets, and MNase-seq datasets in PubMed from 1 Jan 2013 to 1 Oct 2019. b A typical fragment size distribution plot shows enrichment around 100 and 200 bp, indicating nucleosome-free and mono-nucleosome-bound fragments. c A typical TSS enrichment plot shows that nucleosome-free fragments are enriched at the TSS, while mono-nucleosome fragments are depleted at the TSS but enriched in the flanking regions. d A typical peak annotation pie chart shows that more than half of the peaks fall into enhancer regions (distal intergenic and intronic regions), and only around 25% of the peaks are in promoter regions. TSS: transcription start site
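The two size classes in panel b can be illustrated with a short Python sketch. It is not a method from the review: the cutoffs `nfr_max=150` and `mono_max=300` and the names `classify_fragment` and `summarize_fragments` are hypothetical, chosen only so that fragments near 100 bp and near 200 bp fall into separate bins.

```python
from collections import Counter

CLASSES = ("nucleosome_free", "mono_nucleosome", "other")


def classify_fragment(length, nfr_max=150, mono_max=300):
    """Assign a fragment length (bp) to a size class.

    The cutoffs are illustrative assumptions, not values from the review.
    """
    if length < nfr_max:
        return "nucleosome_free"
    if length < mono_max:
        return "mono_nucleosome"
    return "other"


def summarize_fragments(lengths, nfr_max=150, mono_max=300):
    """Return the fraction of fragments in each size class."""
    if not lengths:
        return {name: 0.0 for name in CLASSES}
    counts = Counter(classify_fragment(n, nfr_max, mono_max) for n in lengths)
    total = len(lengths)
    return {name: counts[name] / total for name in CLASSES}


if __name__ == "__main__":
    # Toy fragment lengths; real values would come from paired-end alignments.
    print(summarize_fragments([80, 100, 120, 190, 210, 400]))
```

In practice the fragment lengths would be taken from properly paired alignments, and the cutoffs should be checked against the observed distribution rather than assumed.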
Fig. 2
Roadmap of a typical ATAC-seq analysis. Four major steps are listed: pre-analysis, core analysis, advanced analysis, and integration with multiomics data. Pre-analysis includes pre-alignment QC, alignment, and post-alignment processing and QC. Core analysis includes peak calling. Advanced analysis includes peak, motif, footprint, and nucleosome analysis. Multiomics data integration includes integration with ChIP-seq and RNA-seq data and regulatory network reconstruction. Text in each box emphasizes the important considerations in each analysis step. We suggest researchers start with FastQC, Trimmomatic, and BWA-MEM for pre-analysis, MACS2 for peak calling, csaw for peak differential analysis, ChIPseeker for annotation and visualization, MEME suite for motif detection and enrichment, HMMRATAC for nucleosome detection, HINT-ATAC for footprint analysis, and PECA for regulatory network reconstruction with RNA-seq. QC: quality check; TSS: transcription start site; TF: transcription factor; DEG: differentially expressed gene
Fig. 3
Schematic and real ATAC-seq data from core and advanced analyses. a In an ATAC-seq experiment, Tn5 binds and cuts open chromatin and simultaneously ligates adapters. The fragments are sequenced to identify open chromatin regions (black) and footprints (blue). NFR fragments represent the open chromatin, while nucleosome-bound fragments reflect nucleosome positions (gray shaded tracks). b Real ATAC-seq data. Signal tracks are generated from the BAM file (Raw) and bias-corrected by HINT-ATAC (Bias corrected). Peak sets are generated by three types of peak callers: count-based (red), shape-based (blue), and HMM-based (black). For MACS2, two strategies (paired-end and shift-extend) are used. For HMMRATAC, the extended ranges on both sides indicate the nucleosomes. The HINT-ATAC track shows footprints detected by HINT-ATAC, while the RUNX1 motif track shows the footprints matching the RUNX1 motif from the JASPAR database. The K562 ChIP-seq track is the RUNX1 ChIP-seq from ENCODE, indicating that footprint detection can recapitulate real TF binding. The right box illustrates the shift-extend approach: both ends are first shifted s bp outward and then extended 2s bp inward. c Illustration of network reconstruction from ATAC-seq data. The presence of a TF can be represented by motifs or footprints detected by the aforementioned methods. NFR: nucleosome-free region; TF: transcription factor; HMM: hidden Markov model
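The shift-extend step described in panel b can be written out as interval arithmetic. The Python sketch below is one reading of the caption, not the implementation used by any peak caller: it assumes 0-based, half-open fragment coordinates, clamps at the chromosome start, and uses a hypothetical function name `shift_extend`. Under these assumptions, shifting an end s bp outward and extending 2s bp inward yields a 2s-bp window centered on that fragment end.

```python
def shift_extend(start, end, s):
    """Return the two shift-extended windows for one fragment.

    start, end: 0-based, half-open fragment coordinates (an assumption).
    s: shift size in bp; each window spans 2 * s bp unless clamped at 0.
    """
    if s <= 0:
        raise ValueError("s must be positive")
    if end <= start:
        raise ValueError("end must be greater than start")

    # Left end: shift s bp outward (to start - s), then extend 2s bp inward.
    left_window = (max(0, start - s), start + s)

    # Right end: shift s bp outward (to end + s), then extend 2s bp inward.
    right_window = (max(0, end - s), end + s)

    return left_window, right_window


if __name__ == "__main__":
    # A 150-bp fragment with s = 100 gives two 200-bp windows.
    print(shift_extend(1000, 1150, 100))
```

Each window is centered on a fragment end, which corresponds to a Tn5 cut site, so the resulting signal reflects cut positions rather than the span of the whole fragment.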
Fig. 4
Summary of peak calling and peak differential analysis tools. a Peak callers can be divided into count-based, shape-based, and Markov model approaches. They can be further divided by the statistical methods or models used. b Peak differential analysis tools can be divided into peak set-based and sliding window approaches. Peak set-based methods are divided based on the usage of external peak caller and RNA-seq DE packages. Sliding window methods are divided based on statistical methods or models used. ZINB: zero-inflated negative binomial; HMM: hidden Markov model; DE: differential expression; NB: negative binomial
