Front Mol Biosci. 2022 Jul 7:9:900323.
doi: 10.3389/fmolb.2022.900323. eCollection 2022.

Benchmarking of ATAC Sequencing Data From BGI's Low-Cost DNBSEQ-G400 Instrument for Identification of Open and Occupied Chromatin Regions

Marina Naval-Sanchez et al. Front Mol Biosci. 2022.

Abstract

Background: Chromatin falls into one of two major subtypes: closed heterochromatin, and euchromatin, which is accessible, transcriptionally active, and occupied by transcription factors (TFs). The most widely used approach to interrogate differences in the chromatin state landscape is the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq). While library generation is relatively inexpensive, sequencing depth requirements can make this assay cost-prohibitive for some laboratories.

Findings: Here, we benchmark data from the low-cost DNBSEQ-G400 sequencer of the Beijing Genomics Institute (BGI) against data from a standard Illumina instrument (HiSeqX10). For comparison, the same bulk ATAC-seq libraries generated from pluripotent stem cells (PSCs) and fibroblasts were sequenced on both platforms. Both instruments generate sequencing reads with comparable mapping rates and genomic context. However, DNBSEQ-G400 data contained a significantly higher number of small, sub-nucleosomal reads (>30% increase) and a reduced number of bi-nucleosomal reads (>75% decrease), which resulted in narrower peak bases and improved peak calling, enabling the identification of 4% more differentially accessible regions between PSCs and fibroblasts. The ability to identify master TFs that underpin the PSC state relative to fibroblasts (via HOMER, HINT-ATAC, and TOBIAS), that is, the foot-printing capacity, was highly similar between data generated on both platforms. Integrative analysis with transcriptional data equally enabled direct recovery of three published 3-factor combinations that have been shown to induce pluripotency.

Conclusion: Other than a small increase in peak calling sensitivity for DNBSEQ-G400 data (BGI), both platforms enable comparable levels of open chromatin identification for ATAC-seq library sequencing, yielding similar analytical outcomes, albeit at lower data generation costs in the case of the BGI instrument.

Keywords: ATAC-seq; BGI; DNBSEQ-G400; Illumina; benchmarking; foot-printing; motif enrichment; sequencing platform.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Genomic context and insert size distribution of sequenced reads. (A) Schematic overview of the experimental setup (e.g., Omni-ATACseq library generation from three biological replicate samples each for MEFs and mESCs). (B) Enrichment of sequencing fragments at transcriptional start sites (TSS, ±1 Kb) for data from both platforms (n = 3, biological replicates). The color scale indicates the number of reads mapping to each TSS across the genome. (C) Representative plots depicting the insert size distribution for data from the different sequencing platforms; a black line indicates the boundary between sub- and mono/poly-nucleosomal reads, with relative % values indicated above. (D) Quantification of the percentage of bi-nucleosomal reads in Illumina versus BGI data for MEFs and mESCs (n = 3, biological replicates, Student’s t-test, two-tailed, unpaired). (E) Quantification of the percentage of sub-nucleosomal reads in Illumina versus BGI data for MEFs and mESCs (n = 3, biological replicates, Student’s t-test, two-tailed, unpaired). (F) Peak calling was performed (MACS2), followed by visualization of peak numbers (averaged across the three biological replicates) with their associated p-values. (G) Genomic context of reads from each sample group (averaged across the three biological replicates). (H) Unsupervised clustering of all samples based on the Euclidean distance between read counts within open regions (n = 3 biological replicates); minimal distance = 0, mean distance = 315, and maximal distance = 540.
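The percentages quantified in panels (C–E) can be illustrated with a minimal sketch that bins fragment (insert) lengths into size classes and reports the share of each class. The cut-offs below (150, 300, and 500 bp), the class names, and the function names are hypothetical placeholders chosen for illustration; the caption does not state the boundaries the authors used, and the lengths in the usage lines are toy values, not data from the study.

```python
from collections import Counter

# Hypothetical size-class boundaries in base pairs (assumption, not from the source).
SUB_NUCLEOSOMAL_MAX = 150   # fragments shorter than this count as sub-nucleosomal
MONO_NUCLEOSOMAL_MAX = 300  # 150-299 bp counts as mono-nucleosomal
BI_NUCLEOSOMAL_MAX = 500    # 300-499 bp counts as bi-nucleosomal

CLASSES = ("sub", "mono", "bi", "higher")


def classify_fragment(length):
    """Assign one fragment length (bp) to a size class."""
    if length <= 0:
        raise ValueError("fragment length must be positive")
    if length < SUB_NUCLEOSOMAL_MAX:
        return "sub"
    if length < MONO_NUCLEOSOMAL_MAX:
        return "mono"
    if length < BI_NUCLEOSOMAL_MAX:
        return "bi"
    return "higher"


def class_percentages(lengths):
    """Return the percentage of fragments falling into each size class."""
    if not lengths:
        raise ValueError("no fragment lengths given")
    counts = Counter(classify_fragment(n) for n in lengths)
    total = len(lengths)
    return {name: 100.0 * counts[name] / total for name in CLASSES}


# Toy fragment lengths for illustration only.
toy_lengths = [50, 80, 120, 200, 350, 600, 90, 160, 40, 310]
percentages = class_percentages(toy_lengths)
print(percentages)
```

In practice the lengths would come from the template-length field of properly paired alignments, and the per-replicate percentages for each platform would then be compared with the unpaired t-test named in the caption.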
FIGURE 2
Comparison of peaks identified in data from both platforms. (A) Tracks from all samples at the locus of the Actb housekeeping gene. (B) Computational workflow to identify high-confidence peaks across the samples of one experimental group (e.g., for three biological MEF replicate samples sequenced on the Illumina platform). (C) Venn diagram of high-confidence peaks (across the biological replicates) identified in mESC data sequenced on the Illumina and BGI platforms. (D,E) Peak scores of Illumina- and BGI-specific peaks relative to the scores of all peaks in the data set and relative to the peaks shared by both data sets. (F,G) Tracks of representative Illumina- and BGI-specific peaks. (H) Genomic context of high-confidence mESC peaks (displayed for all peaks, for peaks that are shared/overlap between both data sets, and for high-confidence peaks identified only in the Illumina or BGI data set).
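The shared versus platform-specific split shown in panel (C) can be sketched as an interval overlap test between two peak sets. This is a minimal illustration under stated assumptions, not the authors' workflow: peaks are taken to be half-open `(chrom, start, end)` tuples, any overlap of at least 1 bp counts as shared, and the function names and coordinates are hypothetical.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def split_shared_specific(peaks_a, peaks_b):
    """Split peaks_a into peaks that overlap any peak in peaks_b and peaks that do not."""
    # Index the second peak set by chromosome so only same-chromosome peaks are compared.
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))

    shared, specific = [], []
    for chrom, start, end in peaks_a:
        # Linear scan per chromosome: simple, but slow for genome-scale peak sets.
        hit = any(overlaps(start, end, s, e) for s, e in by_chrom.get(chrom, ()))
        (shared if hit else specific).append((chrom, start, end))
    return shared, specific


# Toy coordinates for illustration only.
illumina_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
bgi_peaks = [("chr1", 150, 250), ("chr2", 300, 400), ("chr1", 600, 700)]

shared, illumina_specific = split_shared_specific(illumina_peaks, bgi_peaks)
print(shared, illumina_specific)
```

Because one peak in one set can overlap several peaks in the other, the shared count depends on which set is queried; calling the function a second time with the arguments swapped gives the BGI-specific peaks.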
FIGURE 3
Motif enrichment analysis to recover TFs that underpin the mESC identity. (A,B) Volcano plots of differentially accessible peaks between MEFs and mESCs, identified for data from both platforms (n = 3 biological replicates). (C,D) The top 25 motifs identified by HOMER with higher enrichment in mESC-specific regions were integrated with transcriptional data for mESCs and MEFs (n = 3 biological replicates) to visualize expression differences of linked TFs across both cell types. The top five motifs (re-ranked by expression differences) are indicated in red. Both data sets recovered the same five TFs (indicated in red), which contain three published 3-factor combinations previously demonstrated to enable pluripotency induction (Pou5f1, Sox2, Klf4; Pou5f1, Esrrb, Klf4; Nr5a2, Sox2, Klf4). (E–L) Aggregate TF footprint profiles for CTCF, Pou5f1, Sox2, and Esrrb for both Illumina and BGI ATACseq data sets (profiles are based on the merged data of three biological replicates). The number of TFBS used to generate each aggregate is indicated in the bottom left corner of each plot and is based on PWM-scanned sites that intersect with a sample’s open chromatin regions.
FIGURE 4
Foot-printing analysis with the HINT-ATAC pipeline to recover TFs that underpin the mESC identity. (A,B) TF activity changes quantified by the HINT-ATAC pipeline for mESCs vs. MEFs; the positions of Pou5f1, Nanog, and Sox2 are indicated in red (analysis is based on the merged data of three biological replicates, as HINT-ATAC does not accept replicate data). (C,D) The top 40 motifs identified by HINT-ATAC with higher activity in mESCs were integrated with transcriptional data for mESCs and MEFs (n = 3 biological replicates) to visualize expression differences of linked TFs across both cell types. The top three motifs (re-ranked by expression differences) are indicated in red. (E–L) Representative aggregate accessibility profiles of TFBS with evidence for occupancy in one of the cell types for CTCF, Pou5f1, Sox2, and Nanog (profiles are based on the merged data of three biological replicates). The number of TFBS used to generate each aggregate is indicated in the bottom left corner of each plot and is based on PWM-scanned sites (with evidence for occupancy in one of the cell types) that intersect with a sample’s open chromatin regions.
FIGURE 5
Foot-printing analysis with the TOBIAS pipeline to recover TFs that underpin the mESC identity. (A,B) TF binding changes quantified by the TOBIAS pipeline are visualized as volcano plots (analysis is based on the merged data of three biological replicates, as TOBIAS does not accept replicate data). (C,D) The top 40 motifs identified by TOBIAS with higher binding activity in mESCs were integrated with transcriptional data for mESCs and MEFs (n = 3 biological replicates) to visualize expression differences of these TFs across both cell types. The top five to six motifs (re-ranked by expression differences) are indicated in red and contain three published 3-factor combinations previously demonstrated to enable pluripotency induction (Pou5f1, Sox2, Klf4; Pou5f1, Esrrb, Klf4; Nr5a2, Sox2, Klf4). (E–L) Representative aggregate accessibility profiles of TFBS with evidence for occupancy in one of the cell types for CTCF, Pou5f1, Sox2, and Nanog (profiles are based on the merged data of three biological replicates). The number of TFBS used to generate each aggregate is indicated in the bottom left corner of each plot and is based on sites that intersect with a sample’s open chromatin regions.
