Front Mol Biosci. 2022 Jul 7:9:900323.

doi: 10.3389/fmolb.2022.900323. eCollection 2022.

Benchmarking of ATAC Sequencing Data From BGI's Low-Cost DNBSEQ-G400 Instrument for Identification of Open and Occupied Chromatin Regions

Marina Naval-Sanchez¹, Nikita Deshpande¹, Minh Tran¹, Jingyu Zhang¹, Majid Alhomrani²,³, Walaa Alsanie²,³, Quan Nguyen¹, Christian M Nefzger¹

Affiliations

¹ Institute for Molecular Bioscience, University of Queensland, St Lucia, QLD, Australia.
² Department of Clinical Laboratories Sciences, Faculty of Applied Medical Sciences, Taif University, Taif, Saudi Arabia.
³ Centre of Biomedical Sciences Research (CBSR), Deanship of Scientific Research, Taif University, Taif, Saudi Arabia.

PMID: 35874611
PMCID: PMC9302965
DOI: 10.3389/fmolb.2022.900323

Marina Naval-Sanchez et al. Front Mol Biosci. 2022.

Abstract

Background: Chromatin falls into one of two major subtypes: closed heterochromatin, and euchromatin, which is accessible, transcriptionally active, and occupied by transcription factors (TFs). The most widely used approach to interrogate differences in the chromatin state landscape is the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq). While library generation is relatively inexpensive, sequencing depth requirements can make this assay cost-prohibitive for some laboratories.

Findings: Here, we benchmark data from the Beijing Genomics Institute's (BGI) low-cost DNBSEQ-G400 sequencer against data from a standard Illumina instrument (HiSeqX10). For comparison, the same bulk ATAC-seq libraries generated from pluripotent stem cells (PSCs) and fibroblasts were sequenced on both platforms. Both instruments generate sequencing reads with comparable mapping rates and genomic context. However, DNBSEQ-G400 data contained a significantly higher number of small, sub-nucleosomal reads (>30% increase) and a reduced number of bi-nucleosomal reads (>75% decrease), which resulted in narrower peak bases and improved peak calling, enabling the identification of 4% more differentially accessible regions between PSCs and fibroblasts. The ability to identify master TFs that underpin the PSC state relative to fibroblasts (via HOMER, HINT-ATAC, and TOBIAS), namely foot-printing capacity, was highly similar between data generated on both platforms. Integrative analysis with transcriptional data equally enabled direct recovery of three published 3-factor combinations that have been shown to induce pluripotency.

Conclusion: Other than a small increase in peak calling sensitivity for DNBSEQ-G400 data (BGI), both platforms enable comparable levels of open chromatin identification for ATAC-seq library sequencing, yielding similar analytical outcomes, albeit at lower data generation costs in the case of the BGI instrument.
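The sub-nucleosomal and bi-nucleosomal read percentages compared in the Findings can, in principle, be derived from a list of fragment lengths. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: the bin boundaries in `BINS`, the function names, and the example values are all illustrative assumptions, since the abstract does not state the cut-offs that were used.

```python
from collections import Counter

# Hypothetical fragment-length bins in base pairs (half-open: low <= n < high).
# The abstract does not give the cut-offs used, so these are assumptions.
BINS = {
    "sub_nucleosomal": (0, 100),
    "mono_nucleosomal": (180, 247),
    "bi_nucleosomal": (315, 473),
}


def classify(length):
    """Return the bin name for one fragment length, or 'other' if unbinned."""
    for name, (low, high) in BINS.items():
        if low <= length < high:
            return name
    return "other"


def class_percentages(lengths):
    """Percentage of fragments in each class, relative to all fragments."""
    if not lengths:
        raise ValueError("no fragment lengths given")
    counts = Counter(classify(n) for n in lengths)
    total = len(lengths)
    return {name: 100.0 * counts[name] / total for name in [*BINS, "other"]}


def relative_change(new, old):
    """Percent change of `new` relative to `old` (+30.0 means 30% more)."""
    return 100.0 * (new - old) / old


# Made-up toy lengths, only to show the call pattern.
toy = class_percentages([50, 90, 200, 400])
print(toy["sub_nucleosomal"], toy["bi_nucleosomal"])
```

A statement such as ">30% increase" would then correspond to `relative_change` applied to the per-platform percentages of one class, assuming the increase is expressed relative to the Illumina value.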

Keywords: ATAC-seq; BGI; DNBSEQ-G400; Illumina; benchmarking; foot-printing; motif enrichment; sequencing platform.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
Genomic context and insert size distribution of sequenced reads. **(A)** Schematic overview of experimental setup (e.g., Omni-ATACseq library generation from three biological replicate samples for MEFs and mESCs each). **(B)** Enrichment of sequencing fragments at transcriptional start sites (TSS, ±1 Kb) for data from both platforms (n = 3, biological replicates). The color scale indicates the number of reads mapping to each TSS across the genome. **(C)** Representative plots depicting insert size distribution for data from the different sequencing platforms; a black line indicates the boundary between sub- and mono/poly-nucleosomal reads with relative % values indicated above. **(D)** Quantification of the percentage of bi-nucleosomal reads in Illumina versus BGI data for MEFs and mESCs (n = 3, biological replicates, Student’s t-test, two-tailed, unpaired). **(E)** Quantification of the percentage of sub-nucleosomal reads in Illumina versus BGI data for MEFs and mESCs (n = 3, biological replicates, Student’s t-test, two-tailed, unpaired). **(F)** Peak calling was performed (MACS2), followed by visualization of peak numbers (averaged across the three biological replicates) with their associated p-values. **(G)** Genomic context of reads from each sample group (averaged across the three biological replicates). **(H)** Unsupervised clustering of all samples based on Euclidean distance on the number of counts within open regions (n = 3 biological replicates); minimal distance = 0, mean distance = 315, and maximal distance = 540.
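Panel (H) summarises sample-to-sample Euclidean distances computed on counts within open regions. A minimal sketch of how such a distance matrix and its minimum, mean, and maximum could be obtained is given below. The sample names, count vectors, and function names are hypothetical, and including the zero diagonal in the summary is an assumption made here because the caption reports a minimal distance of 0.

```python
import math


def distance_matrix(samples):
    """Pairwise Euclidean distances between count vectors.

    `samples` maps a sample name to a list of per-region counts; all
    vectors must have the same length (math.dist raises ValueError if not).
    """
    return {
        a: {b: math.dist(u, v) for b, v in samples.items()}
        for a, u in samples.items()
    }


def summarise(matrix):
    """Return (min, mean, max) over all matrix entries, diagonal included."""
    values = [d for row in matrix.values() for d in row.values()]
    return min(values), sum(values) / len(values), max(values)


# Made-up toy counts for three hypothetical samples over two regions.
toy_samples = {
    "sample_a": [0, 0],
    "sample_b": [3, 4],
    "sample_c": [6, 8],
}
toy_matrix = distance_matrix(toy_samples)
print(toy_matrix["sample_a"]["sample_b"])
print(summarise(toy_matrix))
```

In practice, counts would usually be normalised for sequencing depth before distances are computed; the abstract does not say which normalisation, if any, was applied, so none is shown.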

**FIGURE 2**
Comparison of peaks identified in data from both platforms. **(A)** Tracks from all samples at the locus of the Actb housekeeping gene. **(B)** Computational workflow to identify high-confidence peaks across the samples of one experimental group (e.g., for three biological MEF replicate samples sequenced on the Illumina platform). **(C)** Venn diagram for high-confidence peaks (across the biological replicates) identified in mESC data sequenced on Illumina and BGI platform. **(D,E)** Peak scores of Illumina and BGI specific peaks relative to the scores of all peaks in the data set and relative to peaks shared by data sets. **(F,G)** Tracks of representative Illumina and BGI specific peaks. **(H)** Genomic context of high-confidence mESC peaks (displayed for all peaks, peaks that are shared/overlap between both data sets, and high-confidence peaks only identified in the Illumina or BGI data set).

**FIGURE 3**
Motif enrichment analysis to recover TFs that underpin the mESC identity. **(A,B)** Volcano plots of differentially accessible peaks between MEFs and mESCs, identified for data from both platforms (n = 3 biological replicates). **(C,D)** The top 25 motifs identified by HOMER with higher enrichment in mESC specific regions were integrated with transcriptional data for mESCs and MEFs (n = 3 biological replicates) to visualize expression differences of linked TFs across both cell types. The top five motifs (re-ranked by expression differences) are indicated in red. Both data sets recovered the same five TFs (indicated in red) containing three published 3-factor combinations previously demonstrated to enable pluripotency induction (Pou5f1, Sox2, Klf4; Pou5f1, Esrrb, Klf4; Nr5a2, Sox2, Klf4). **(E–L)** Aggregate TF footprint profiles for CTCF, Pou5f1, Sox2, and Esrrb for both Illumina and BGI ATACseq data sets (profiles are based on the merged data of three biological replicates). The number of TFBS used to generate each aggregate is indicated in the bottom left corner of each plot and is based on PWM scanned sites that intersect with a sample’s open chromatin regions.

**FIGURE 4**
Foot-printing analysis with the HINT-ATAC pipeline to recover TFs that underpin the mESC identity. **(A,B)** TF activity changes quantified by the HINT-ATAC pipeline for mESCs vs. MEFs; the positions of Pou5f1, Nanog, and Sox2 are indicated in red (analysis is based on the merged data of three biological replicates as HINT-ATAC does not accept replicate data). **(C,D)** The top 40 motifs identified by HINT-ATAC with higher activity in mESCs were integrated with transcriptional data for mESCs and MEFs (n = 3 biological replicates) to visualize expression differences of linked TFs across both cell types. The top three motifs (re-ranked by expression differences) are indicated in red. **(E–L)** Representative aggregate accessibility profiles of TFBS with evidence for occupancy in the cell types for CTCF, Pou5f1, Sox2, and Nanog (profiles are based on the merged data of three biological replicates). The number of TFBS used to generate each aggregate is indicated in the bottom left corner of each plot and is based on PWM scanned sites (with evidence for occupancy in one of the cell types) that intersect with a sample’s open chromatin regions.

**FIGURE 5**
Foot-printing analysis with the TOBIAS pipeline to recover TFs that underpin the mESC identity. **(A,B)** TF binding changes quantified by the TOBIAS pipeline are visualized in the form of volcano plots (analysis is based on the merged data of three biological replicates as TOBIAS does not accept replicate data). **(C,D)** The top 40 motifs identified by TOBIAS with higher binding activity in mESCs were integrated with transcriptional data for mESCs and MEFs (n = 3 biological replicates) to visualize expression differences of these TFs across both cell types. The top five to six motifs (re-ranked by expression differences) are indicated in red and contain three published 3-factor combinations previously demonstrated to enable pluripotency induction (Pou5f1, Sox2, Klf4; Pou5f1, Esrrb, Klf4; Nr5a2, Sox2, Klf4). **(E–L)** Representative aggregate accessibility profiles of TFBS with evidence for occupancy in one of the cell types for CTCF, Pou5f1, Sox2, and Nanog (profiles are based on the merged data of three biological replicates). The number of TFBS that intersect with a sample’s open chromatin regions and were used to generate each aggregate is indicated in the bottom left corner of each plot.

Cited by

Genome Sequencing of the Antibiotic-Resistant Leucobacter sp. HNU-1 and Its Developmental Toxicity in Caenorhabditis elegans.
Ju J, Lu X, Gao Z, Yin H, Xu S, Li H. Int J Mol Sci. 2025 Apr 13;26(8):3673. doi: 10.3390/ijms26083673. PMID: 40338253. Free PMC article.
Comparison of the DNBSEQ platform and Illumina HiSeq 2000 for bacterial genome assembly.
Hu T, Chen J, Lin X, He W, Liang H, Wang M, Li W, Wu Z, Han M, Jin X, Kristiansen K, Xiao L, Zou Y. Sci Rep. 2024 Jan 14;14(1):1292. doi: 10.1038/s41598-024-51725-0. PMID: 38221534. Free PMC article.
Structural variant and nucleosome occupancy dynamics postchemotherapy in a HER2+ breast cancer organoid model.
Starostecka M, Jeong H, Hasenfeld P, Benito-Garagorri E, Christiansen T, Stober Brasseur C, Gomes Queiroz M, Garcia Montero M, Jechlinger M, Korbel JO. Proc Natl Acad Sci U S A. 2025 Mar 4;122(9):e2415475122. doi: 10.1073/pnas.2415475122. Epub 2025 Feb 24. PMID: 39993200. Free PMC article.
High-throughput sequencing: a breakthrough in molecular diagnosis for precision medicine.
Dongare DB, Nishad SS, Mastoli SY, Saraf SA, Srivastava N, Dey A. Funct Integr Genomics. 2025 Jan 22;25(1):22. doi: 10.1007/s10142-025-01529-w. PMID: 39838192. Review.
