PLoS One. 2019 Aug 29;14(8):e0221760.
doi: 10.1371/journal.pone.0221760. eCollection 2019.

Population size estimation for quality control of ChIP-Seq datasets


Semyon K Kolmykov et al. PLoS One. 2019.

Abstract

Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) is a widely used experimental technology for identifying functional protein-DNA interactions. Databases such as ENCODE, GTRD, ChIP-Atlas and ReMap now systematically collect and annotate large numbers of ChIP-Seq datasets. Comprehensive quality control of these datasets is indispensable for selecting the most reliable data for further analysis. In addition to existing quality control metrics, we have developed two novel metrics that make it possible to control false positives and false negatives in ChIP-Seq datasets. For this purpose, we adapted a well-known population size estimate to determine the unknown number of genuine transcription factor binding regions. The proposed metrics are computed from the overlap of distinct sets of binding sites obtained by processing one ChIP-Seq experiment with different peak callers. The metrics can also be useful for assessing the quality of datasets obtained by processing distinct ChIP-Seq experiments with a given peak caller. We also show that these metrics appear to be useful not only for dataset selection but also for comparing peak callers and identifying site motifs from ChIP-Seq datasets. The algorithm for determining the false positive control metric (FPCM) and the false negative control metric (FNCM) for ChIP-Seq datasets was implemented as a plugin for the BioUML platform: https://ict.biouml.org/bioumlweb/chipseq_analysis.html.
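
The abstract does not state which population size estimate was adapted or how FPCM and FNCM are derived from it. As a rough illustration of the general idea only, the sketch below applies two classical capture-recapture estimators (Lincoln-Petersen and Chapman) to peak sets produced by two peak callers for the same experiment. The interval representation, the overlap rule, and all function names are assumptions made for this example; they are not taken from the paper or from the BioUML plugin.

```python
from typing import List, Tuple

# Hypothetical peak representation: (chromosome, start, end), half-open.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals lie on the same chromosome and share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_shared(peaks_a: List[Interval], peaks_b: List[Interval]) -> int:
    """Count peaks in peaks_a that overlap at least one peak in peaks_b.

    Quadratic scan kept for clarity; real peak sets would need an
    interval index or a sorted sweep.
    """
    return sum(any(overlaps(a, b) for b in peaks_b) for a in peaks_a)


def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Classical estimate N = n1 * n2 / m; undefined when m is zero."""
    if m == 0:
        raise ValueError("no shared peaks: estimate is undefined")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's variant, which stays defined when m is zero."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Toy data (invented for illustration, not from any real dataset).
caller_a = [("chr1", 100, 200), ("chr1", 500, 600),
            ("chr2", 50, 150), ("chr2", 900, 1000)]
caller_b = [("chr1", 150, 250), ("chr2", 100, 180), ("chr3", 10, 60)]

shared = count_shared(caller_a, caller_b)
estimate = lincoln_petersen(len(caller_a), len(caller_b), shared)
print(shared, estimate)
```

Here each peak caller plays the role of one "capture" of the unknown set of genuine binding regions, and the peaks found by both callers play the role of recaptured individuals. Both estimators assume the two captures are independent, which may not hold for peak callers run on the same reads, so the numbers should be read as a sketch of the principle rather than as the published metrics.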

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The workflow of the algorithm for determination of FPCM and FNCMs.
Fig 2. Empirical densities of (a) FPCM and (b) FNCM obtained for peak caller PICS.
Fig 3. Relationship between FNCM(PICS) observed and predicted by the random forest regression model.
Fig 4. Quality metrics values for some low-quality ChIP-Seq data from GTRD.
Fig 5. ROC curves for (a) whole dataset PEAKS038038 and (b) for PEAKS038038 without orphans.

References

    1. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489:57–74. doi: 10.1038/nature11247
    2. Yevshin I, Sharipov R, Kolmykov S, Kondrakhin Y, Kolpakov F. GTRD: a database on gene transcription regulation-2019 update. Nucleic Acids Res. 2019 Jan;47(D1):D100–D105. doi: 10.1093/nar/gky1128
    3. Oki S, Ohta T, Shioi G, Hatanaka H, Ogasawara O, Okuda Y, et al. ChIP-Atlas: a data-mining suite powered by full integration of public ChIP-seq data. EMBO Rep. 2018 Nov 9;19(12):e46255. doi: 10.15252/embr.201846255
    4. Cheneby J, Gheorghe M, Artufel M, Mathelier A, Ballester B. ReMap 2018: an updated atlas of regulatory regions from an integrative analysis of DNA-binding ChIP-seq experiments. Nucleic Acids Res. 2018 Jan 4;46(D1):D267–D275. doi: 10.1093/nar/gkx1092
    5. Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 2012;22(9):1813–1831. doi: 10.1101/gr.136184.111
