Genome Biol. 2021 May 15;22(1):155.
doi: 10.1186/s13059-021-02349-4.

PEPPRO: quality control and processing of nascent RNA profiling data

Jason P Smith et al. Genome Biol. 2021.

Abstract

Nascent RNA profiling is growing in popularity; however, there is no standard analysis pipeline to uniformly process the data and assess quality. Here, we introduce PEPPRO, a comprehensive, scalable workflow for GRO-seq, PRO-seq, and ChRO-seq data. PEPPRO produces uniformly processed output files for downstream analysis and assesses adapter abundance, RNA integrity, library complexity, nascent RNA purity, and run-on efficiency. PEPPRO is restartable and fault-tolerant, records copious logs, and provides a web-based project report. PEPPRO can be run locally or using a cluster, providing a portable first step for genomic nascent RNA analysis.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
PEPPRO steps for genomic run-on data. PEPPRO starts from raw sequencing reads and produces a variety of quality control plots and processed output files for more detailed downstream analysis
Fig. 2
PEPPRO test set data table and signal tracks. a Table showing the attributes of samples collected for our test set. Complete metadata is available from the PEPPRO website. b Read count normalized signal tracks from published data are visualized within a browser (Scale is per 1M)
Fig. 3
RNA integrity is assessed with degradation ratios and insert sizes. a Schematic illustrating intact versus degraded libraries. b Degradation ratio for test samples (the ratio for the HelaS3 GRO sample could not be calculated; values below the dashed line (1.0) are considered high quality). c–f Insert size distributions for: c, a degraded single-end library; d, a degraded paired-end library; e, a non-degraded single-end library; and f, a non-degraded paired-end library (orange shading represents highly degraded reads; yellow shading represents partially degraded reads)
Fig. 4
Library complexity is measured with unique read frequency distributions and projections. a Schematic demonstrating PCR duplication and library complexity (dashed line represents completely unique library). b Library complexity traces plot the read count versus externally calculated deduplicated read counts. Deduplication is a prerequisite, so these plots may only be produced for samples with UMIs. Inset zooms to region from 0 to double the maximum number of unique reads. c The position of curves in panel b at a sequencing depth of 10 million reads (dashed line represents minimum recommended percentage of unique reads)
Fig. 5
Nascent RNA purity is assessed with the exon-intron ratio. a Schematic demonstrating mRNA contamination calculation. The red X represents the exclusion of the first exon from the calculation. b Median mRNA contamination metric for test set samples (shaded region represents the recommended range, 1 to 1.8). c Histogram showing the distribution of mRNA contamination scores across genes in the K562 PRO-seq sample. d As in panel c for a GRO-seq library. e mRNA contamination distribution for K562 PRO-seq spiked with 30% K562 RNA-seq. f mRNA contamination distribution for HelaS3 GRO-seq is comparable to the 30% RNA-seq spike-in sample
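The exon-intron ratio described in this caption can be illustrated with a minimal sketch. It is not PEPPRO's implementation: it assumes the per-gene score is the read density of all exons after the first divided by the read density of introns, and that the sample-level metric is the median across genes. The function names and the input layout (lists of `(read_count, length_bp)` tuples) are hypothetical.

```python
from statistics import median

def density(reads, length_bp):
    """Reads per kilobase for a set of features."""
    return reads / (length_bp / 1000.0)

def exon_intron_ratio(exons, introns):
    """Per-gene exon/intron density ratio (assumed definition).

    exons, introns: lists of (read_count, length_bp) in transcript order.
    The first exon is skipped, following the Fig. 5a schematic.
    Returns None when the ratio is undefined.
    """
    later_exons = exons[1:]
    exon_reads = sum(r for r, _ in later_exons)
    exon_len = sum(n for _, n in later_exons)
    intron_reads = sum(r for r, _ in introns)
    intron_len = sum(n for _, n in introns)
    if exon_len == 0 or intron_len == 0 or intron_reads == 0:
        return None
    return density(exon_reads, exon_len) / density(intron_reads, intron_len)

def median_ratio(genes):
    """Median of the defined per-gene ratios; genes is a list of (exons, introns)."""
    ratios = [exon_intron_ratio(e, i) for e, i in genes]
    ratios = [r for r in ratios if r is not None]
    return median(ratios) if ratios else None
```

A sample whose median falls inside the recommended range from panel b (1 to 1.8) would be read as having little mature mRNA contamination; higher values suggest exon-enriched signal, as in the RNA-seq spike-in panels.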
Fig. 6
Run-on efficiency is measured with pause indices. a Schematic demonstrating pause index calculation. b Pause index values for Drosophila melanogaster GRO-seq libraries with (GSM577247) or without sarkosyl (GSM577248). c The histogram of pause index values is shifted to the right upon addition of sarkosyl in GRO-seq libraries. d Pause index values for test set samples (Values above the dashed line are recommended). e High pause index identified in H9 treated PRO-seq. f Low pause index from HelaS3 GRO-seq
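As a rough illustration of the pause index in this caption, the sketch below uses a commonly cited definition: read density in a promoter-proximal (pause) window divided by read density in the gene body. That definition, the function name, and the window sizes in the usage note are assumptions for illustration; the caption does not specify the windows PEPPRO uses.

```python
def pause_index(pause_reads, pause_len_bp, body_reads, body_len_bp):
    """Ratio of pause-window read density to gene-body read density.

    Assumed definition, not taken from the PEPPRO source.
    Returns None when the ratio is undefined.
    """
    if pause_len_bp <= 0 or body_len_bp <= 0 or body_reads == 0:
        return None
    pause_density = pause_reads / pause_len_bp
    body_density = body_reads / body_len_bp
    return pause_density / body_density
```

For example, 200 reads in a hypothetical 100 bp pause window against 500 reads over a 10,000 bp gene body gives a density ratio of 2.0 / 0.05, a pause index of about 40.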
Fig. 7
Fraction of reads in genomic features. a K562 PRO-seq represents a “good” cumulative fraction of reads in features (cFRiF) and fraction of reads in features (FRiF) plot. b K562 PRO-seq with 90% K562 RNA-seq spike-in represents a “bad” cFRiF/FRiF plot
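The fraction of reads in features can be sketched as a simple overlap count. This is an illustrative toy, not PEPPRO's code: it assumes reads are reduced to single 0-based positions on one chromosome, that features are half-open `(start, end)` intervals, and that each read is assigned to at most one feature type. The function name and input layout are hypothetical.

```python
def fraction_of_reads_in_features(read_positions, features):
    """Fraction of reads falling in each feature type.

    read_positions: list of 0-based coordinates on one chromosome.
    features: dict mapping feature name -> list of half-open (start, end).
    A read is assigned to the first feature type (in dict order) containing it;
    reads outside every feature still count toward the total.
    """
    if not read_positions:
        raise ValueError("no reads supplied")
    counts = {name: 0 for name in features}
    for pos in read_positions:
        for name, intervals in features.items():
            if any(start <= pos < end for start, end in intervals):
                counts[name] += 1
                break
    total = len(read_positions)
    return {name: n / total for name, n in counts.items()}
```

Comparing these fractions between samples is what makes an RNA-seq-contaminated library stand out: its reads shift toward exonic features relative to a clean nascent RNA library.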
Fig. 8
Differential analysis with the PEPPRO counts matrix. a MA plot comparing H9 DMSO-treated and H9 200 nM romidepsin-treated PRO-seq libraries (dots = genes; top 10 most significant genes labeled; n = 3 per treatment). b Count differences for the most significantly differential genes. c Read count normalized signal tracks from the differential analysis (Scale is per 1M)
Fig. 9
Recommendation table. Based on our experience processing both high- and low-quality nascent RNA libraries, these are our recommended values for high-quality PRO-seq libraries
