Cell Syst. 2019 Apr 24;8(4):352-357.e3.
doi: 10.1016/j.cels.2019.03.004. Epub 2019 Apr 4.

exceRpt: A Comprehensive Analytic Platform for Extracellular RNA Profiling


Joel Rozowsky et al. Cell Syst. 2019.

Abstract

Small RNA sequencing has been widely adopted to study the diversity of extracellular RNAs (exRNAs) in biofluids; however, the analysis of exRNA samples can be challenging: they are vulnerable to contamination and artifacts from different isolation techniques, present in lower concentrations than cellular RNA, and occasionally of exogenous origin. To address these challenges, we present exceRpt, the exRNA-processing toolkit of the NIH Extracellular RNA Communication Consortium (ERCC). exceRpt is structured as a cascade of filters and quantifications prioritized based on one's confidence in a given set of annotated RNAs. It generates quality control reports and abundance estimates for RNA biotypes. It is also capable of characterizing mappings to exogenous genomes, which, in turn, can be used to generate phylogenetic trees. exceRpt has been used to uniformly process all ∼3,500 exRNA-seq datasets in the public exRNA Atlas and is available from genboree.org and github.gersteinlab.org/exceRpt.

Keywords: RNA sequencing; RNA-seq; bioinformatics; bioinformatics tool; exRNAs; extracellular RNA; genomics; pipeline; transcriptome.

Figures

Figure 1
(A): exceRpt schema: Samples in FASTA, FASTQ, or SRA file formats are used as inputs to exceRpt. Adapter and random barcode sequences are removed, followed by a read-quality filter, optional spike-in quantification and removal, and UniVec contaminant removal. High-quality filtered reads then enter the endogenous quantification engine, with RNA library prioritization defined by the user. After a second-pass endogenous genome and repetitive elements filter, reads are mapped to the exogenous miRNA, rRNA, and genomic libraries. (B): Leave-one-out analysis: Running the pipeline multiple times, each time with one step removed, shows the effect of that step on subsequent alignments. The sample used for this analysis was SRR822433, a plasma exRNA sample. Low-quality and low-complexity reads, together with reads that align to UniVec or rRNA sequences, account for a sizeable fraction of the total number sequenced. Removing the UniVec alignment step significantly increases the number of reads that map, likely incorrectly, to the exogenous genomes.
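The cascade in (A) and the leave-one-out comparison in (B) can be illustrated with a minimal sketch. This is not exceRpt's implementation: the stage names, the toy sequence sets, and the `run_cascade` helper are all hypothetical, and exact string matching stands in for real alignment. It only shows the idea that each read is claimed by the first stage it matches, so removing an early filter lets its reads fall through to later stages.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# A stage is (name, predicate). All names and data below are illustrative.
Stage = Tuple[str, Callable[[str], bool]]

# Toy reference sets; a real pipeline would align reads against full libraries.
UNIVEC = {"GATCGATC"}
ENDOGENOUS = {"ACGTACGT", "TTGACCAA"}
EXOGENOUS = {"GATCGATC", "CCCCGGGG"}  # shares one sequence with UNIVEC on purpose

STAGES: List[Stage] = [
    ("low_quality", lambda r: "N" in r),
    ("univec", lambda r: r in UNIVEC),
    ("endogenous", lambda r: r in ENDOGENOUS),
    ("exogenous", lambda r: r in EXOGENOUS),
]


def run_cascade(reads: List[str], stages: List[Stage],
                skip: Optional[str] = None) -> Counter:
    """Assign each read to the first matching stage; `skip` omits one stage."""
    counts: Counter = Counter()
    for read in reads:
        for name, matches in stages:
            if name == skip:
                continue  # leave-one-out: pretend this stage does not exist
            if matches(read):
                counts[name] += 1
                break
        else:
            counts["unassigned"] += 1
    return counts


reads = ["ACGTACGT", "GATCGATC", "CCCCGGGG", "ACNNACGT", "TTTTTTTT", "TTGACCAA"]
full = run_cascade(reads, STAGES)
loo = run_cascade(reads, STAGES, skip="univec")
print("full pipeline:  ", dict(full))
print("without univec: ", dict(loo))
```

In this toy data the read `GATCGATC` is counted as a contaminant in the full run but as exogenous when the `univec` stage is skipped, which mirrors the effect described for panel (B).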
Figure 2
(A): Read distributions: exceRpt outputs endogenous alignment quantifications, which can be used to compare RNA biotype distributions in exRNA samples. Here, saliva has a higher proportion of exogenous sequences than other samples, and urine has a higher proportion of tRNA sequences. Quantifications can also be performed for cellular datasets, such as ENCODE samples, where the majority of reads align to long coding and non-coding RNAs in GENCODE. (B+C): Exogenous alignment phylogeny with genome reads: Exogenous sequence quantifications based on exogenous genome reads and rRNA reads can be represented using phylogenetic trees. The tree in (B) was constructed using 1.74M genome reads from a saliva exRNA-seq sample, and the tree in (C) was constructed from 1,127K ribosomal reads in the same saliva sample. Saliva is distinguished from other biofluids by its exposure to a robust and complex bacterial community in the oral cavity (Hasan et al., 2014), which results in a greater contribution of reads of bacterial origin (rather than from the human genome) to the sample. In both phylogenetic trees, whether constructed from bacterial genome-mapped reads (B) or ribosomal-mapped reads (C), we find an abundance of reads assigned to the node corresponding to the genus Streptococcus. (D): Quality control metrics: ERCC QC metrics are based on the number of transcriptome reads and the ratio of RNA-annotated reads to genome reads. The horizontal and vertical lines define QC threshold minima. Most exRNA Atlas samples meet the standards and fall in the upper right quadrant.
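The two-threshold check described for panel (D) can be sketched as follows. The `SampleCounts` fields, the `passes_qc` helper, and the numeric thresholds in the usage lines are assumptions made for illustration; the caption does not state the threshold values, so they are passed in explicitly rather than hard-coded.

```python
from dataclasses import dataclass


@dataclass
class SampleCounts:
    """Per-sample read counts (hypothetical field names)."""
    transcriptome_reads: int
    rna_annotated_reads: int
    genome_reads: int


def passes_qc(sample: SampleCounts,
              min_transcriptome_reads: int,
              min_annotated_ratio: float) -> bool:
    """True if the sample clears both minima, i.e. lies in the 'upper right quadrant'."""
    if sample.genome_reads == 0:
        return False  # ratio undefined; treat as failing
    ratio = sample.rna_annotated_reads / sample.genome_reads
    return (sample.transcriptome_reads >= min_transcriptome_reads
            and ratio >= min_annotated_ratio)


# Illustrative thresholds only; substitute the values used by your QC standard.
MIN_READS = 100_000
MIN_RATIO = 0.5

good = SampleCounts(transcriptome_reads=200_000, rna_annotated_reads=180_000, genome_reads=300_000)
poor = SampleCounts(transcriptome_reads=50_000, rna_annotated_reads=40_000, genome_reads=300_000)
print(passes_qc(good, MIN_READS, MIN_RATIO))
print(passes_qc(poor, MIN_READS, MIN_RATIO))
```

Keeping the thresholds as arguments makes it easy to redraw the quadrant boundaries without touching the check itself.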

References

    1. Akat KM, Moore-McGriff D, Morozov P, Brown M, Gogakos T, Correa Da Rosa J, Mihailovic A, Sauer M, Ji R, Ramarathnam A, et al. (2014). Comparative RNA-sequencing analysis of myocardial and circulating small RNAs in human heart failure and their utility as biomarkers. Proc Natl Acad Sci U S A 111, 11151–11156.
    2. Barturen G, Rueda A, Hamberg M, Alganza A, Lebron R, Kotsyfakis M, et al. (2014). sRNAbench: profiling of small RNAs and its sequence variants in single or multi-species high-throughput experiments. Methods in Next Generation Sequencing.
    3. Byron SA, Van Keuren-Jensen KR, Engelthaler DM, Carpten JD, and Craig DW (2016). Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat Rev Genet 17, 257–271.
    4. Chan PP, and Lowe TM (2009). GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res 37, D93–97.
    5. Cole JR, Wang Q, Fish JA, Chai B, McGarrell DM, Sun Y, Brown CT, Porras-Alfaro A, Kuske CR, and Tiedje JM (2014). Ribosomal Database Project: data and tools for high throughput rRNA analysis. Nucleic Acids Res 42, D633–642.