exceRpt: A Comprehensive Analytic Platform for Extracellular RNA Profiling

Affiliations

¹ Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA.
² Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA.
³ Bioinformatics Research Laboratory, Molecular and Human Genetics Department, Baylor College of Medicine, Houston, TX, USA.
⁴ Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Department of Computer Science, Yale University, New Haven, CT, USA. Electronic address: mark@gersteinlab.org.

PMID: 30956140
PMCID: PMC7079576
DOI: 10.1016/j.cels.2019.03.004


Joel Rozowsky et al. Cell Syst. 2019 Apr 24;8(4):352-357.e3. doi: 10.1016/j.cels.2019.03.004. Epub 2019 Apr 4.


Abstract

Small RNA sequencing has been widely adopted to study the diversity of extracellular RNAs (exRNAs) in biofluids; however, the analysis of exRNA samples can be challenging: they are vulnerable to contamination and artifacts from different isolation techniques, present in lower concentrations than cellular RNA, and occasionally of exogenous origin. To address these challenges, we present exceRpt, the exRNA-processing toolkit of the NIH Extracellular RNA Communication Consortium (ERCC). exceRpt is structured as a cascade of filters and quantifications prioritized based on one's confidence in a given set of annotated RNAs. It generates quality control reports and abundance estimates for RNA biotypes. It is also capable of characterizing mappings to exogenous genomes, which, in turn, can be used to generate phylogenetic trees. exceRpt has been used to uniformly process all ∼3,500 exRNA-seq datasets in the public exRNA Atlas and is available from genboree.org and github.gersteinlab.org/exceRpt.
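The cascade described above can be illustrated with a minimal sketch, assuming each read is assigned to the first library, in priority order, that matches it. The function name `cascade_assign`, the library names, and the toy sequences are hypothetical, and exact set membership stands in for real alignment; this is not exceRpt's implementation.

```python
from collections import Counter


def cascade_assign(reads, libraries):
    """Assign each read to the first library (in priority order) containing it.

    `libraries` is an ordered list of (name, set_of_sequences) pairs.
    Exact set membership is a toy stand-in for real read alignment.
    """
    counts = Counter()
    remaining = list(reads)
    for name, sequences in libraries:
        unmatched = []
        for read in remaining:
            if read in sequences:
                counts[name] += 1
            else:
                unmatched.append(read)
        # Only reads not yet assigned move on to the next, lower-priority stage.
        remaining = unmatched
    counts["unassigned"] = len(remaining)
    return counts


# Hypothetical toy data: names and sequences are illustrative only.
libraries = [
    ("contaminant", {"AAAA"}),
    ("miRNA", {"ACGT", "TTGA"}),
    ("tRNA", {"GGCC", "ACGT"}),
]
reads = ["ACGT", "AAAA", "GGCC", "CCCC", "ACGT"]
counts = cascade_assign(reads, libraries)
```

In this toy run the sequence `ACGT` appears in both the `miRNA` and `tRNA` sets, but both copies are counted under `miRNA` because that library is listed first. This is the sense in which the ordering of the cascade encodes one's confidence in each annotation set.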

Keywords: RNA sequencing; RNA-seq; bioinformatics; bioinformatics tool; exRNAs; extracellular RNA; genomics; pipeline; transcriptome.


Figures

**Figure 1**
**(A)**: exceRpt schema: Samples in FASTA, FASTQ, or SRA file formats are used as inputs to exceRpt. Adapter and random barcode sequences are removed, followed by a read-quality filter, optional spike-in quantification and removal, and UniVec contaminant removal. High-quality filtered reads then enter the endogenous quantification engine, with RNA library prioritization defined by the user. After a second-pass endogenous genome and repetitive elements filter, reads are mapped to the exogenous miRNA, rRNA, and genomic libraries. **(B)**: Leave-one-out analysis: Running the pipeline multiple times with individual steps removed shows the effect of those steps on subsequent alignments. The sample used for this analysis was SRR822433, a plasma exRNA sample. Low-quality and low-complexity reads and reads that align to UniVec or rRNA sequences account for a sizeable fraction of the total number sequenced. Removing the UniVec alignment step significantly increases the number of reads that, likely incorrectly, map to the exogenous genomes.

**Figure 2**
**(A)**: Read distributions: exceRpt outputs endogenous alignment quantifications, which can be used to compare RNA biotype distributions in exRNA samples. Here, saliva has a higher proportion of exogenous sequences than other samples, and urine has a higher proportion of tRNA sequences. Quantifications can also be performed for cellular datasets, such as ENCODE samples, where the majority of reads align to long coding and non-coding RNAs in GENCODE. **(B+C)**: Exogenous alignment phylogeny with genome reads: Exogenous sequence quantifications based on exogenous genome reads and rRNA reads can be represented using phylogenetic trees. The tree in (B) was constructed using 1.74M genome reads from a saliva exRNA-seq sample, and the tree in (C) was constructed from 1.127M ribosomal reads in the same saliva sample. Saliva is distinguished from other biofluids by its exposure to a robust and complex bacterial community in the oral cavity (Hasan et al., 2014), which causes a greater contribution of reads of bacterial origin (rather than from the human genome) to the sample. In both phylogenetic trees, constructed using either bacterial genome-mapped reads (B) or ribosomal-mapped reads (C), we find an abundance of reads assigned to the node corresponding to the genus *Streptococcus*. **(D)**: Quality control metrics: ERCC QC metrics are based on the number of transcriptome reads and the ratio of RNA-annotated reads to genome reads. The horizontal and vertical lines define QC threshold minima. Most exRNA Atlas samples meet the standards and fall in the upper right quadrant.
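The two-threshold check in panel (D) can be sketched as follows. This is an illustration only: the function `passes_qc`, its parameter names, and the threshold values in the example call are assumptions, since the caption does not state the actual ERCC minima.

```python
def passes_qc(transcriptome_reads, rna_annotated_reads, genome_reads,
              min_transcriptome_reads, min_ratio):
    """Return True if a sample meets both QC minima (the upper right quadrant)."""
    if genome_reads <= 0:
        # The ratio is undefined without genome-mapped reads; treat as failing.
        return False
    ratio = rna_annotated_reads / genome_reads
    return transcriptome_reads >= min_transcriptome_reads and ratio >= min_ratio


# Placeholder thresholds and read counts, chosen only for illustration.
MIN_READS = 100_000
MIN_RATIO = 0.5

good_sample = passes_qc(250_000, 250_000, 400_000, MIN_READS, MIN_RATIO)
shallow_sample = passes_qc(20_000, 20_000, 30_000, MIN_READS, MIN_RATIO)
low_ratio_sample = passes_qc(150_000, 150_000, 900_000, MIN_READS, MIN_RATIO)
```

A sample must clear both minima to pass, so a deeply sequenced sample with a low proportion of RNA-annotated reads fails just as a shallow one does.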


Comment in

Mapping Extracellular RNA Sheds Lights on Distinct Carriers.
Lässer C. Cell. 2019 Apr 4;177(2):228-230. doi: 10.1016/j.cell.2019.03.027. PMID: 30951666

References

1. Akat KM, Moore-McGriff D, Morozov P, Brown M, Gogakos T, Correa Da Rosa J, Mihailovic A, Sauer M, Ji R, Ramarathnam A, et al. (2014). Comparative RNA-sequencing analysis of myocardial and circulating small RNAs in human heart failure and their utility as biomarkers. Proc Natl Acad Sci U S A 111, 11151–11156.
2. Barturen G, Rueda A, Hamberg M, Alganza A, Lebron R, Kotsyfakis M, et al. (2014). sRNAbench: profiling of small RNAs and its sequence variants in single or multi-species high-throughput experiments. Methods in Next Generation Sequencing.
3. Byron SA, Van Keuren-Jensen KR, Engelthaler DM, Carpten JD, and Craig DW (2016). Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat Rev Genet 17, 257–271.
4. Chan PP, and Lowe TM (2009). GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res 37, D93–97.
5. Cole JR, Wang Q, Fish JA, Chai B, McGarrell DM, Sun Y, Brown CT, Porras-Alfaro A, Kuske CR, and Tiedje JM (2014). Ribosomal Database Project: data and tools for high throughput rRNA analysis. Nucleic Acids Res 42, D633–642.
