Nat Commun. 2024 Jun 27;15(1):5278.
doi: 10.1038/s41467-024-49523-3.

CapTrap-seq: a platform-agnostic and quantitative approach for high-fidelity full-length RNA sequencing

Sílvia Carbonell-Sala et al. Nat Commun.

Abstract

Long-read RNA sequencing is essential to produce accurate and exhaustive annotation of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we develop CapTrap-seq, a cDNA library preparation method that combines the Cap-trapping strategy with oligo(dT) priming to detect 5' capped, full-length transcripts. In our study, we evaluate the performance of CapTrap-seq alongside other widely used RNA-seq library preparation protocols in human and mouse tissues, employing both ONT and PacBio sequencing technologies. To explore the quantitative capabilities of CapTrap-seq and its accuracy in reconstructing full-length RNA molecules, we implement a capping strategy for synthetic RNA spike-in sequences that mimics natural 5' cap formation. Our benchmarks, incorporating the Long-read RNA-seq Genome Annotation Assessment Project (LRGASP) data, demonstrate that CapTrap-seq is a competitive, platform-agnostic RNA library preparation method for generating full-length transcript sequences.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Full-length transcript annotation using CapTrap-seq and other library preparation methods.
A CapTrap-seq experimental workflow. Gray boxes highlight the four main steps of full-length (FL) cDNA library construction: Anchored dT Poly(A)+, CAP-trapping, CAP and Poly(A) dependent linker ligation, and FL-cDNA library enrichment as described in the text. B Two complex adult human transcriptomic samples, brain and heart, were used to perform the cross-protocol and cross-platform comparisons to assess the quality of CapTrap-seq. The horizontal green line indicates the cross-protocol comparisons, including four different sequencing library preparation methods: CapTrap-seq, directRNA®, TeloPrime®, and SMARTer®. The vertical blue line shows the cross-platform comparison using CapTrap-seq in combination with three long-read sequencing platforms: ONT, PacBio Sequel I, and Sequel II. C Read aggregate deepTools2 profiles along the body of annotated GENCODE genes. The shaded regions indicate the 95% confidence interval. D Length distribution of mapped long-read ONT reads for each protocol. The total number of reads (N), median read length (beige vertical line), and the mapping rate are shown in the top right corner. E Detection of full-length reads among all, spliced, and unspliced reads, with 5′ and 3′ termini inferred from robust FANTOM5 phase 1 and 2 CAGE clusters (n = 201,802) and poly(A) tails. Colors highlight four different categories of long-read (LR) completeness: Gray: unsupported LRs; Sky blue: 3′ supported LRs; Light pink: 5′ supported LRs; Purple: 5′ + 3′ supported LRs. The blue percentage displayed at the top of each bar indicates the ratio of a specific read type (spliced, unspliced) to the total number of reads.
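The four completeness categories in panel E can be illustrated with a minimal sketch. It assumes each read has already been annotated with two booleans: whether its 5′ end falls in a CAGE cluster and whether a poly(A) tail supports its 3′ end. The function names and the input representation are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter


def classify_read(has_cage_5p: bool, has_polya_3p: bool) -> str:
    """Assign one of the four completeness categories from Fig. 1E.

    Both flags are assumed to be precomputed (hypothetical inputs).
    """
    if has_cage_5p and has_polya_3p:
        return "5'+3' supported"
    if has_cage_5p:
        return "5' supported"
    if has_polya_3p:
        return "3' supported"
    return "unsupported"


def completeness_fractions(reads):
    """Return the fraction of reads in each category.

    `reads` is an iterable of (has_cage_5p, has_polya_3p) tuples.
    An empty input yields an empty dict.
    """
    counts = Counter(classify_read(c, p) for c, p in reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}
```

Reads flagged with both kinds of support correspond to the purple, full-length category in the figure.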
Fig. 2. Identification of complete structures of annotated protein-coding and non-protein-coding transcripts in the human genome.
A, B Transcripts were identified for the PRCC protein-coding gene (A) and for the MEG3 lncRNA gene (B) in human brain samples. Colors denote the library preparation method: orange for SMARTer, pink for TeloPrime, and blue for CapTrap. The GENCODE models (v44) are shown in navy and green for protein-coding and lncRNA genes/transcripts, respectively. The bigWig signal tracks derived from the corresponding long-read RNA-seq data are shown below each transcript, displayed in "full" mode in the UCSC Genome Browser.
Fig. 3. Full-length transcript annotation by CapTrap-seq using different long-read sequencing platforms.
A Length distribution for all mapped reads and B the proportion of reads with different types of termini support as described in Fig. 1; C CapTrap-seq transcripts for the HMGCL gene generated using the ONT, PacBio Sequel I (SI), and PacBio Sequel II (SII) platforms. For details see Fig. 2.
Fig. 4. Capping of SIRVs and ERCC controls in the human brain sample using ONT.
A Two-step enzymatic strategy for adding a cap structure at the 5’ ends of uncapped RNA spike-in controls; B Detection rate of SIRVs (yellow-green) and ERCC (navy) synthetic controls; C Correlation between input RNA concentration and raw read counts for ERCC spike-ins in the brain sample. Each point represents a synthetic ERCC control. The green line indicates a linear fit to the corresponding dataset; D Detection of SIRVs as a function of length. Three main detection levels have been distinguished: end-to-end (green), partial (red), and not detected/absent (gray). The black numbers displayed at the top indicate the total number of SIRVs for each detection level.
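The relationship in panel C, a linear fit of read counts against input concentration, can be sketched as follows. This is an illustration only: the log10 transform of both axes and the exclusion of spike-ins with zero counts are assumptions, since the caption does not state how the fit was computed. The function name `loglog_fit` is hypothetical.

```python
import math


def loglog_fit(concentrations, counts):
    """Least-squares line and Pearson r on log10-transformed values.

    Assumption: both axes are log10-scaled, and spike-ins with a zero
    concentration or zero read count are skipped.
    Returns (slope, intercept, pearson_r).
    """
    pairs = [
        (math.log10(c), math.log10(n))
        for c, n in zip(concentrations, counts)
        if c > 0 and n > 0
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two detected spike-ins")
    xs, ys = zip(*pairs)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("values must vary to fit a line")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, pearson_r
```

Under these assumptions, a slope near 1 together with a high correlation would indicate that read counts scale proportionally with input concentration.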
Fig. 5. Benchmark of CapTrap-seq using LRGASP samples.
A Proportion of ERCC spike-ins in human and mouse cell lines detected by the different LRGASP protocols and platforms; B Correlation between input RNA concentration and raw read counts for ERCC spike-ins. See Fig. 4C for details. For clarity, the three replicates for each sample have been combined.
