Nat Commun. 2024 Jun 27;15(1):5278.

doi: 10.1038/s41467-024-49523-3.

CapTrap-seq: a platform-agnostic and quantitative approach for high-fidelity full-length RNA sequencing

Sílvia Carbonell-Sala¹, Tamara Perteghella¹ ², Julien Lagarde¹ ³, Hiromi Nishiyori⁴, Emilio Palumbo¹, Carme Arnan¹, Hazuki Takahashi⁴, Piero Carninci⁴ ⁵, Barbara Uszczynska-Ratajczak⁶ ⁷, Roderic Guigó⁸ ⁹

Affiliations

¹ Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain.
² Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.
³ Flomics Biotech, SL, Carrer de Roc Boronat 31, 08005, Barcelona, Catalonia, Spain.
⁴ Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences (IMS), Yokohama, Kanagawa, Japan.
⁵ Human Technopole, Milan, Italy.
⁶ Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain. barbara.uszczynska@gmail.com.
⁷ Department of Computational Biology of Noncoding RNA, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland. barbara.uszczynska@gmail.com.
⁸ Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain. roderic.guigo@crg.cat.
⁹ Universitat Pompeu Fabra, Barcelona, Catalonia, Spain. roderic.guigo@crg.cat.

PMID: 38937428
PMCID: PMC11211341
DOI: 10.1038/s41467-024-49523-3

Sílvia Carbonell-Sala et al. Nat Commun. 2024.

Abstract

Long-read RNA sequencing is essential to produce accurate and exhaustive annotation of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we develop CapTrap-seq, a cDNA library preparation method that combines the Cap-trapping strategy with oligo(dT) priming to detect 5′ capped, full-length transcripts. We evaluate the performance of CapTrap-seq alongside other widely used RNA-seq library preparation protocols in human and mouse tissues, employing both ONT and PacBio sequencing technologies. To explore the quantitative capabilities of CapTrap-seq and its accuracy in reconstructing full-length RNA molecules, we implement a capping strategy for synthetic RNA spike-in sequences that mimics natural 5′ cap formation. Our benchmarks, incorporating the Long-read RNA-seq Genome Annotation Assessment Project (LRGASP) data, demonstrate that CapTrap-seq is a competitive, platform-agnostic RNA library preparation method for generating full-length transcript sequences.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1. Full-length transcript annotation using CapTrap-seq and other library preparation methods.**
A CapTrap-seq experimental workflow. Gray boxes highlight the four main steps of full-length (FL) cDNA library construction: Anchored dT Poly(A)+, CAP-trapping, CAP and Poly(A) dependent linker ligation, and FL-cDNA library enrichment, as described in the text. B Two complex adult human transcriptomic samples, brain and heart, were used for cross-protocol and cross-platform comparisons to assess the quality of CapTrap-seq. The horizontal green line indicates the cross-protocol comparison of four sequencing library preparation methods: CapTrap-seq, directRNA®, TeloPrime®, and SMARTer®. The vertical blue line indicates the cross-platform comparison using CapTrap-seq in combination with three long-read sequencing platforms: ONT, PacBio Sequel I, and Sequel II. C Aggregate read profiles (deepTools2) along the body of annotated GENCODE genes. The shaded regions indicate the 95% confidence interval. D Length distribution of mapped ONT long reads for each protocol. The total number of reads (N), median read length (beige vertical line), and the mapping rate are shown in the top right corner. E Detection of full-length reads among all, spliced, and unspliced reads, with 5′ and 3′ termini inferred from robust FANTOM5 phase 1 and 2 CAGE clusters (n = 201,802) and poly(A) tails, respectively. Colors highlight four categories of long-read (LR) completeness: gray, unsupported LRs; sky blue, 3′ supported LRs; light pink, 5′ supported LRs; purple, 5′ + 3′ supported LRs. The blue percentage displayed at the top of each bar indicates the ratio of a specific read type (spliced, unspliced) to the total number of reads.
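
The four completeness categories in panel E can be illustrated with a minimal Python sketch. It assumes each read has already been reduced to two booleans (5′ end overlaps a CAGE cluster, 3′ end carries a poly(A) tail); the function names, category labels, and toy reads are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter

def classify_read(has_cage_support: bool, has_polya_tail: bool) -> str:
    """Assign one long read to a completeness category (hypothetical labels)."""
    if has_cage_support and has_polya_tail:
        return "5'+3' supported"
    if has_cage_support:
        return "5' supported"
    if has_polya_tail:
        return "3' supported"
    return "unsupported"

def completeness_fractions(reads):
    """Return the fraction of reads in each category.

    `reads` is an iterable of (has_cage_support, has_polya_tail) pairs.
    """
    counts = Counter(classify_read(cage, polya) for cage, polya in reads)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# Toy input for illustration only; these are not values from the study.
reads = [(True, True), (True, False), (False, True), (False, False), (True, True)]
fractions = completeness_fractions(reads)
```

In this sketch, the fraction of "5'+3' supported" reads corresponds to the purple portion of a bar in panel E.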

**Fig. 2. Identification of complete structures of annotated protein-coding and non-protein-coding transcripts in the human genome.**
A, B Transcripts identified for the *PRCC* protein-coding gene (A) and for the *MEG3* lncRNA gene (B) in human brain samples. Colors denote the library preparation method: orange for SMARTer, pink for TeloPrime, and blue for CapTrap. The GENCODE models (v44) are shown in navy and green for protein-coding and lncRNA genes/transcripts, respectively. The bigwig files derived from the corresponding long-read RNA-seq data are shown below each transcript as signal tracks displayed in the “full” mode in the UCSC genome browser.

**Fig. 3. Full-length transcript annotation by CapTrap-seq using different long-read sequencing platforms.**
A Length distribution for all mapped reads and B the proportion of reads with different types of termini support, as described in Fig. 1; C CapTrap-seq transcripts for the *HMGCL* gene generated using the ONT, PacBio Sequel I (SI), and Sequel II (SII) platforms. For details see Fig. 2.

**Fig. 4. Capping of SIRVs and ERCC controls in the human brain sample using ONT.**
A Two-step enzymatic strategy for adding a cap structure at the 5′ ends of uncapped RNA spike-in controls; B Detection rate of SIRV (yellow-green) and ERCC (navy) synthetic controls; C Correlation between input RNA concentration and raw read counts for ERCC spike-ins in the brain sample. Each point represents a synthetic ERCC control. The green line indicates a linear fit to the corresponding dataset; D Detection of SIRVs as a function of length. Three main detection levels are distinguished: end-to-end (green), partial (red), and not detected/absent (gray). The black numbers displayed at the top indicate the total number of SIRVs for each detection level.
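
The concentration-versus-counts comparison in panel C can be sketched as a Pearson correlation plus a least-squares line. This is a minimal illustration, not the authors' analysis: the log10 transform with a pseudocount of 1, the function name `pearson_and_fit`, and the toy numbers are all assumptions made here.

```python
import math

def pearson_and_fit(xs, ys):
    """Return (Pearson r, slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return r, slope, intercept

# Toy spike-in values for illustration only; not data from the study.
input_concentration = [1, 10, 100, 1000]   # arbitrary units
raw_read_counts = [9, 99, 999, 9999]

# Assumption: compare on a log10 scale, adding a pseudocount of 1 to the counts.
log_conc = [math.log10(c) for c in input_concentration]
log_counts = [math.log10(k + 1) for k in raw_read_counts]

r, slope, intercept = pearson_and_fit(log_conc, log_counts)
```

A slope near 1 on the log scale would indicate that read counts scale proportionally with input concentration, which is what a quantitative protocol aims for.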

**Fig. 5. Benchmark of CapTrap-seq using LRGASP samples.**
A Proportion of ERCC spike-ins in human and mouse cell lines detected by the different LRGASP protocols and platforms; B Correlation between input RNA concentration and raw read counts for ERCC spike-ins. See Fig. 4C for details. For clarity, the three replicates for each sample have been combined.

Update of

CapTrap-Seq: A platform-agnostic and quantitative approach for high-fidelity full-length RNA transcript sequencing.
Carbonell-Sala S, Lagarde J, Nishiyori H, Palumbo E, Arnan C, Takahashi H, Carninci P, Uszczynska-Ratajczak B, Guigó R. bioRxiv [Preprint]. 2023 Jun 18:2023.06.16.543444. doi: 10.1101/2023.06.16.543444. Update in: Nat Commun. 2024 Jun 27;15(1):5278. doi: 10.1038/s41467-024-49523-3. PMID: 37398314.

Grants and funding

U24 HG007234/HG/NHGRI NIH HHS/United States
