Commun Biol. 2023 Sep 23;6(1):974.
doi: 10.1038/s42003-023-05349-1.

capTEs enables locus-specific dissection of transcriptional outputs from reference and nonreference transposable elements

Xuemei Li et al. Commun Biol. 2023.

Abstract

Transposable elements (TEs) serve as both insertional mutagens and regulatory elements in cells, and their aberrant activity is increasingly being revealed to contribute to diseases and cancers. However, measuring the transcriptional consequences of nonreference and young TEs at individual loci remains challenging with current methods, primarily due to technical limitations, including short read lengths generated and insufficient coverage in target regions. Here, we introduce a long-read targeted RNA sequencing method, Cas9-assisted profiling TE expression sequencing (capTEs), for quantitative analysis of transcriptional outputs for individual TEs, including transcribed nonreference insertions, noncanonical transcripts from various transcription patterns and their correlations with expression changes in related genes. This method selectively identified TE-containing transcripts and outputted data with up to 90% TE reads, maintaining a comparable data yield to whole-transcriptome sequencing. We applied capTEs to human cancer cells and found that internal and inserted Alu elements may employ distinct regulatory mechanisms to upregulate gene expression. We expect that capTEs will be a critical tool for advancing our understanding of the biological functions of individual TEs at the locus level, revealing their roles as both mutagens and regulators in biological and pathogenic processes.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overview of capTEs.
a Schematic of the experimental workflow. The full-length ds-cDNA library was constructed from total RNA using SMART technology, and cDNA ends were inactivated by ddGMP incorporation to block 3’ hydroxyl residues and dephosphorylation to remove 5’ phosphate residues. Then, new DNA ends were created by Cas9-gRNAs targeting specific sequences. After the release of Cas9 from the cleavage sites by the thermolabile protease, sequencing adapters were ligated to the cleavage sites for subsequent sequencing. b Histogram displaying the strand distribution of capTEs data in the targeted region of Alu gRNA. The x-axis shows the position where the read starts or ends. The underlined uppercase letters represent the PAM sequence, and the dashed line marks the Cas9 cut site. c Boxplot showing the strand ratio of capTEs data (n = 7). The center line, box edges and whiskers indicate the median, upper and lower quartiles (the 25th and 75th percentiles) and 1.5 × interquartile range, respectively. d Bar plots showing side reaction rates of control (orange, n = 6), total RNA-seq (blue, n = 4) and capTEs (purple, n = 5). The side reaction rate is determined by the fraction of hybrid reads among all reads containing spike-in sequences. Error bars represent standard deviation. e Bar plot showing the data outputs of capTEs (orange, n = 3) and total RNA-seq (green, n = 3) relative to the control (gray–purple, n = 3), where the control is normalized to 1. Error bars represent standard deviation. f Stacked bar plots showing the state composition of available pores in control, capTEs and total RNA-seq: unoccupied pores (blue), adapter-occupied pores (orange) and DNA strand-occupied pores (green). The proportion (y-axis) is determined by the occupied time. d–f In the control, nCATS is directly applied to capture TE transcripts.
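The side reaction rate in panel d is described as the fraction of hybrid reads among all reads containing spike-in sequences. A minimal sketch of that calculation follows; the function names and read counts are hypothetical illustrations, not the authors' pipeline, and the use of the sample standard deviation for the error bars is an assumption.

```python
from statistics import mean, stdev


def side_reaction_rate(hybrid_reads: int, spike_in_reads: int) -> float:
    """Return hybrid reads as a fraction of all spike-in-containing reads."""
    if spike_in_reads <= 0:
        raise ValueError("spike_in_reads must be positive")
    if not 0 <= hybrid_reads <= spike_in_reads:
        raise ValueError("hybrid_reads must lie between 0 and spike_in_reads")
    return hybrid_reads / spike_in_reads


def summarize(rates: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation (n - 1 divisor, an assumption)."""
    return mean(rates), stdev(rates)


# Hypothetical (hybrid, spike-in) read counts for three replicates;
# these are illustrative numbers, not data from the paper.
replicates = [(12, 1000), (8, 800), (15, 1200)]
rates = [side_reaction_rate(h, s) for h, s in replicates]
avg, sd = summarize(rates)
print(f"side reaction rate: {avg:.4f} +/- {sd:.4f}")
```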
Fig. 2. Enrichment efficiency of capTEs.
a Box plot showing the percentage of on-target reads (green, reads containing TEs of interest within the first 50 nt) and reads containing TEs of interest (blue) among all reads generated by capTEs (n = 7) and total RNA-seq (n = 3). The center line, box edges and whiskers indicate the median, upper and lower quartiles (the 25th and 75th percentiles) and 1.5 × interquartile range, respectively. b Venn diagram showing the overlap between the target TE and young TE loci detected by capTEs and total RNA-seq methods. c Dot plot showing the coverage of target TEs in capTEs (x-axis) and total RNA-seq (y-axis) data. TEs detected by both capTEs and total RNA-seq are included in the analysis. RPT represents reads per TE. d Bar plots showing the subfamily proportions of target TEs detected by capTEs (orange) and total RNA-seq (green). e Bar plot showing the number of TE-containing transcripts identified with 6 Gb of capTEs (orange) and total RNA-seq (blue) data. f Saturation curves of TE transcript identifications using capTEs (orange) and total RNA-seq (blue). Black dashed lines indicate data requirements for 20,000 identifications.
Fig. 3. Noncanonical transcripts and transcribed TE insertions identified with capTEs.
a Genome browser view showing two examples of noncanonical transcript assembly for capTEs and nCATS data. The blue rectangle marks the incomplete assembly of nCATS data. b Bar plots showing the number of noncanonical TE-containing transcripts detected by capTEs and total RNA-seq in various genomic regions. c Genome browser view and Sanger sequencing validation of each transcription pattern for TEs to generate noncanonical transcripts. The nucleotide signal displays the splicing junction. d Stacked bar plots showing the percentages of noncanonical transcripts with inner TEs (yellow), with terminal TEs (blue) and without TEs (gray) out of the various types of noncanonical transcripts identified by capTEs. Alu and L1 represent TEs targeted by designed gRNAs. e Genome browser view of TE insertions identified with capTEs and total RNA-seq. f Stacked bar plot showing the number of TE insertions identified by capTEs and total RNA-seq in the genic (purple) and intergenic (orange) regions. g Enrichment fold values of TE insertions in oncogenes. An asterisk (*) marks a BH-adjusted P value < 0.05 from the Chi-squared test between insertions and all expressed TEs.
Fig. 4. Assessing the ability of capTEs to the locus-specific measurement of TE expression.
a Boxplot showing the unique mapping rate of capTEs (n = 13) and long-read total RNA-seq (n = 3) data. b Scatter plot showing the correlation between the number of incorporated TE cDNA molecules (x-axis) and the number of reads detected with capTEs (y-axis). c Bar plot showing the number of TEs quantified with capTEs (blue), Telescope (orange) and TEtranscripts (gray). d Bar plots showing the subfamily proportions of target TEs detected by capTEs (blue), Telescope (orange) and TEtranscripts (gray). e Bar plot showing the number of young TEs quantified with capTEs (blue), Telescope (orange) and TEtranscripts (gray). The percentages within the parentheses represent the proportion of young TEs among all the target TEs quantified by the respective method. f Line chart showing the number of differential target TEs at various evolutionary ages identified with capTEs and Telescope. The red dashed line represents the cutoff for young TEs, which is set at 2 million years (Myr). g Scatter plots showing the proportion of each TE subfamily among overexpressed (x-axis) and all detected target TEs (y-axis). The P value was determined by Fisher’s exact test between overexpressed target TEs and all expressed target TEs, and a significant change (solid red circle) was defined as BH-adjusted P < 0.05. h Genome browser view showing an example of measuring autonomous transcription levels of TEs at specific loci. The boxplot displays the expression levels of assembled transcripts and the colors indicate the transcripts where the analyzed TE was in autonomous (red), autonomous (orange) and passive (blue) transcription modes. The autonomous transcription level of this TE locus is represented by the total expression levels of the two transcripts labeled as “autonomous”. The expression levels are indicated by normalized read counts. i Histogram showing the number of TEs at various degrees of the independent promotion of transcription in breast cancer cells, ranging from 0 (passive) to 1 (fully autonomous). Passively transcribed TEs are not counted. j Scatter plots showing the proportion of each TE subfamily in fully autonomously transcribed (x-axis) and all detected target TEs (y-axis). The P value was determined by Chi-squared test between fully autonomously transcribed target TEs and all expressed target TEs, and a significant change (solid red circle) was defined as BH-adjusted P < 0.05. c–f NGS data were analyzed using Telescope and TEtranscripts. a, h In boxplots, the center line, box edges and whiskers indicate the median, upper and lower quartiles (the 25th and 75th percentiles) and 1.5 × interquartile range, respectively.
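Panels h and i describe an autonomous transcription level (the summed expression of transcripts labeled "autonomous") and a degree of independent promotion ranging from 0 (passive) to 1 (fully autonomous). The caption does not give the formula for that degree, so the sketch below assumes it is the autonomous share of total expression at a locus. The function name, the mode labels and the counts are hypothetical.

```python
def autonomy_degree(transcripts: list[tuple[str, float]]) -> float:
    """Share of a locus's expression that comes from autonomous transcripts.

    `transcripts` holds (mode, normalized_read_count) pairs, where mode is
    "autonomous" or "passive". Defining the degree as autonomous / total is
    an assumption; the caption only states the 0-to-1 range.
    """
    total = sum(count for _, count in transcripts)
    if total <= 0:
        raise ValueError("locus has no expressed transcripts")
    autonomous = sum(count for mode, count in transcripts if mode == "autonomous")
    return autonomous / total


# Hypothetical locus: two autonomous transcripts and one passive transcript.
locus = [("autonomous", 30.0), ("autonomous", 10.0), ("passive", 60.0)]
print(autonomy_degree(locus))
```

Under this assumed definition, a locus with only passive transcripts scores 0 and a locus with only autonomous transcripts scores 1, matching the endpoints named in panel i.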
Fig. 5. Transcriptional changes related to target TEs in cancer cells.
a Heatmap depicting the correlation among expression changes of target TEs, expression changes of TE-hosting genes and relative number of noncanonical transcripts in MDA-MB-231 cells compared to MCF 10A cells. rs represents Spearman’s correlation coefficient. b Violin plot showing contributions of noncanonical transcripts (noncanonical) and canonical transcripts (reference) to the changes in the expression of TE-hosting genes in MDA-MB-231 cells compared to MCF 10A cells. The contribution is the ratio of expression changes of the noncanonical or reference transcript to that of all transcripts in each locus. In the boxplot, the center line, box edges and whiskers indicate the median, upper and lower quartiles (the 25th and 75th percentiles) and 1.5 × interquartile range, respectively. c Bar plot showing qPCR measurements of expression changes in FOXRED2 reference transcripts (gray, n = 6), noncanonical transcripts (red, n = 6) and all transcripts (blue, n = 6) in MDA-MB-231 cells compared to MCF 10A cells. Error bars represent standard deviation. d Heatmaps depicting the expression levels of differential TEs in MDA-MB-231 and HCT 116 cells compared to their matched normal cells (n = 3), and density lines showing the frequency of TE-derived transcripts at each TE locus. e Circular heatmap showing the contribution of each overexpressed TE locus to the expression levels of its host gene in MDA-MB-231 cells, split into three categories: autonomous transcription, intron retention and other passive transcription. Each ring represents one type of contribution. The enlarged portion (right) shows autonomously transcribed TEs that overlap with genes involved in cancer pathways. f Genome browser view showing examples of noncanonical TE-containing transcripts identified in MDA-MB-231 cells within cancer pathway genes, MAP2K2 and HIF1A. g Heatmap showing the relative frequency of TE insertions in cancer cells compared to matched normal cells (left) and transcriptional changes in TE-inserted genes (right). The color bar displays insertions in oncogenes (green). rs (Spearman’s correlation coefficient) between the relative frequency of TE insertions and changes in gene expression is 0.69 for MDA-MB-231 cells and 0.62 for HCT 116 cells. h Distribution of expressed inserted Alu (red) and expressed reference Alu (blue) in gene bodies. i Stacked bar plots showing the proportion of genomic features (3’UTR, 5’UTR, intron, CDS and intergenic regions) in expressed inserted Alu and expressed reference Alu, Alu Y, Alu S and Alu J.
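Panel b defines the contribution as the ratio of the expression change of the noncanonical or reference transcript to that of all transcripts at a locus. The sketch below illustrates that ratio under two assumptions not stated in the caption: the expression change is the difference (cancer minus normal), and "all transcripts" is the sum of the two classes. Names and values are hypothetical.

```python
def locus_contributions(
    noncanonical: tuple[float, float],
    reference: tuple[float, float],
) -> dict[str, float]:
    """Split a locus's expression change between transcript classes.

    Each argument is a (normal, cancer) expression pair. The change is taken
    as cancer - normal, and the locus-wide change as the sum of both classes;
    both choices are assumptions made for illustration.
    """
    delta_nc = noncanonical[1] - noncanonical[0]
    delta_ref = reference[1] - reference[0]
    delta_all = delta_nc + delta_ref
    if delta_all == 0:
        raise ValueError("no net expression change at this locus")
    return {
        "noncanonical": delta_nc / delta_all,
        "reference": delta_ref / delta_all,
    }


# Hypothetical locus: noncanonical transcripts rise from 5 to 35,
# reference transcripts rise from 50 to 60.
print(locus_contributions((5.0, 35.0), (50.0, 60.0)))
```

With these definitions the two contributions at a locus sum to 1, so a value near 1 for the noncanonical class means the host gene's expression change is driven mostly by noncanonical transcripts.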
