Genome Biol Evol. 2021 Jan 7;13(1):evaa233.
doi: 10.1093/gbe/evaa233.

Multiple Alu Exonization in 3'UTR of a Primate-Specific Isoform of CYP20A1 Creates a Potential miRNA Sponge


Aniket Bhattacharya et al. Genome Biol Evol.

Abstract

Alu repeats contribute to phylogenetic novelties in conserved regulatory networks in primates. Our study highlights how exonized Alus could nucleate large-scale mRNA-miRNA interactions. Using a functional genomics approach, we characterize a transcript isoform of an orphan gene, CYP20A1 (CYP20A1_Alu-LT), that has exonization of 23 Alus in its 3'UTR. CYP20A1_Alu-LT, confirmed by 3'RACE, is an outlier in length (9 kb 3'UTR) and widely expressed. Using publicly available data sets, we demonstrate its expression in higher primates and its presence in single-nucleus RNA-seq of 15,928 human cortical neurons. miRanda predicts ∼4,700 miRNA recognition elements (MREs) for ∼1,000 miRNAs, primarily originating within these 3'UTR-Alus. CYP20A1_Alu-LT could be a potential multi-miRNA sponge as it harbors ≥10 MREs for 140 miRNAs and has cytosolic localization. We further tested whether expression of CYP20A1_Alu-LT correlates with mRNAs harboring similar MRE targets. RNA-seq with conjoint miRNA-seq analysis was done in primary human neurons, where we observed CYP20A1_Alu-LT to be downregulated during heat shock response and upregulated in HIV1-Tat treatment. In total, 380 genes were positively correlated with its expression (significantly downregulated in heat shock and upregulated in Tat) and they harbored MREs for nine expressed miRNAs which were also enriched in CYP20A1_Alu-LT. MREs were significantly enriched in these 380 genes compared with random sets of differentially expressed genes (P = 8.134e-12). Gene ontology suggested involvement of these genes in neuronal development and hemostasis pathways, thus proposing a novel component of Alu-miRNA-mediated transcriptional modulation that could govern specific physiological outcomes in higher primates.

Keywords: 3 prime UnTranslated Region (3′UTR) extension; Alu-miRNA; Cytochrome P450 20A1 (CYP20A1); miRNA recognition elements (MREs); multi-miRNA sponge; neurocoagulopathy.


Figures

Fig. 1
CYP20A1 contains a unique 3′UTR with Alu-driven divergence. (a) UCSC tracks representing the four transcript isoforms of CYP20A1 with varying 3′UTR length. Only isoform 1 (NM_177538) contains the full-length 8,932 bp 3′UTR (CYP20A1_Alu-LT). The RepeatMasker track shows this 3′UTR harbors 23 Alu repeats from different subfamilies. (b) Genome-wide analysis of length distribution of 3′UTR reveals CYP20A1_Alu-LT to be an outlier. Mean and median 3′UTR lengths were 1,553 and 1,007 bp, respectively. (c) Cladogram of CYP20A1 protein sequence divergence among different classes of vertebrates. At the protein level, this gene seems to have diverged minimally. Values within the parentheses represent branch length (unit: substitutions per site). (d) DNA level conservation analysis of 5′UTR and 3′UTR among 20 mammals reveals that 5′UTR is well conserved among all primate lineages, suggesting that divergence is unique to 3′UTR. The RepeatMasker track shows the position of Alu elements in the UTR region (also see supplementary fig. S3, Supplementary Material online).
Fig. 2
CYP20A1_Alu-LT is expressed and may be a long noncoding RNA. (a) A schematic representation of the primers designed on the CYP20A1_Alu-LT to encompass 5′UTR and full-length 3′UTR. To check for full-length expression of transcript, cDNA from multiple cell lines of different tissue origin was used for amplification. Lanes a–g are from HeLa-S3, A549, HeLa, HEK293, MCF-7, SK-N-SH, and gDNA (positive control), respectively. Representative gel images of this isoform expression, via amplification from the start of the 5′UTR and the end of the 3′UTR, are shown by primer pairs 1 and 10, respectively. Amplicons (1, 5, and 10) were also confirmed by Sanger sequencing. (b) RT-qPCR for CYP20A1_Alu-LT expression in cancerous and noncancerous cell types of neuronal origin. Fold change was calculated with respect to SK-N-SH, after normalization with the geometric mean of expression values from β-actin, GAPDH, and 18S rRNA. The error bars represent the SD of three biological replicates and the average of three technical replicates was taken for each biological replicate (**P < 0.01; Student’s t-test). (c) 3′RACE confirms the expression of the full-length transcript. The schematic depicts the oligo(dT) (attached to a tag sequence) primed reverse transcription, followed by nested PCR. The amplification products corresponding to the bands below 900 bp and above 700 bp mapped to CYP20A1_Alu-LT 3′UTR, suggesting that the full-length transcript is expressed in untreated MCF-7 cells (n = 3). (d) Differentiating the CYP20A1_Alu-LT transcript from other isoforms. The schematic in figure 1a highlights the skipped exon 6 and the position of flanking primers on shared exons in green color. The presence of at least two different types of transcripts was confirmed. A 277-bp amplicon corresponds to isoform(s) that contain exon 6 but have shorter 3′UTRs (isoforms 2 and 3 in e) and a 196-bp amplicon corresponds to the long-3′UTR isoform (isoform 1). None of the six translation frames of the long 3′UTR isoform matches with the annotated protein. The amino acids marked in red are common to both isoforms 2 and 3, blue exclusive to isoform 3 and green represents the sequence from isoform 1. (e) Schematic representation summarizing the differences between CYP20A1 transcript isoform 1 (CYP20A1_Alu-LT) and isoforms 2 and 3.
Fig. 3
CYP20A1_Alu-LT has the potential to act as a miRNA sponge. (a) Circos plot representing the MREs for the 994 miRNAs on CYP20A1_Alu-LT 3′UTR. miRNAs are grouped on the basis of the number of MREs. Twenty-three Alus in this 3′UTR contribute to 65% of its length and are distributed throughout the UTR. Only 11% of miRNAs have MREs > 10 (92 and 22 in G3 and G4, respectively). (b) Distribution of MREs for these 994 miRNAs on 1,000 random sets of 23 length and subfamily-matched Alu repeats. Only six sets contain MREs in the range of 4,701–4,800, suggesting this is a nonrandom phenomenon and MREs are created post-Alu exaptations. Highlighted in green are sets with more than 4,500 MREs. (c) Proposed model to demonstrate the effect of potential sponge activity of CYP20A1_Alu-LT. In the condition where it is highly expressed, it will recruit multiple miRISC complexes, which could relieve the repression of cognate targets leading to their translation, whereas in case of its reduced expression, those miRISC complexes remain free to load on the cognate targets and effect translational repression or promote mRNA degradation. CYP20A1_Alu-LT has the potential to sponge multiple miRNAs at the same time, thereby regulating a large repertoire of transcripts.
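The grouping of miRNAs by MRE count in panel (a) can be illustrated with a minimal sketch. It assumes the predicted sites are available as (miRNA, position) pairs; that input shape, the function names, and the miRNA labels in the usage note are hypothetical and are not taken from the paper or from miRanda's output format. The sketch also shows why the threshold wording matters: a cutoff of "at least 10" and a cutoff of "more than 10" select different sets.

```python
from collections import Counter

def mre_counts_per_mirna(hits):
    """Count predicted MREs per miRNA from (mirna, position) pairs (hypothetical input shape)."""
    return Counter(mirna for mirna, _position in hits)

def mirnas_with_at_least(counts, threshold):
    """Return the set of miRNAs whose MRE count is >= threshold."""
    return {mirna for mirna, n in counts.items() if n >= threshold}

def mirnas_with_more_than(counts, threshold):
    """Return the set of miRNAs whose MRE count is strictly > threshold."""
    return {mirna for mirna, n in counts.items() if n > threshold}
```

For example, with toy hits giving "miR-a" 10 sites, "miR-b" 11 sites, and "miR-c" 3 sites, `mirnas_with_at_least(counts, 10)` returns both "miR-a" and "miR-b", whereas `mirnas_with_more_than(counts, 10)` returns only "miR-b".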
Fig. 4
Features of CYP20A1_Alu-LT for being a potential sponge RNA. (a) Cytosolic localization of CYP20A1_Alu-LT confirmed by RT-qPCR. Fold change was calculated with respect to total RNA, after internal normalization using the primers against spiked-in control. The error bars represent the SD of four independent experiments and the average of two technical replicates was used for each experiment. Quality controls for assessing the purity of cytosolic (GAPDH) and nuclear (MALAT1) fractions are also shown. The RT-qPCR data were analyzed in accordance with the MIQE guidelines (Bustin et al. 2009) (supplementary information S3, Supplementary Material online). (b) Late apoptotic cells in primary neurons and NPCs in response to HIV1-Tat treatment were scored by the number of TUNEL positive nuclei. Tat is neurotoxic and kills ∼50% more neurons compared with the vehicle control (VC, i.e., saline), whereas the difference is not statistically significant for NPCs (P values 0.04 and 0.21 for primary neurons and NPCs, respectively, for Student’s t-test assuming equal variance). The data represent the mean and SD of three independent experiments and >1,000 nuclei were scored per condition for each experiment. (c, d) Expression of CYP20A1_Alu-LT in response to HIV1-Tat (c) and heat shock (d) treatment was assessed by RT-qPCR using both 5′ and 3′UTR primers. The 3′UTR was found to be upregulated following 6 h recovery after Tat treatment in neurons (P value = 0.035; *P value < 0.05, Student’s t-test), but not in NPCs (P value = 0.348) (c). It was also strongly downregulated in neurons (P value = 0.031) immediately after heat shock (HS + 1 h recovery). This difference was not significant during recovery (P value = 0.310; HS + 3 h recovery) (d). In both these cases, the 5′UTR primer exhibits the same trend as the 3′UTR but does not meet the statistical significance cutoff of P < 0.05. Fold change was calculated with respect to saline (vehicle) treatment, after internal normalization with the geometric mean of GAPDH, ACTB, and 18S rRNA in (c), and with respect to control (no heat shock treatment) cells, after internal normalization with the geometric mean of GAPDH and ACTB in (d). The error bars represent the SD of three independent experiments and the average of 2–3 technical replicates was taken for each experiment.
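The fold-change calculation described in this caption can be sketched as follows. The caption states only that expression was normalized to the geometric mean of the reference genes and expressed relative to a control condition; the sketch assumes, without confirmation from the source, the common 2^(-ΔΔCt) approach with a perfect amplification efficiency of 2. Under that assumption, taking the geometric mean of reference-gene expression values is equivalent to taking the arithmetic mean of their Ct values. Function names and the Ct values in the usage note are hypothetical.

```python
from statistics import mean

def delta_ct(target_ct, reference_cts):
    """Ct of the target minus the mean Ct of the reference genes.

    Averaging Ct values corresponds to a geometric mean on the expression
    scale, assuming an amplification efficiency of 2 (an assumption here).
    """
    return target_ct - mean(reference_cts)

def fold_change(target_ct_test, ref_cts_test, target_ct_ctrl, ref_cts_ctrl):
    """Relative expression of test versus control by 2^(-ddCt)."""
    ddct = delta_ct(target_ct_test, ref_cts_test) - delta_ct(target_ct_ctrl, ref_cts_ctrl)
    return 2.0 ** (-ddct)
```

With illustrative values, a target Ct of 24.0 in the treated sample and 25.0 in the control, with reference Cts of 18.0 and 20.0 in both, gives `fold_change(24.0, [18.0, 20.0], 25.0, [18.0, 20.0]) == 2.0`, that is, a twofold upregulation.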
Fig. 5
Fold change (log2FC values) of 380 genes. (a) The panel shows log2FC of a set of 380 genes upregulated in response to Tat treatment (red) and downregulated during heat shock recovery (green) in primary neurons, resonating with the trend exhibited by CYP20A1_Alu-LT. All the transcripts contain one or more MREs for the nine miRNAs that can be potentially titrated by sponge activity of CYP20A1_Alu-LT in neurons. These represent potential cognate targets whose expression can be regulated by CYP20A1_Alu-LT perturbation. Genes are plotted in the same order as in supplementary table S6, Supplementary Material online. (b) Enrichment of MRE sites in the 380 gene set compared with 1 million random sets of equal number of genes (Monte-Carlo simulations, P = 9.99999e-07). (c) The distribution of nine prioritized MREs on the CYP20A1 3′UTR, their overlap with Alu elements, and the MRE-dense regions are shown. The orientation and subfamily of the 23 Alus present in this 3'UTR are also represented. (d) The heat map represents the top five biological processes targeted by each miRNA from pathway enrichment of 380 genes. The 0–5 scale is arbitrary, with 5 denoting the most targeted process.
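The Monte-Carlo comparison in panel (b) can be sketched as below. The caption does not state the test statistic or how the P value was computed, so the sketch assumes the statistic is the total MRE count of a gene set and that the empirical P value uses the add-one correction (r + 1) / (n + 1), where r is the number of random sets at least as extreme as the observed set. With n = 1,000,000 and r = 0 this formula gives 1/1,000,001 ≈ 9.99999e-07, which matches the reported value, but that reading is an inference and not something the source confirms. The function names and the `mre_counts` mapping are hypothetical.

```python
import random

def empirical_p_value(observed, null_values):
    """One-sided empirical P with the add-one correction: (r + 1) / (n + 1)."""
    r = sum(1 for value in null_values if value >= observed)
    return (r + 1) / (len(null_values) + 1)

def monte_carlo_enrichment(mre_counts, gene_set, n_draws, seed=0):
    """Compare total MREs in gene_set with random gene sets of equal size.

    mre_counts maps gene name -> MRE count over the background genes
    (a hypothetical input; the background used in the paper may differ).
    """
    rng = random.Random(seed)
    background = sorted(mre_counts)
    size = len(gene_set)
    observed = sum(mre_counts[g] for g in gene_set)
    null = [
        sum(mre_counts[g] for g in rng.sample(background, size))
        for _ in range(n_draws)
    ]
    return observed, empirical_p_value(observed, null)
```

The add-one correction keeps the estimate away from zero, so the smallest P value obtainable from n random draws is 1 / (n + 1); reporting more significance than that would require more draws.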


