Front Genet. 2020 May 15;11:488.
doi: 10.3389/fgene.2020.00488. eCollection 2020.

GC-AG Introns Features in Long Non-coding and Protein-Coding Genes Suggest Their Role in Gene Expression Regulation


Monah Abou Alezz et al. Front Genet.

Abstract

Long non-coding RNAs (lncRNAs) are recognized as an important class of regulatory molecules involved in a variety of biological functions. However, the mechanisms regulating long non-coding gene expression are still poorly understood. Characterizing the genomic features of lncRNAs is crucial to gaining insight into their function. In this study, we exploited recent GENCODE annotations to characterize the genomic and splicing features of long non-coding genes in comparison with protein-coding genes (PCGs), in both human and mouse. Our analysis highlighted differences in gene architecture between the two classes of genes. Significant differences in splice site usage were observed between long non-coding genes and PCGs. While non-canonical GC-AG splice junctions represent about 0.8% of all splice sites in PCGs, we identified a significant enrichment of GC-AG splice sites in long non-coding genes, both in human (3.0%) and mouse (1.9%). In addition, we found a positional bias, with GC-AG splice sites enriched in the first intron in both classes of genes. Moreover, GC-AG introns were significantly shorter and had weaker donor and acceptor sites than GT-AG introns. Genes containing at least one GC-AG intron were conserved in many species and more prone to alternative splicing, and a functional analysis pointed toward their enrichment in specific biological processes such as DNA repair. Our study shows for the first time that GC-AG introns are mainly associated with lncRNAs and are preferentially located in the first intron. Additionally, we uncovered their regulatory potential, indicating the existence of a new mechanism of non-coding and protein-coding gene expression regulation.

Keywords: GC-AG introns; alternative splicing; first intron; long non-coding RNAs; splice junctions.
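The abstract's headline numbers are percentages of introns grouped by their terminal dinucleotides (GT-AG, GC-AG), overall and by intron position (first versus inner). A minimal Python sketch of that bookkeeping follows. It assumes intron sequences have already been extracted on the transcribed strand (for example from a GENCODE GTF plus a genome FASTA, a step not shown). The helper names (`junction_type`, `junction_frequencies`, `frequency_by_position`) and the toy sequences are hypothetical and do not reproduce the authors' pipeline or data.

```python
from collections import Counter


def junction_type(intron_seq: str) -> str:
    """Label an intron by its terminal dinucleotides, e.g. 'GT-AG' or 'GC-AG'.

    Assumes `intron_seq` is the intron on the transcribed strand, running from
    the first intronic base (donor side) to the last (acceptor side).
    """
    s = intron_seq.upper()
    if len(s) < 4:
        raise ValueError("intron too short to read donor and acceptor dinucleotides")
    return f"{s[:2]}-{s[-2:]}"


def junction_frequencies(introns):
    """Percentage of introns falling into each junction class."""
    counts = Counter(junction_type(seq) for seq in introns)
    total = sum(counts.values())
    return {jt: 100.0 * n / total for jt, n in counts.items()}


def frequency_by_position(transcripts, wanted="GC-AG"):
    """Percentage of `wanted` junctions among first introns vs inner introns.

    `transcripts` is a list of per-transcript intron lists in 5'->3' order;
    every intron after the first is counted as "inner".
    """
    first = [introns[0] for introns in transcripts if introns]
    inner = [seq for introns in transcripts for seq in introns[1:]]

    def pct(seqs):
        if not seqs:
            return 0.0
        return 100.0 * sum(junction_type(s) == wanted for s in seqs) / len(seqs)

    return {"first": pct(first), "inner": pct(inner)}


if __name__ == "__main__":
    # Hypothetical toy introns (not real GENCODE sequences).
    toy = [
        ["GCAAGTTTTCAG", "GTAAGTCCCTAG"],
        ["GTGAGTAAACAG", "GTAGGTTTTCAG", "GCAGGTCCTTAG"],
    ]
    print(junction_frequencies([s for t in toy for s in t]))  # {'GC-AG': 40.0, 'GT-AG': 60.0}
    print(frequency_by_position(toy))
```

Comparing the "first" and "inner" percentages between gene classes is the kind of contrast behind a positional-bias statement; a real analysis would also need a statistical test, which this sketch leaves out.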


Figures

FIGURE 1
Gene structure features of long non-coding and protein-coding genes (PCGs) in human and mouse. Boxplots showing gene length (A,B), exon length (C,D), and intron length (E,F) in human and mouse, respectively. Data are presented as log10 of length in base pairs (bp). Exons were classified as first, inner, or last, and introns as first or inner. ***p < 0.001.
FIGURE 2
Splice junction strengths of first introns. Schematic representation of the average 5′ and 3′ splice site (ss) strength scores of long non-coding genes and PCGs in human and mouse. The strengths of 5′ and 3′ ss were calculated as weight matrix scores for GC-AG and GT-AG first introns. ***p < 0.001.
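Figure 2 reports splice site strength as weight matrix scores. The caption does not give the matrix, sequence window, or background model, so what follows is only an illustrative sketch of generic position weight matrix (PWM) scoring: a log2-odds matrix built from aligned sites with a pseudocount and an assumed uniform 0.25 background, where a site's score is the sum of its per-position weights. The 9-nt window, the training sites, and the function names are hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds position weight matrix from equal-length, aligned sites.

    Assumes a uniform background base frequency and adds a pseudocount so
    unseen bases get a finite (negative) weight instead of -infinity.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all training sites must have the same length")
    pwm = []
    for i in range(length):
        column = [s[i].upper() for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def score_site(pwm, site):
    """Sum of per-position log-odds; higher means closer to the training consensus.

    Bases other than A/C/G/T (e.g. N) raise KeyError.
    """
    if len(site) != len(pwm):
        raise ValueError("site length must match the matrix length")
    return sum(col[b] for col, b in zip(pwm, site.upper()))


if __name__ == "__main__":
    # Hypothetical 9-nt donor windows (3 exonic + 6 intronic bases).
    training = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGT"]
    pwm = build_pwm(training)
    print(score_site(pwm, "CAGGTAAGT"), score_site(pwm, "CAGGCAAGT"))
```

In this toy run, the single T→C change at the second intronic position lowers the score by 2·log2(3), because that matrix column was built only from T-containing sites.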
FIGURE 4
Conservation of GC-AG introns across multiple species. Multiple sequence alignment of GC-AG splice sites in the first intron of ABI3BP gene and the intron 6 of NDUFAF6 gene across the 11 species indicated.
FIGURE 3
Expression of GC-AG- and GT-AG-containing transcripts. Bar graph representing the expression of lncRNA and PCG transcripts containing GC-AG or GT-AG introns, and of transcripts containing a GC-AG intron in the first versus an inner position. Transcript expression was calculated as the mean TPM, combining expression data from 10 different tissues. ***p < 0.001.
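TPM (transcripts per million) normalizes read counts by transcript length and then rescales each sample so that its values sum to one million. The caption's "mean TPM combining expression data from 10 different tissues" admits more than one reading; the sketch below assumes a per-transcript mean across tissues, followed by a mean within each intron group. The function names, tissue labels, and counts are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for one sample.

    TPM_i = (c_i / l_i) / sum_j (c_j / l_j) * 1e6, with l in kilobases.
    """
    rates = [c / (l / 1000.0) for c, l in zip(counts, lengths_bp)]  # reads per kilobase
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def mean_across_tissues(tpm_by_tissue):
    """Per-transcript mean TPM; each dict value lists TPMs in the same transcript order."""
    columns = list(tpm_by_tissue.values())
    return [sum(vals) / len(columns) for vals in zip(*columns)]


def group_mean(values, labels, wanted):
    """Mean of `values` whose label equals `wanted` (e.g. 'GC-AG' vs 'GT-AG')."""
    selected = [v for v, lab in zip(values, labels) if lab == wanted]
    return sum(selected) / len(selected)


if __name__ == "__main__":
    # Hypothetical counts and lengths for three transcripts in two tissues.
    lengths = [1000, 2000, 500]
    per_tissue = {
        "tissue_1": tpm([10, 20, 5], lengths),
        "tissue_2": tpm([30, 10, 5], lengths),
    }
    means = mean_across_tissues(per_tissue)
    labels = ["GC-AG", "GT-AG", "GT-AG"]
    print(group_mean(means, labels, "GC-AG"), group_mean(means, labels, "GT-AG"))
```

Expression data from public resources are often already supplied as TPM, in which case only the averaging steps would apply.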
FIGURE 5
Functional enrichment analysis of GC-AG-containing genes. Bar graph representing the GO terms found significantly enriched in GC-AG-containing PCGs. GO term names are indicated on the Y-axis and −log10 of the p-values on the X-axis.
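The caption does not say which statistical test produced the GO p-values. A common choice in enrichment tools is a one-sided hypergeometric (Fisher-type) test, sketched below under that assumption, together with the −log10 transform used on the X-axis. Multiple-testing correction, which enrichment tools usually apply, is omitted; the function names and numbers are hypothetical.

```python
import math


def hypergeom_upper_tail(k, population, annotated, selected):
    """P(X >= k) for X ~ Hypergeometric(N=population, K=annotated, n=selected).

    population: background genes; annotated: background genes carrying the GO term;
    selected: genes in the study set; k: study-set genes carrying the term (k >= 0).
    """
    upper = min(annotated, selected)
    tail = sum(
        math.comb(annotated, i) * math.comb(population - annotated, selected - i)
        for i in range(k, upper + 1)
    )
    return tail / math.comb(population, selected)


def neg_log10(p):
    """The -log10(p) value plotted on the X-axis of an enrichment bar chart."""
    return -math.log10(p)


if __name__ == "__main__":
    # Hypothetical numbers: 20 background genes, 5 with the term, 4 selected, 3 hits.
    p = hypergeom_upper_tail(3, 20, 5, 4)
    print(p, neg_log10(p))
```

Using exact integer binomial coefficients avoids overflow for realistic gene counts; if k exceeds what is possible, the tail is 0 and `neg_log10` would need a guard.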
