Genome Biol. 2002 Aug 22;3(9):RESEARCH0044.
doi: 10.1186/gb-2002-3-9-research0044. Epub 2002 Aug 22.

Computational discovery of sense-antisense transcription in the human and mouse genomes

Jay Shendure et al. Genome Biol. 2002.

Abstract

Background: Overlapping but oppositely oriented transcripts have the potential to form sense-antisense perfect double-stranded (ds) RNA duplexes. Over recent years, the number and variety of examples of mammalian gene-regulatory phenomena in which endogenous dsRNA duplexes have been proposed or demonstrated to participate has greatly increased. These include genomic imprinting, RNA interference, translational regulation, alternative splicing, X-inactivation and RNA editing. We computationally mined public mouse and human expressed sequence tag (EST) databases to search for additional examples of bidirectionally transcribed genomic regions.

Results: Our bioinformatics approach identified over 217 candidate overlapping transcriptional units, almost all of which are novel. From experimental validation of a subset of our predictions by orientation-specific RT-PCR, we estimate that our methodology has a specificity of 84% or greater. In many cases, regions of sense-antisense overlap within the 5'- or 3'-untranslated regions of a given transcript correlate with genomic patterns of mouse-human conservation.

Conclusions: Our results, in conjunction with the literature, bring the total number of predicted and validated examples of overlapping but oppositely oriented transcripts to over 300. Several of these cases support the hypothesis that a subset of the instances of substantial mouse-human conservation in the 5' and 3' UTRs of transcripts might be explained in part by functionality of an overlapping transcriptional unit.

Figures

Figure 1
Assessment of the quality of directional cloning of individual EST libraries. Bins of library quality scores (LQS) at intervals of 0.05 are depicted along the x-axis. The heights of the bars reflect the number of human ESTs derived from libraries with an LQS that falls in a given bin. The LQS of each EST library was determined by calculating the fraction of ESTs from a given library that were deposited in the same orientation as the best-of-UniGene (BOU) representative of the UniGene cluster to which a given EST belonged. In our initial analysis, we assumed that all BOU representatives were correctly oriented (blue bars). As this is not the case, we repeated that analysis by calculating the LQS exclusively from ESTs belonging to UniGene clusters where the BOU representative possessed a defined ORF, indicating that it was correctly oriented (red bars). As a final improvement, we flipped in silico all ESTs annotated as 3' sequencing reads, as these are generally not reoriented before deposit in sequence databases. The result was a bimodal distribution of LQS scores (green bars) that appears to correspond broadly with directional (peak near LQS = 1.0) and non-directional (peak near LQS = 0.5) library generation protocols. A full list of both mouse and human EST libraries and their LQS scores is available at our website [22].
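The final LQS calculation described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (dictionaries with `library`, `cluster`, `orientation` and `read_end` keys), the function names and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def flip(orientation):
    """Return the opposite strand label ('+' <-> '-')."""
    return "-" if orientation == "+" else "+"


def library_quality_scores(ests, bou_orientation, bou_has_orf):
    """Fraction of ESTs per library that agree with the BOU orientation.

    ests            -- iterable of dicts (hypothetical layout) with keys
                       'library', 'cluster', 'orientation', 'read_end'
    bou_orientation -- cluster ID -> '+' or '-' for the BOU representative
    bou_has_orf     -- cluster ID -> True if the BOU has a defined ORF
    """
    agree = defaultdict(int)
    total = defaultdict(int)
    for est in ests:
        cluster = est["cluster"]
        # Use only clusters whose BOU representative has a defined ORF.
        if not bou_has_orf.get(cluster, False):
            continue
        orientation = est["orientation"]
        # 3' reads are assumed to be deposited unflipped, so flip them here.
        if est["read_end"] == "3'":
            orientation = flip(orientation)
        total[est["library"]] += 1
        if orientation == bou_orientation[cluster]:
            agree[est["library"]] += 1
    return {lib: agree[lib] / total[lib] for lib in total}


# Toy data: library A behaves as directional, library B as non-directional.
bou_orientation = {"c1": "+", "c2": "-", "c3": "+"}
bou_has_orf = {"c1": True, "c2": True, "c3": False}
ests = [
    {"library": "A", "cluster": "c1", "orientation": "+", "read_end": "5'"},
    {"library": "A", "cluster": "c1", "orientation": "-", "read_end": "3'"},
    {"library": "A", "cluster": "c2", "orientation": "-", "read_end": "5'"},
    {"library": "A", "cluster": "c2", "orientation": "+", "read_end": "3'"},
    {"library": "A", "cluster": "c3", "orientation": "-", "read_end": "5'"},
    {"library": "B", "cluster": "c1", "orientation": "+", "read_end": "5'"},
    {"library": "B", "cluster": "c1", "orientation": "-", "read_end": "5'"},
    {"library": "B", "cluster": "c2", "orientation": "+", "read_end": "5'"},
    {"library": "B", "cluster": "c2", "orientation": "-", "read_end": "5'"},
]
scores = library_quality_scores(ests, bou_orientation, bou_has_orf)
print(scores)
```

In this toy input, library A scores 1.0 and library B scores 0.5, mirroring the two peaks of the bimodal distribution; the EST from cluster c3 is ignored because its BOU representative lacks an ORF.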
Figure 2
Splicing and mouse-human conservation patterns for sense and antisense ESTs from UniGene cluster Hs.47313. The graph depicts the exon-intron splicing structures of transcript sequences belonging to UniGene cluster Hs.47313. SIM4 [18] was used to map the exons of a single mRNA sequence (GenBank accession number NM_014785) and of directionally cloned ESTs belonging to UniGene cluster Hs.47313 to genomic contig Hs9_28427_24 of the NCBI draft of the assembled human genome. The x-axis reflects base-pair positions along the genomic contig. Each position along the y-axis is assigned to a single EST or mRNA sequence. GenBank accession numbers are listed along with the UniLib ID of the library from which the EST was derived. Rectangular boxes indicate the locations of complete or partial exons. Individual exons of the BOU representative of this cluster (mRNA sequence NM_014785) are represented in blue and green, with annotated coding regions of the transcript shaded blue and untranslated regions shaded green. In this case, the mRNA is oriented from left to right with respect to the genomic contig. Immediately below the mRNA mapping, regions of the genome identified as highly conserved in HUMMUS [21], a set of around 1.15 million 'islands' of strong mouse-human conservation, are shown in gold. The height of each bar in this row is proportional to the percent nucleotide identity over a 50-bp window centered on each base pair. In the upper portion of the graph (all horizontal bars above the BOU mRNA sequence and HUMMUS rows), the exon mappings of sense ESTs are represented in yellow. In the lower portion of the graph (all horizontal bars below the BOU mRNA sequence and HUMMUS rows), exon mappings of antisense ESTs are represented in pink. Similar graphical representations for all 217 candidates (generated with GNUPLOT [27]) are available from our website [22].
The sense transcript (represented by the mRNA and sense ESTs) encodes KIAA0258, a protein of unknown function. Not unexpectedly, there is a strong correlation between the locations of sense transcript exons and the peaks in the strength of mouse-human conservation. It is also evident that the antisense ESTs are spliced in a consistent pattern that differs significantly from that of the mRNA and sense ESTs. This strengthens the claim that these represent a distinct RNA species inadvertently co-clustered into a single UniGene cluster by virtue of an antisense overlap. Observed regions of sense-antisense overlap are restricted to the 3' UTR of the sense transcript. Also striking is the observation that the islands of conservation in the 3' UTR of the BOU mRNA are largely coincident with the positions of exons of the putative antisense transcript, providing at least a potential explanation for the conserved elements observed in the 3' UTR of the sense mRNA. In this case, the antisense mRNA species does have strong homology to a known protein, suggesting that it is also a coding mRNA.
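The conservation track in this figure plots percent nucleotide identity over a 50-bp window centered on each base pair. The sketch below shows one way such a track could be computed. It assumes an ungapped, equal-length human-mouse alignment and truncates windows at the sequence ends; neither detail is specified in the caption, and the function name is hypothetical.

```python
def windowed_identity(human, mouse, window=50):
    """Percent identity in a window centered on each aligned position.

    Assumes `human` and `mouse` are already aligned and of equal length.
    Gap characters ('-') never count as matches. Windows are truncated
    at the ends of the alignment (an assumption for this sketch).
    """
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    if window < 2:
        raise ValueError("window must be at least 2")
    n = len(human)
    matches = [
        1 if a == b and a != "-" else 0
        for a, b in zip(human.upper(), mouse.upper())
    ]
    # Prefix sums make each window lookup O(1).
    prefix = [0]
    for m in matches:
        prefix.append(prefix[-1] + m)
    half = window // 2
    identities = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half)
        identities.append(100.0 * (prefix[hi] - prefix[lo]) / (hi - lo))
    return identities


# Small example with a 4-bp window so the values are easy to check by hand.
profile = windowed_identity("AAAAAAAA", "AAAATTTT", window=4)
print(profile)
```

The prefix-sum approach keeps the computation linear in the alignment length, which matters when scanning whole genomic contigs.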
Figure 3
Splicing, mouse-human conservation patterns, and tissue origin of sense and antisense ESTs from UniGene cluster Hs.288835. (a) The graph depicts the exon-intron splicing structures of transcript sequences belonging to UniGene cluster Hs.288835. Organization of the figure as for Figure 2. The BOU mRNA (GenBank accession NM_014430) is oriented from left to right with respect to genomic contig Hs14_19739_24 of the NCBI human genome assembly. The mRNA encodes CIDEB (cell-death inducing DFFA-like effector B). With no exceptions, the sense-oriented ESTs have splicing patterns that are consistent with that of the mRNA. The antisense ESTs, however, consistently overlap with intronic sequence of the sense transcript, suggesting that they are derived from a distinct RNA species (presumably unspliced, at least in the region that we are observing). (b) A plot of EST numbers in the CIDEB cluster against orientation. The y-axis indicates the number of sense or antisense ESTs observed in the CIDEB cluster, and the relative proportions arising from neoplastic versus non-neoplastic tissues are indicated. A significantly greater fraction of the antisense ESTs (34/46 = ~0.74) than the sense ESTs (3/15 = ~0.2) were derived from neoplastic tissues (p = ~0.0002 by chi-squared statistic).
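The comparison in panel (b) can be checked with a 2x2 chi-squared test on the counts given in the caption (34 of 46 antisense ESTs and 3 of 15 sense ESTs from neoplastic tissues). The sketch below uses the Pearson statistic without continuity correction; the caption does not say whether a correction was applied, so that choice is an assumption, although the uncorrected value appears consistent with the reported p of about 0.0002.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (1 degree of freedom, no continuity
    correction) for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability of the
    # chi-squared distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: antisense, sense. Columns: neoplastic, non-neoplastic.
chi2, p = chi_squared_2x2(34, 46 - 34, 3, 15 - 3)
print(f"chi-squared = {chi2:.2f}, p = {p:.5f}")
```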
Figure 4
Splicing and mouse-human conservation patterns for sense and antisense ESTs from UniGene cluster Hs.113916. The graph depicts the exon-intron splicing structures of transcript sequences belonging to UniGene cluster Hs.113916. Organization of the figure as for Figure 2. The BOU mRNA (GenBank accession NM_032966) is oriented from left to right with respect to genomic contig Hs11_9491_24 of the NCBI human genome assembly. The mRNA encodes Burkitt lymphoma receptor 1, a GTP-binding protein. Although the transcript does not appear to be spliced, the sense ESTs terminate in a position consistent with that of the mRNA. Although the coding region of the sense transcript shows the highest degree of conservation between mouse and human, there are clearly islands of conservation within its 3' UTR. The antisense ESTs intersect with the most 3' portion of the sense transcript. They contain appropriately located polyadenylation signals, such that we are probably observing the 3' tail of an oppositely oriented transcript. The antisense ESTs have no protein homologies. It is worth noting that the most conserved stretch of the 3' UTR of the sense transcript is coincident with its region of overlap with the antisense RNA species.
Figure 5
Splicing and mouse-human conservation patterns for sense and antisense ESTs from UniGene cluster Mm.148209. The graph depicts the exon-intron splicing structures of transcript sequences belonging to UniGene cluster Mm.148209. Organization of the figure as for Figure 2. The BOU mRNA (GenBank accession NM_011557) is oriented from left to right with respect to genomic contig GA_x5J8B7W5VG6 of the Celera mouse genome assembly. The mRNA encodes synaptonemal complex protein 3. The observed portion of the antisense species does not have protein-level homologies, and consistently overlaps a single internal coding exon of the sense transcript. Many of the antisense species are 3' reads containing an appropriately located poly(A) signal, suggesting that we are observing the 3' end of a larger transcript.
Figure 6
Splicing and mouse-human conservation patterns for sense and antisense ESTs from UniGene cluster Hs.125819. The graph depicts the exon-intron splicing structures of transcript sequences belonging to UniGene cluster Hs.125819. Organization of the figure as for Figure 2. Note that the BOU mRNA (GenBank accession NM_014473) is in this case oriented from right to left with respect to genomic contig Hs5_6844_24 of the NCBI human genome assembly. The mRNA encodes a putative dimethyladenosine transferase. There appear to be two potential termini for the antisense ESTs (which are oriented from left to right), suggesting that we are observing either alternative termini of a single transcript or two distinct antisense RNA species. One terminus is coincident with an island of mouse-human conservation within the 3' UTR of the sense transcript. The second is coincident with the last internal coding exon of the sense transcript. In both cases, the sequence near the putative terminus contains an appropriately located polyadenylation signal. The antisense ESTs have no significant protein homologies, and do not appear to be spliced. The ESTs that we are observing could represent only the 3' terminus of a larger coding transcript; however, the islands of conservation immediately 'upstream' of the antisense ESTs also have no protein homologies, suggesting that this may not be the case.
Figure 7
Splicing and mouse-human conservation patterns for sense and antisense ESTs from UniGene cluster Mm.10022. The graph depicts the exon-intron splicing structures of transcript sequences belonging to UniGene cluster Mm.10022. Organization of the figure as for Figure 2. The BOU mRNA (GenBank accession BC005773) is oriented from left to right with respect to genomic contig GA_x5J8B7W3T6H of the Celera mouse genome assembly. The mRNA is encoded by homer 3, a neuronal immediate early gene. The antisense species is homologous with a hypothetical human protein containing RNA helicase domains. This example is similar to Hs.47313 (Figure 2) in that the locations of strong mouse-human conservation in sub-regions within the 3' UTR of the sense transcript are coincident with the splicing structure of the antisense species.
Figure 8
Assessment of transcriptional directionality by RT-PCR. Sample results from (a) a control and (b) a sense-antisense candidate. PCR primers were designed to amplify predicted regions of bidirectional transcription. Control primers were designed to amplify either non-overlapping regions of candidate transcripts or randomly selected regions of non-candidate transcripts. For each candidate or control, four RT-PCR reactions were carried out using total human RNA from a single tissue as template. Orientation of transcripts was assessed by restricting which primer was present during RT single-strand synthesis. 1, Both primers present during RT single-strand synthesis (positive control); 2, only antisense orientation-specific primer present during RT single-strand synthesis; 3, only sense-orientation-specific primer present during RT single-strand synthesis; 4, neither primer present during RT single-strand synthesis (negative control for genomic contamination). L, 100 bp DNA ladder (Gibco-BRL). In all four reactions, both primers were present during the subsequent PCR reactions. In these examples, the control primers in (a) targeted a 127 bp region of 'chromosome condensation-related SMC-associated protein 1' (NM_014865; Hs.5719) over which we did not observe bidirectional transcription, and the candidate primers in (b) targeted a 113 bp region of mannose-6-phosphate receptor (cation dependent) (NM_002355; Hs.75709) which our results suggested was shared by an overlapping RNA species. The template in both cases is total human placental RNA (Clontech). In the control (a) only sense transcription is detected over the queried region (the appropriately sized band in lane 3). In the candidate (b) both antisense and sense transcription are detected (appropriately sized bands in lanes 2 and 3, respectively).
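The four-lane readout described in this caption reduces to a small decision rule, sketched below. The lane meanings follow the caption (lane 2 reports antisense transcription, lane 3 sense transcription, lane 4 the no-primer negative control). The function name, the returned labels and the handling of failed controls are assumptions added for illustration, as is the presence of a band in lane 1 of both example gels.

```python
def interpret_lanes(lane1, lane2, lane3, lane4):
    """Classify an orientation-specific RT-PCR result.

    Each argument is True if a band of the expected size is present.
    lane1 -- both primers in the RT step (positive control)
    lane2 -- only the antisense orientation-specific primer in the RT step
    lane3 -- only the sense orientation-specific primer in the RT step
    lane4 -- no primer in the RT step (control for genomic contamination)
    """
    # Control handling below is an assumption, not stated in the caption.
    if lane4:
        return "uninterpretable: possible genomic DNA contamination"
    if not lane1:
        return "uninterpretable: positive control failed"
    if lane2 and lane3:
        return "bidirectional"
    if lane3:
        return "sense only"
    if lane2:
        return "antisense only"
    return "no strand-specific product"


# Panel (a), control: band in lane 3 only (lane 1 assumed positive).
control_call = interpret_lanes(True, False, True, False)
# Panel (b), candidate: bands in lanes 2 and 3 (lane 1 assumed positive).
candidate_call = interpret_lanes(True, True, True, False)
print(control_call, "|", candidate_call)
```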

References

    1. Kumar M, Carmichael GG. Antisense RNA: function and fate of duplex RNA in cells of higher eukaryotes. Microbiol Mol Biol Rev. 1998;62:1415–1434.
    2. Vanhee-Brossollet C, Vaquero C. Do natural antisense transcripts make sense in eukaryotes? Gene. 1998;211:1–9.
    3. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993;75:843–854.
    4. Moore T, Constancia M, Zubair M, Bailleul B, Feil R, Sasaki H, Reik W. Multiple imprinted sense and antisense transcripts, differential methylation and tandem repeats in a putative imprinting control region upstream of mouse Igf2. Proc Natl Acad Sci USA. 1997;94:12509–12514.
    5. Sleutels F, Zwart R, Barlow DP. The non-coding Air RNA is required for silencing autosomal imprinted genes. Nature. 2002;415:810–813.