RNA. 2010 Jun;16(6):1256-67.
doi: 10.1261/rna.2038810. Epub 2010 Apr 26.

Noncanonical transcript forms in yeast and their regulation during environmental stress

Oh Kyu Yoon et al. RNA. 2010 Jun.

Abstract

Surveys of transcription in many organisms have observed widespread expression of RNAs with no known function, encoded within and between canonical coding genes. The search to distinguish functional RNAs from transcriptional noise represents one of the great challenges in genomic biology. Here we report a next-generation sequencing technique designed to facilitate the inference of function of uncharacterized transcript forms by improving their coverage in sequencing libraries, in parallel with the detection of canonical mRNAs. We piloted this protocol, which is based on the capture of 3' ends of polyadenylated RNAs, in budding yeast. Analysis of transcript ends in coding regions uncovered hundreds of alternative-length coding forms, which harbored a unique sequence motif and showed signatures of regulatory function in particular gene categories; independent single-gene measurements confirmed the differential regulation of short coding forms during heat shock. In addition, our 3'-end RNA-seq method applied to wild-type strains detected putative noncoding transcripts previously reported only in RNA surveillance mutants, and many such transcripts showed differential expression in yeast cultures grown under chemical stress. Our results underscore the power of the 3'-end protocol to improve detection of noncanonical transcript forms in a sequencing experiment of standard depth, and our findings strongly suggest that many unannotated, polyadenylated RNAs may have as yet uncharacterized regulatory functions.

Figures

FIGURE 1.
Schematic of strand-specific 3′-end RNA-seq. Blue and red represent transcripts originating from the sense (Watson) and reverse complement (Crick) strands relative to the reference genome, respectively. RNA (light colors) was fragmented and filtered for polyadenylated species. Reverse transcription (RT) was primed with an anchored oligo(dT) primer (NVTTT: N = A,C,T,G and V = A,C,G) to yield double-stranded complementary DNA fragments (dark blue and dark red). Illumina paired-end adapters (green) were ligated to cDNA ends and amplified by PCR. Both ends of each strand of cDNA were sequenced to generate a paired-end read pair (PE1, PE2) and the reads were mapped to the reference genome. Read pairs mapping in an orientation such that the poly(A) stretch appears at the end were inferred to have originated from the sense strand relative to the reference, and pairs mapping such that poly(T) appears at the front were inferred to have originated from the reverse complement strand relative to the reference.
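The strand-inference rule in this caption can be illustrated with a minimal sketch. It assumes the read sequence is already given in reference orientation; the function name `infer_strand` and the minimum homopolymer length `min_run` are hypothetical choices for illustration, not parameters reported by the authors.

```python
from typing import Optional


def infer_strand(seq_in_ref_orientation: str, min_run: int = 8) -> Optional[str]:
    """Guess the strand of origin from the position of the poly(A)/poly(T) run.

    Assumption: `min_run` (illustrative value) is the shortest homopolymer
    stretch accepted as a poly(A) tail or its reverse complement.
    """
    seq = seq_in_ref_orientation.upper()
    has_poly_a_end = seq.endswith("A" * min_run)      # poly(A) at the end
    has_poly_t_front = seq.startswith("T" * min_run)  # poly(T) at the front
    if has_poly_a_end == has_poly_t_front:
        # Neither signal, or both at once: leave the read unclassified.
        return None
    # Poly(A) at the end: sense strand relative to the reference ("+").
    # Poly(T) at the front: reverse complement strand ("-").
    return "+" if has_poly_a_end else "-"
```

Reads carrying neither signal, or both, are left unclassified here; how the original pipeline handled such cases is not stated in the caption.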
FIGURE 2.
The genomic distribution of mapped reads differs between 3′-end RNA-seq and standard mRNA-seq. Shown are the frequencies of uniquely mapped reads whose positions fell into the indicated genomic elements. Left, P1 3′-end RNA-seq library constructed as in Figure 1; right, mRNA-seq library constructed as in Nagalakshmi et al. (2008). For 3′-end RNA-seq, poly(A) (or poly(T)) positions were classified according to annotations on the same strand, while for mRNA-seq, start positions were classified without strand specificity. 5′ UTR definitions were taken from Nagalakshmi et al. (2008) and 3′ UTR lengths are defined in Materials and Methods. Noncoding, reads mapping to regions outside known ORFs, 5′ and 3′ UTRs, and known structural or regulatory RNAs.
FIGURE 3.
Transcript ends cluster in gene elements. Each panel represents the frequency of genes containing dense clusters of transcript ends (transcript units, defined in Materials and Methods) from P1 3′-end RNA-seq, mapping to the indicated elements. (Left panel) 5′ UTRs; (middle panel) ORFs; (right panel) 3′ UTRs.
FIGURE 4.
Transcriptional profiles of example genes and intergenic RNAs. In each panel, genome annotations are shown at top: yellow, ORFs; green, SUTs (Xu et al. 2009); blue, CUTs (Xu et al. 2009). The bottom four plots in each panel report raw RNA-seq data. Gray horizontal bars represent reads from P1 3′ RNA-seq libraries constructed as in Figure 1, and blue vertical bars represent histograms of data from standard mRNA-seq as in Nagalakshmi et al. (2008). The x-axis reports the start position of a given read and the y-axis reports log2 of the number of reads mapping to the indicated position. For 3′-end RNA-seq libraries, reads mapping to the Crick strand are assigned negative counts; poly(A) tails are drawn in blue and reverse complement poly(T) tails in red. Control, RNA from cultures grown in rich media; DTT, RNA from cultures treated with dithiothreitol. Major transcript forms for genes and putative noncoding RNAs are indicated by red arrows. Panels represent genomic regions containing (A) YMR061W/RNA14, (B) YPL240C/HSP82, (C) YER103W/SSA4, (D) SUT651, (E) CUT094, and (F) CUT373.
FIGURE 5.
A consensus sequence motif at the 3′ ends of alternative-length coding transcripts. Each panel represents results from the set of 606 genes with the most abundant truncated coding forms in cells grown in rich media. (A) The motif enriched at the 3′ ends of truncated coding forms. The sequence logo was visualized using the WebLogo program (Schneider and Stephens 1990; Crooks et al. 2004). (B) Frequency of match scores to the matrix in A across ORFs (blue line) or 3′ UTRs (red line). The x-axis reports the score of the sequence window in the indicated feature with the best match to the matrix in A, and the y-axis reports the frequency of such windows with a given score. (C) Frequency of positions of sequence matches inside ORFs to the matrix in A. The x-axis reports the distance between the position of the sequence window with the best match to the motif and the position of the end of the truncated transcript observed in P1 3′-end RNA-seq (blue bars) or a randomly selected position inside the ORF (black dotted line); the y-axis reports the frequency of best-match windows with a given distance.
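Panels B and C rely on finding, within a feature, the sequence window that best matches a motif matrix. The sketch below shows one common way to do that scan. It assumes the matrix is supplied as per-position scores for each base and that a window's score is the sum of its column scores; the caption does not specify the scoring scheme, and the name `best_match` and the toy matrix are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def best_match(seq: str, matrix: List[Dict[str, float]]) -> Tuple[float, Optional[int]]:
    """Return (score, start) of the window in `seq` that best matches `matrix`.

    Assumption: `matrix[i][base]` is the score for `base` at motif position i,
    and a window score is the sum over positions. Only A/C/G/T are handled.
    """
    width = len(matrix)
    best_score, best_start = float("-inf"), None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(column[base] for column, base in zip(matrix, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


# Toy 3-column matrix (illustrative values only) that favors the word "ATA".
toy_matrix = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
]
score, start = best_match("CCATAGG", toy_matrix)
```

Given an observed transcript end at position `end`, the distance plotted in panel C would then correspond to something like `end - start` under this sketch's coordinate convention.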
FIGURE 6.
Transcript length forms change in heat shock. Shown are transcript abundances measured by quantitative PCR with two primer sets per gene, one at the annotated ORF end (interrogating the long form of the RNA) and the other at the position of an internal 3′ alternative transcript end (interrogating the short form) inferred from 3′-end RNA-seq data. The y-axis reports the log2 fold-change in abundance between the two amplicons in RNA from cultures grown at 30°C (dark gray) and at 37°C (light gray); negative values correspond to loci at which the short form of the RNA is more abundant than the long form. Cultures were grown to ∼3.5 × 10⁷ cells/mL for RNA used to quantitate HSP82 and HSC82, and ∼1.0 × 10⁷ cells/mL for all other genes.
FIGURE 7.
Comparison of measures of expression changes of intergenic RNAs under dithiothreitol treatment. Each point represents an intergenic RNA with expression changes during DTT treatment. For each RNA, the y-axis reports the log2 fold-change of abundance measured by quantitative PCR using primers that interrogated the 3′ boundary of the feature. The x-axis reports the induction fold-change in DTT measured by P1 3′-end RNA-seq, defined as the log2 of the normalized sum of read counts in transcript units at the 3′ boundary of the feature in libraries from control samples, subtracted from the analogous quantity from DTT-treated cultures. Error bars are calculated as one standard deviation from four biological replicates.
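The x-axis quantity described in this caption is a difference of log2-transformed, normalized read-count sums (DTT minus control). A minimal sketch follows; the normalization shown (reads per million mapped reads) and the pseudocount are assumptions made for illustration, since the caption does not state how counts were normalized, and `log2_induction` is a hypothetical name.

```python
import math
from typing import Sequence


def log2_induction(
    dtt_counts: Sequence[int],
    control_counts: Sequence[int],
    dtt_library_size: int,
    control_library_size: int,
    pseudocount: float = 1.0,
) -> float:
    """log2(normalized DTT sum) minus log2(normalized control sum).

    Assumptions: counts are normalized to reads per million of the library,
    and `pseudocount` guards against taking log2 of zero.
    """
    dtt = (sum(dtt_counts) + pseudocount) / dtt_library_size * 1e6
    control = (sum(control_counts) + pseudocount) / control_library_size * 1e6
    return math.log2(dtt) - math.log2(control)


# Illustrative numbers only: read counts in the transcript units at the
# 3' boundary of one feature, in two libraries of equal size.
example = log2_induction([30, 9], [9], 1_000_000, 1_000_000)
```

Positive values indicate higher abundance in the DTT-treated library than in the control.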
