Genome Res. 2011 Feb;21(2):315-24.
doi: 10.1101/gr.107854.110. Epub 2010 Dec 22.

The Drosophila melanogaster transcriptome by paired-end RNA sequencing

Bryce Daines et al. Genome Res. 2011 Feb.

Abstract

RNA-seq was used to generate an extensive map of the Drosophila melanogaster transcriptome by broad sampling of 10 developmental stages. In total, 142.2 million uniquely mapped 64-100-bp paired-end reads were generated on the Illumina GA II yielding 356× sequencing coverage. More than 95% of FlyBase genes and 90% of splicing junctions were observed. Modifications to 30% of FlyBase gene models were made by extension of untranslated regions, inclusion of novel exons, and identification of novel splicing events. A total of 319 novel transcripts were identified, representing a 2% increase over the current annotation. Alternate splicing was observed in 31% of D. melanogaster genes, a 38% increase over previous estimations, but significantly less than that observed in higher organisms. Much of this splicing is subtle such as tandem alternate splice sites.

Figures

Figure 1.
Deep RNA-seq covers 93% of the annotated D. melanogaster transcriptome. (A) Pooled RNA-seq reads cover 93.4% of the FlyBase and 91.9% of the modENCODE annotated transcriptome. Simulated read densities indicate that the current depth of sequencing approaches saturation. (B) Annotated transcripts are well covered by RNA-seq reads. More than 70% of annotated genes are covered for >95% of their length. (C) Most unobserved genes can be categorized into classes that are not expected to be observed due to size, genomic duplication, or lack of polyadenylation. (D) A novel exon is identified in wupA, which is supported by two novel junctions with 487 and 428 reads, respectively. (E) RT-PCR designed to validate the novel junction successfully amplifies an appropriately sized product from cDNA but not from genomic DNA.
Figure 2.
RNA-seq detects 319 novel transcripts. (A) Novel transcripts detected by RNA-seq are identified on all chromosomes distributed as expected by chromosome size. (B) Novel transcripts (median size 315 bp) are much smaller than annotated FlyBase transcripts (median size 1560 bp). (C) The majority of novel transcripts identified in this study have two exons. (D) Clustering identifies many novel transcripts expressed at specific time points or in sex-specific patterns. For example, many novel transcripts are expressed in male but not female adults, although they are observed in mixed larva and pupa samples (Male). Other clusters express most abundantly in a specific stage (Larva). (E) A novel transcript is depicted, and the associated junction is supported by 50 sequenced reads. Experimental validation by RT-PCR was obtained in pupae cDNA (see Supplemental Fig. 2).
Figure 3.
Sex-biased expression occurs in one-third of genes. (A) On average, 9995 genes are observed in each stage, 7214 genes are shared between all stages, and 12,490 genes are expressed in one or more stages. (B) 85.7% of genes exhibit greater than fourfold difference in expression between maximum and minimum expression time points. (C) The distribution of coefficient of variation calculated for each gene identifies genes whose expression is highly consistent across development. (D) Technical replicates of pupa RNA-seq exhibit nearly perfect correlation (R = 0.99). (E) Females and early embryos exhibit a high degree of correlation (R = 0.80). (F) 5251 genes exhibit a fourfold difference in expression level between males and females: 1088 genes up-regulated in females are consistent with their expression in early embryos (red), suggesting their importance in embryonic development; 3486 genes are up-regulated in males (blue). (G) Genes without a known ortholog are abundantly expressed in males, many of which have known male-specific functions including seminal fluid proteins and male-specific transcripts. (H) Many genes on unassembled heterochromatin (chrU) exhibit male-specific expression. (I) Genomic PCR results suggest CG40968, CG40583, CG40992, and CG41561 are linked to chromosome Y.
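The two per-gene summaries named in panels B and C of Figure 3 (max/min fold difference and coefficient of variation across stages) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the expression values are hypothetical, and the pseudocount used to avoid division by zero is an assumption that the caption does not mention.

```python
from statistics import mean, pstdev

def fold_change(levels, pseudocount=1.0):
    """Ratio of maximum to minimum expression across stages.

    The pseudocount is an assumption of this sketch; it keeps the ratio
    finite when a gene is unobserved in some stage.
    """
    return (max(levels) + pseudocount) / (min(levels) + pseudocount)

def coefficient_of_variation(levels):
    """Population standard deviation divided by the mean expression."""
    m = mean(levels)
    return pstdev(levels) / m if m > 0 else float("nan")

# Hypothetical per-stage expression levels for two genes.
stable_gene = [10.0, 11.0, 9.0, 10.0]
dynamic_gene = [1.0, 3.0, 40.0, 7.0]

for name, levels in [("stable", stable_gene), ("dynamic", dynamic_gene)]:
    fc = fold_change(levels)
    cv = coefficient_of_variation(levels)
    print(f"{name}: fold change {fc:.2f}, CV {cv:.3f}, fourfold: {fc > 4}")
```

A low coefficient of variation marks a gene as consistently expressed across development, while a fold change above 4 places it in the group counted in panel B.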
Figure 4.
RNA-seq detects abundant subtle alternate splice isoforms. (A) Distribution of novel splicing junctions near annotated exon–exon junctions approximates the frequency of read indel events. The distance between novel splice sites and annotated splice sites is plotted separately for canonical and noncanonical novel junctions. (B) Canonical novel junctions in coding regions conserve frame. The portion of novel junctions which conserve frame is calculated for all candidate junctions across two partitions: canonical/noncanonical and coding/noncoding. Only canonical novel junctions within coding regions conserve frame more than expected by chance (Binomial test n = 34,060, P-value = 2.2 × 10⁻¹⁶). (C) Constitutively splicing NAGNAGs can be exonic (E) and intronic (I), named for where the second NAG is incorporated. Alternate splicing NAGNAGs result in both E and I states. (D) Constitutively splicing GYNGYNs can be of two forms: exonic (e) or intronic (i), named for where the first GYN is incorporated. Alternate splicing GYNGYNs result in both e and i states. Diagram adapted from Hiller et al. (2006). (E) For each possible NAGNAG splice site N1 (A, T, G, or C) and N2 (A, T, G, or C), the genomic count (circle diameter) and the proportion of exonic (E), intronic (I), and alternative (EI) splicing (circle color) was calculated.
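The frame-conservation test in panel B of Figure 4 can be illustrated with a small sketch. It assumes that a novel junction conserves frame when its splice site is offset from the annotated site by a multiple of 3 nucleotides, and that the chance expectation is 1/3; the caption states neither explicitly, and the offsets below are hypothetical. The exact tail sum is practical only for small n; for a sample as large as the 34,060 junctions in the caption, a statistics library routine such as SciPy's `binomtest` would be the more suitable choice.

```python
from math import comb

def conserves_frame(annotated_site, novel_site):
    """True if the shift between splice sites is a multiple of 3 nt (assumed criterion)."""
    return (novel_site - annotated_site) % 3 == 0

def binomial_upper_tail(k, n, p):
    """One-sided binomial P-value: probability of at least k successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical offsets (in nt) of novel splice sites from the annotated site.
offsets = [3, -3, 6, 1, 3, -6, 9, 2, 3, 12]

n = len(offsets)
k = sum(conserves_frame(0, offset) for offset in offsets)
p_value = binomial_upper_tail(k, n, 1 / 3)

print(f"{k} of {n} junctions conserve frame; one-sided P = {p_value:.4g}")
```

A small P-value here indicates more frame-conserving junctions than the assumed 1/3 chance rate would produce, which is the pattern the caption reports for canonical junctions in coding regions.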

