Genome Biol. 2003;4(7):R42.
doi: 10.1186/gb-2003-4-7-r42. Epub 2003 Jun 30.

Computational identification of Drosophila microRNA genes

Eric C Lai et al. Genome Biol. 2003.

Abstract

Background: MicroRNAs (miRNAs) are a large family of 21-22 nucleotide non-coding RNAs with presumed post-transcriptional regulatory activity. Most miRNAs were identified by direct cloning of small RNAs, an approach that favors detection of abundant miRNAs. Three observations suggested that miRNA genes might be identified using a computational approach. First, miRNAs generally derive from precursor transcripts of 70-100 nucleotides with extended stem-loop structure. Second, miRNAs are usually highly conserved between the genomes of related species. Third, miRNAs display a characteristic pattern of evolutionary divergence.

Results: We developed an informatic procedure called 'miRseeker', which analyzed the completed euchromatic sequences of Drosophila melanogaster and D. pseudoobscura for conserved sequences that adopt an extended stem-loop structure and display a pattern of nucleotide divergence characteristic of known miRNAs. The sensitivity of this computational procedure was demonstrated by the presence of 75% (18/24) of previously identified Drosophila miRNAs within the top 124 candidates. In total, we identified 48 novel miRNA candidates that were strongly conserved in more distant insect, nematode, or vertebrate genomes. We verified expression for a total of 24 novel miRNA genes, including 20 of 27 candidates conserved in a third species and 4 of 11 high-scoring, Drosophila-specific candidates. Our analyses lead us to estimate that drosophilid genomes contain around 110 miRNA genes.
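
The proportions quoted in the Results can be re-derived from the raw counts. The following is a minimal sketch that only restates the arithmetic; the dictionary labels and the helper name `percent` are illustrative and do not come from the paper.

```python
from fractions import Fraction

# Counts quoted in the Results paragraph: (hits, total).
counts = {
    "reference miRNAs in top 124": (18, 24),
    "third-species-conserved candidates verified": (20, 27),
    "Drosophila-specific candidates verified": (4, 11),
}


def percent(hits: int, total: int) -> float:
    """Return hits/total as a percentage rounded to one decimal place."""
    return round(float(Fraction(hits, total) * 100), 1)


rates = {label: percent(h, t) for label, (h, t) in counts.items()}

# The two verified groups together make up the 24 novel miRNA genes reported.
verified_total = 20 + 4

for label, rate in rates.items():
    print(f"{label}: {rate}%")
print("verified novel miRNA genes:", verified_total)
```

The first rate reproduces the 75% sensitivity stated in the abstract; the other two show that candidates conserved in a third species were verified at roughly twice the rate of Drosophila-specific candidates.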

Conclusions: Our computational strategy succeeded in identifying bona fide miRNA genes and suggests that the number of miRNA genes is nearly 1% of the number of predicted protein-coding genes in Drosophila, a proportion similar to that recently attributed to other metazoan genomes.

Figures

Figure 1
miRNA genes are isolated, evolutionarily conserved genomic sequences that have the capacity to form extended stem-loop structures as RNA. Shown are VISTA plots of globally aligned sequence from D. melanogaster and D. pseudoobscura, in which the degree of conservation is represented by the height of the peak. This particular region contains a conserved sequence identified in this study that adopts a stem-loop structure characteristic of known miRNAs. Expression of this sequence was confirmed by northern analysis (Table 2), and it was subsequently determined to be the fly ortholog of mammalian mir-184. Most conserved sequences do not have the ability to form extended stem-loops, as evidenced by the fold adopted by the sequence in the neighboring peak.
Figure 2
Classification of conserved stem-loop sequences. (a) Patterns of nucleotide divergence among Drosophila pre-miRNAs imply a canonical progression in miRNA evolution. The Drosophila orthologs of 23/24 previously described miRNAs are either completely conserved (class 1), contain one or more mismatches or gaps located exclusively in the loop (class 2), or contain at least as many mutations within the loop as in the non-miRNA-encoding arm (class 3). We consider these to represent successive steps in the normal evolution of miRNAs and therefore connect them with arrows. Members of classes 1-3 are considered equally good candidates, while members of classes 4-6 are poor candidates. As we expect class 3 candidates to eventually evolve into class 6 candidates (broken arrow), these evolutionary considerations are most relevant to species separated by an evolutionary distance comparable to that between D. melanogaster and D. pseudoobscura. (b) Preferential divergence of miRNAs within their loop sequences is illustrated by let-7. The Drosophila orthologs of let-7 contain three mismatches and one gap within the loop, whereas both arms have been completely conserved.
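The class 1-3 rules in this caption can be written as a small decision function. This is a hedged sketch, not the miRseeker implementation: the function name and its three count arguments are hypothetical, class 3 is assumed to require a fully conserved miRNA-encoding arm, and classes 4-6 are lumped together because the caption does not define them individually.

```python
def classify_divergence(loop: int, mirna_arm: int, other_arm: int) -> str:
    """Classify a conserved stem-loop by where its substitutions fall.

    Each argument is the number of mismatches plus gaps between the
    D. melanogaster and D. pseudoobscura sequences in that region.
    """
    if min(loop, mirna_arm, other_arm) < 0:
        raise ValueError("counts must be non-negative")
    # Class 1: the hairpin is completely conserved.
    if loop == 0 and mirna_arm == 0 and other_arm == 0:
        return "class 1"
    # Class 2: all changes lie in the loop; both arms are intact.
    if mirna_arm == 0 and other_arm == 0:
        return "class 2"
    # Class 3: the loop carries at least as many changes as the
    # non-miRNA-encoding arm (assumption: the miRNA arm is unchanged).
    if mirna_arm == 0 and loop >= other_arm:
        return "class 3"
    # Anything else falls into classes 4-6, which this caption does not
    # define individually; they are treated here as poor candidates.
    return "classes 4-6 (poor candidate)"


# let-7 as described in panel (b): three mismatches and one gap in the
# loop, with both arms completely conserved.
print(classify_divergence(loop=4, mirna_arm=0, other_arm=0))
```

Under these assumptions the let-7 example from panel (b) is assigned to class 2.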
Figure 3
Overview of miRseeker, a computational strategy for identifying Drosophila miRNAs. See text for details.
Figure 4
Efficient selection of genuine miRNAs by miRseeker. (a) Distribution of the top 2,996 candidates binned by helical/free energy score (white bars), of which 570 passed subsequent conservation filters (green bars). 21/24 members of the reference set received a score of 16 or higher, and 20 of these passed the conservation filters. Note that these figures do not include mir-10, which did not fall in an aligned contig and was thus not analyzed, even though its miRseeker score is 18.45 and it passes conservation filters. (b) List of the top 124 miRseeker candidates; members of the reference set are highlighted in green, newly identified miRNAs from this study in blue, and additional third-species-conserved candidates in orange. The vast majority of the highest-scoring candidates are bona fide.
Figure 5
Diverse temporal and quantitative expression profiles of novel miRNAs by northern blotting. The three lanes represent 0-24-hour embryos (E), third instar larvae and 0-1-day pupae (L) and adult males (A), and hybridizing bands from the 21-24 nucleotide range are shown. (a-g) miRs with preferential expression at individual stages or a combination of two of these stages. (h-j) miRs that are expressed throughout development, either at uniform levels or in a graded fashion. (k) miR-1 was used as a control. Note that the blots shown were exposed for different lengths of time, so the relative levels of different miRNAs are not directly comparable; please refer to Table 2.
Figure 6
Example of a miRNA with false-negative evidence by northern blot (2R:11128979 = miR-137). In this example, four related sequences from four species of insects (Dm, D. melanogaster; Dp, D. pseudoobscura; Ag, A. gambiae; Am, A. mellifera) all adopt a phylogenetically conserved stem-loop structure. One arm has been perfectly preserved among all four species, and we presume that a miRNA is processed from within the conserved sequence (orange). Patterns of nucleotide divergence characteristic of miRNAs are seen, with more related sequences (Dm/Dp and Ag/Am) showing approximately equal amounts of divergence within the loop and along one arm, whereas the Dm/Dp vs Ag/Am comparison shows complete divergence within the loop (blue), with slightly less overall divergence along the putative non-miRNA-encoding arms. We deduce that a mature miRNA may initiate at one of the U residues that are highlighted by asterisks, as the first residue of the conserved region is found in the loop of the drosophilid hairpins and the second G residue is unfavored as the 5' residue of a miRNA. Northern analysis was negative using a probe complementary to the conserved region as well as with a probe identical to the conserved region (in the event that a miRNA is transcribed from the other strand). This sequence was only subsequently discovered to be orthologous to vertebrate miR-137 (which initiates at the second highlighted U). We consider other unverified predicted genes conserved in other insect species with similar characteristics to be potential candidates (see also Table 1).
Figure 7
Examples of Drosophila miRNA gene clusters. In this figure, pre-miRNAs are represented by rectangles and the arm that gives rise to the mature miRNA is colored. (a) The largest miRNA cluster was previously identified by Tuschl and colleagues [10]; we identified and experimentally verified a new member of this cluster, mir-286. A second conserved hairpin was found (light gray box), but its expression was not seen. Of the seven genes in this cluster, only mir-286 is conserved in Anopheles (ano). Note also that this cluster contains both related miRNA genes (mir-6-1, -2, -3 and the K-box antisense gene mir-5, yellow), as well as unrelated miRNA genes (black). (b) A second example of rapid miRNA gene evolution. The Anopheles genome contains four members of the mir-2/mir-13 family, which are all located in a single cluster. In contrast, drosophilid genomes contain eight members of this family, located at four distinct genomic locations on three different chromosomes. (c) A cluster of putative developmental regulators. let-7 and mir-125 are orthologous to the genetically characterized genes let-7 and lin-4 in C. elegans. A similar gene cluster exists in Anopheles, although mir-100 is separated from the other two by several kilobases (not shown). (d) Other examples of miRNA clusters. Note that, as is the case for the other clusters shown, miRNA clusters can contain related genes (yellow), but appear to be as likely to contain unrelated genes (black).

References

    1. Huttenhofer A, Brosius J, Bachellerie JP. RNomics: identification and function of small, non-messenger RNAs. Curr Opin Chem Biol. 2002;6:835–843.
    2. Hannon GJ. RNA interference. Nature. 2002;418:244–251.
    3. Ambros V. microRNAs: Tiny regulators with great potential. Cell. 2001;107:823–826.
    4. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993;75:843–854.
    5. Reinhart BJ, Slack F, Basson M, Pasquinelli A, Bettinger J, Rougvie A, Horvitz HR, Ruvkun G. The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature. 2000;403:901–906.
