Front Genet. 2011 May 26;2:25.
doi: 10.3389/fgene.2011.00025. eCollection 2011.

Systematic Curation of miRBase Annotation Using Integrated Small RNA High-Throughput Sequencing Data for C. elegans and Drosophila


Xiangfeng Wang et al. Front Genet. 2011.

Abstract

MicroRNAs (miRNAs) are a class of 20–23-nucleotide small RNAs that regulate gene expression post-transcriptionally in animals and plants. Annotation of miRNAs in the miRNA database (miRBase) has relied largely on computational approaches. As a result, many miRBase entries lack experimental validation, and discrepancies between miRBase annotations and actual miRNA sequences are often observed. In this study, we integrated small RNA sequencing (smRNA-seq) datasets from Caenorhabditis elegans and Drosophila melanogaster and devised an analytical pipeline, coupled with detailed manual inspection, to curate miRNA annotation in miRBase systematically. Our analysis reveals that 19 (17.0%) and 51 (31.3%) of the miRNA entries with detectable smRNA-seq reads have mature sequence discrepancies in C. elegans and D. melanogaster, respectively. These discrepancies occur frequently either in conserved miRNA families whose mature sequences were predicted from their homologs in other species, or in miRNAs whose precursor miRNA (pre-miRNA) hairpins produce an abundance of multiple miRNA isoforms or variants. Our analysis shows that Drosophila pre-miRNAs, on average, yield fewer than 60% exact mature miRNA reads, alongside their 5′ and 3′ variant isoforms, whereas the precision of miRNA processing in C. elegans is much higher, at over 90%. Based on the revised miRNA sequences, we analyzed the expression patterns of more conserved (MC) and less conserved (LC) miRNAs and found that, whereas MC miRNAs are often co-expressed at multiple developmental stages, LC miRNAs tend to be expressed at fewer, specific stages.
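The precision figures above depend on how reads mapped to a hairpin are binned. The following is a minimal Python sketch, assuming each read is given as a 0-based start and length on the pre-miRNA and that the annotated mature miRNA occupies a half-open interval on the same hairpin. Reads matching both ends are counted as exact, reads with a shifted 5′ end as 5′ variants, reads with only a shifted 3′ end as 3′ variants, and reads outside the mature arm are set aside. The function names, the "5′ shift takes priority" rule, and the choice of mature-arm reads as the denominator are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter
from typing import Iterable, Tuple

# A read is (start, length) in 0-based coordinates on the pre-miRNA hairpin.
Read = Tuple[int, int]


def classify_read(read: Read, mature_start: int, mature_end: int) -> str:
    """Classify one read against an annotated mature arm [mature_start, mature_end)."""
    start, length = read
    end = start + length
    # Reads that do not overlap the mature arm (e.g. miRNA* or loop reads) are set aside.
    if end <= mature_start or start >= mature_end:
        return "other"
    if start == mature_start and end == mature_end:
        return "exact"
    # Assumption: any 5' shift changes the seed, so it takes priority over 3' differences.
    if start != mature_start:
        return "5p_variant"
    return "3p_variant"


def processing_precision(
    reads: Iterable[Read], mature_start: int, mature_end: int
) -> Tuple[Counter, float]:
    """Return per-class counts and the percentage of mature-arm reads that are exact."""
    counts = Counter(classify_read(r, mature_start, mature_end) for r in reads)
    arm_total = counts["exact"] + counts["5p_variant"] + counts["3p_variant"]
    precision = 100.0 * counts["exact"] / arm_total if arm_total else 0.0
    return counts, precision
```

For example, with the mature arm at [10, 32), six exact reads, two reads extended at the 3′ end, two reads shifted at the 5′ end, and three reads from the opposite arm give a precision of 60%.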

Keywords: database curation; deep sequencing; microRNA.


Figures

Figure 1
Discrepancies between miRBase annotation and the actual miRNA sequences obtained from smRNA-seq data. (A) Fractions of smRNA-seq reads of different sizes mapped to each pre-miRNA sequence. Each column represents one miRNA; columns are sorted by the size of the miRBase-annotated mature miRNA sequence. (B) Fractions of smRNA-seq reads with each 5′ nucleotide mapped to each pre-miRNA sequence. (C) The proportions of five dme-mir-2a-2 miRNA isoform sequences were consistent across developmental stages. (D) The 24-nt dme-mir-2a isoform is the one preferentially associated with Ago proteins. (E) The 22-nt dme-mir-2b isoform is the one preferentially associated with Ago proteins.
Figure 2
Mature miRNA sequences are more precisely defined in C. elegans than in D. melanogaster. (A) Three groups of corrected miRNAs, classified by the type of mis-annotation: 5′ end, 3′ end, or miRNA/miRNA* strand. (B) Box plots of the percentages of exact miRNA reads, miRNA* reads, and miRNA 3′ and 5′ variant reads mapped to the pre-miRNA sequences in D. melanogaster and C. elegans. (C) Percentages of miRNA, miRNA*, and miRNA 3′ and 5′ variant reads for D. melanogaster MC and LC miRNAs. (D) Percentages of exact miRNA* reads and miRNA* 3′ and 5′ variant reads in D. melanogaster and C. elegans.
Figure 3
Correction of the dme-miR-2 family requires manual inspection of the miRNA/miRNA* duplexes. (A) dme-mir-2a-1, dme-mir-2a-2, and dme-mir-2c possess identical miRNA arms, to which the smRNA-seq reads were multiply mapped. The two numbers in brackets are the number of smRNA-seq reads and the number of mapped genomic locations for each isoform. (B) In the hairpin structures, we found that the 24-nt mir-2a isoform paired with mir-2a-1*, the 22-nt mir-2a isoform paired with mir-2a-2*, and the 22-nt mir-2c isoform paired with mir-2c*, forming correct miRNA/miRNA* duplexes with 2-nt 3′ overhangs. (C) The proportions of the corrected miRNA and miRNA* strands for each miR-2 member across 14 samples. Each column represents one sample. mir-2a-2 is the only exception, with equal amounts of miRNA and miRNA* strands across all samples. (D) Although mir-2a-2 and mir-2a-2* both exist in S2 cells, only the guide strands were associated with Ago proteins. (E) Alignment of the corrected miR-2 family members. (F) The developmental expression profile of the D. melanogaster miR-2 family, rebuilt using the corrected miR-2 family sequences.
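Panel (B) uses a 2-nt 3′ overhang on each strand as the test for a correct miRNA/miRNA* duplex. The following is a minimal Python sketch of that test, assuming the hairpin fold is available in dot-bracket notation (the format written by folding tools such as RNAfold) and that the 5′-terminal nucleotide of each strand is base-paired. Strands are half-open [start, end) intervals on the hairpin. A strand carries a 2-nt 3′ overhang when the 5′ end of its partner pairs with its third nucleotide from the 3′ end. The helper names are hypothetical, and duplexes with terminal mismatches or bulges would need a more tolerant rule.

```python
from typing import Dict, List, Tuple

Strand = Tuple[int, int]  # half-open [start, end) on the hairpin, 0-based


def pair_table(structure: str) -> Dict[int, int]:
    """Map each paired position to its partner, parsed from a dot-bracket string."""
    stack: List[int] = []
    pairs: Dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pairs


def has_2nt_3p_overhangs(structure: str, arm5: Strand, arm3: Strand) -> bool:
    """Hypothetical check: do both strands of the duplex carry 2-nt 3' overhangs?"""
    pairs = pair_table(structure)
    s5, e5 = arm5
    s3, e3 = arm3
    # The 5' end of each strand must pair with the 3rd nucleotide from its partner's 3' end.
    return pairs.get(s5) == e3 - 3 and pairs.get(s3) == e5 - 3
```

On a synthetic 48-nt hairpin with a perfect 20-bp stem, a 22-nt 5p strand at [2, 24) and a 22-nt 3p strand at [26, 48) pass the test, whereas extending the 5p strand to 24 nt breaks it. This mirrors the way only one isoform length forms the correct duplex with a given miRNA* strand.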
Figure 4
Correction of the miR-6 family reveals mis-annotated miRNA and miRNA* strands. (A) The corrected D. melanogaster miR-6 family sequences (red). The miRBase-annotated miR-6 mature sequence is actually the miRNA* sequence (green). The corrected miR-6 members have distinct 5′ seed sequences that differ at the seventh and eighth nucleotides. (B) The hairpin structures of the three members of the dme-miR-6 family show that the miRNA* strands are conserved. (C) The expression profile rebuilt from the corrected mature sequences of the miR-6 family. (D) dme-mir-276a and 276b contain identical miRNA* arms but miRNA arms that differ by one nucleotide (marked by blue rectangles). (E) Hairpin structures of dme-mir-276a and 276b with the miRNA* and miRNA arms highlighted. (F) Expression abundance of dme-mir-276a, mir-276b, and their passenger strand mir-276* during Drosophila development. (G) mir-276* is preferentially associated with Ago2, whereas a higher proportion of mir-276a was found in Ago1.
Figure 5
The 24-nt dme-miR-34 isoform is probably remodeled into shorter isoforms after loading into Ago1. (A) The miR-34 pre-miRNA produces a large number of miR-34 isoforms in D. melanogaster; the 24-nt isoform is the miRBase-annotated mature sequence. (B) The hairpin structure of dme-mir-34 shows that only the 24-nt isoform can form the correct duplex with miR-34*. (C) The miR-34 isoforms are co-expressed during D. melanogaster development, but the 21-nt isoform is the most abundant. (D) The expression patterns of two miR-34 isoforms in C. elegans are not significantly different. (E) Only 20- to 22-nt dme-mir-34 isoforms are associated with Ago proteins.
Figure 6
U and A are the preferred non-templated nucleotides added to mature miRNAs. (A) Both D. melanogaster and C. elegans miR-1 miRNAs carry non-templated nucleotide extensions (marked in blue) at their 3′ ends. The frequencies of the four nucleotides at the 5′ and 3′ ends of the corrected authentic miRNA sequences in D. melanogaster (B) and C. elegans (C). Frequency of the extended non-templated nucleotide according to the last nucleotide (denoted by a red rectangle in the miR-1 example) and the following nucleotide (denoted by a blue rectangle in the miR-1 example) in D. melanogaster (D) and C. elegans (E).
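The 3′ extensions in panel (A) can be separated from templated sequence by walking each read along the hairpin until the first mismatch. The following is a minimal Python sketch, assuming all reads share a known 5′ start on the hairpin and use the same alphabet as the hairpin (U, not T). The matched prefix is treated as templated and the remainder as a non-templated tail, and the (last templated, first tail) nucleotide pairs are tallied in the spirit of panels (D) and (E). This is an illustrative rule, not the authors' method. An added nucleotide that happens to match the genome cannot be detected this way, and an internal sequencing error or SNP would be miscalled as a tail.

```python
from collections import Counter
from typing import Iterable, Tuple


def split_tail(read: str, hairpin: str, start: int) -> Tuple[str, str]:
    """Split a read anchored at `start` into a templated prefix and an untemplated 3' tail."""
    n = 0
    # Extend the templated prefix while the read still matches the hairpin.
    while n < len(read) and start + n < len(hairpin) and read[n] == hairpin[start + n]:
        n += 1
    return read[:n], read[n:]


def tail_dinucleotides(reads: Iterable[str], hairpin: str, start: int) -> Counter:
    """Count (last templated nt, first non-templated nt) pairs over reads that have a tail."""
    counts: Counter = Counter()
    for read in reads:
        templated, tail = split_tail(read, hairpin, start)
        if templated and tail:
            counts[(templated[-1], tail[0])] += 1
    return counts
```

On a made-up hairpin "GGUACGUACGGCAAAA" with the mature sequence starting at position 2, the reads "UACGUACGU" and "UACGUACGUU" are both tallied as a G followed by an added U, while "UACGUACGG" is not counted because its extra G matches the genome.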
Figure 7
The more conserved miRNAs tend to be co-expressed at multiple developmental stages. (A) Absolute expression abundance of MC and LC miRNA families in D. melanogaster and C. elegans mixed-embryo samples. Each point is one miRNA. (B) Cumulative fraction of D. melanogaster MC and LC miRNA families by the number of developmental stages in which they are expressed. (C) Heat map of D. melanogaster MC and LC miRNA families across developmental stages; absolute miRNA expression abundances were normalized. (D) Binary expression status of D. melanogaster MC and LC miRNA families across developmental stages. (E) MC miRNAs in D. melanogaster have more predicted target sites than LC miRNAs. For this analysis, we selected the top 30 MC and top 30 LC miRNAs with abundances over 1,000 smRNA-seq reads. (F) Only 20% of the total targets of MC and LC miRNAs overlap.
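Panels (B) and (D) reduce each miRNA's developmental profile to a binary expressed/not-expressed call per stage and then count stages. The following is a minimal Python sketch, assuming normalized abundances per stage and a hypothetical detection threshold (the caption does not state the cutoff used). For each k, it returns the fraction of miRNAs expressed in at most k stages. A curve that rises early indicates stage-specific expression, as reported for LC miRNAs.

```python
from typing import Dict, List, Sequence


def stages_expressed(profile: Sequence[float], threshold: float) -> int:
    """Binary expression status summed over stages: count stages at or above threshold."""
    return sum(1 for value in profile if value >= threshold)


def cumulative_fraction(
    profiles: Dict[str, Sequence[float]], threshold: float
) -> List[float]:
    """Fraction of miRNAs expressed in at most k stages, for k = 0 .. number of stages."""
    if not profiles:
        return []
    n_stages = len(next(iter(profiles.values())))
    breadth = [stages_expressed(p, threshold) for p in profiles.values()]
    return [sum(1 for b in breadth if b <= k) / len(breadth) for k in range(n_stages + 1)]
```

With a threshold of 10, two broadly expressed miRNAs give [0.0, 0.0, 0.0, 0.0, 1.0] over four stages, whereas two single-stage miRNAs give [0.0, 1.0, 1.0, 1.0, 1.0].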


