Int J Bioinform Res Appl. 2014;10(4-5):479-97.
doi: 10.1504/IJBRA.2014.062996.

Discovering non-coding RNA elements in Drosophila 3' untranslated regions

Cuncong Zhong et al. Int J Bioinform Res Appl. 2014.

Abstract

Non-coding RNA (ncRNA) elements in the 3' untranslated regions (3'-UTRs) are known to participate in the post-transcriptional regulation of genes. Clustering these 3'-UTR ncRNA elements to infer co-expression patterns among the corresponding genes can provide valuable insights into their biological functions. In this paper, we propose an improved RNA structural clustering pipeline. Benchmarked on Rfam data, the new pipeline improves performance by more than 10% over the traditional hierarchical clustering pipeline. Applying it to the 3'-UTRs of the Drosophila melanogaster genome, we identified 184 ncRNA clusters with 91.3% accuracy. One of these clusters corresponds to genes that are preferentially expressed in male Drosophila; another contains genes responsible for septate junction function in epithelial cells. These discoveries encourage further study of novel post-transcriptional regulation mechanisms.

Keywords: 3' Drosophila genome; RNA secondary structure; bioinformatics; clustering; co-expression patterns; gene expression; ncRNA clusters; non-coding RNA; post-transcriptional regulation; untranslated region.

Figures

Figure 1
Fits of 500 5S rRNA similarity scores to different distributions: (a) Gumbel distribution; (b) generalised extreme value (GEV) distribution; (c) gamma distribution; (d) normal distribution. Goodness of fit is measured by the mean square error (MSE). The GEV distribution best models the local structural alignment scores.
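Figure 1 ranks candidate null distributions for alignment scores by the MSE of each fit. The Python sketch below illustrates that comparison using only the standard library, under assumptions the caption does not state: the Gumbel and normal distributions are fitted by the method of moments, MSE is taken between a 30-bin histogram density and the fitted pdf at bin centres, and the data are synthetic rather than the 500 5S rRNA scores. Fitting the GEV shape parameter ξ itself needs numerical optimisation (for example `scipy.stats.genextreme.fit`, whose shape `c` equals −ξ), so the hypothetical helper `gev_sf` only shows how an already fitted GEV would turn a raw score into a p-value; ξ = 0 reduces to the Gumbel case.

```python
import math
import random
from statistics import mean, stdev
from typing import Callable, List, Tuple

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_normal(xs: List[float]) -> Tuple[float, float]:
    """Moment estimates (mean, standard deviation) of a normal distribution."""
    return mean(xs), stdev(xs)


def fit_gumbel_moments(xs: List[float]) -> Tuple[float, float]:
    """Method-of-moments estimates (location mu, scale beta) of a right-skewed Gumbel."""
    beta = stdev(xs) * math.sqrt(6) / math.pi
    return mean(xs) - EULER_GAMMA * beta, beta


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def gumbel_pdf(x: float, mu: float, beta: float) -> float:
    z = (x - mu) / beta
    return math.exp(-z - math.exp(-z)) / beta


def histogram_mse(xs: List[float], pdf: Callable[[float], float], bins: int = 30) -> float:
    """MSE between the normalised histogram density and the pdf at bin centres."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        counts[min(int((x - lo) / width), bins - 1)] += 1  # the maximum goes in the last bin
    n = len(xs)
    errors = [
        (c / (n * width) - pdf(lo + (i + 0.5) * width)) ** 2
        for i, c in enumerate(counts)
    ]
    return mean(errors)


def gev_sf(x: float, mu: float, sigma: float, xi: float) -> float:
    """P(S >= x) under a GEV(mu, sigma, xi) null; xi == 0 is the Gumbel case.

    -expm1(-y) computes 1 - exp(-y) accurately when the p-value is tiny.
    """
    z = (x - mu) / sigma
    if xi == 0:
        return -math.expm1(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0:  # outside the support: below the lower bound (xi > 0) or above the upper bound (xi < 0)
        return 1.0 if xi > 0 else 0.0
    return -math.expm1(-t ** (-1.0 / xi))


if __name__ == "__main__":
    random.seed(0)
    # Synthetic Gumbel(mu=10, beta=2) scores: -log(E) with E ~ Exp(1) is standard Gumbel.
    scores = [10 - 2 * math.log(random.expovariate(1.0)) for _ in range(20000)]
    mu, beta = fit_gumbel_moments(scores)
    m, s = fit_normal(scores)
    print("Gumbel MSE:", histogram_mse(scores, lambda x: gumbel_pdf(x, mu, beta)))
    print("normal MSE:", histogram_mse(scores, lambda x: normal_pdf(x, m, s)))
    print("p-value of a raw score of 25:", gev_sf(25.0, mu, beta, 0.0))
```

On skewed extreme-value data like this, the normal fit misses the off-centre mode, which is the kind of gap the MSE comparison in Figure 1 is meant to expose.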
Figure 2
Pseudo-code for a single stage of the CLCL algorithm. At each stage, the heuristic algorithm tries to identify the largest clique in the given unit-weighted, undirected graph. Notation: (v_i, v_j) denotes an edge connecting vertices v_i and v_j; adj(v_i) denotes the set of vertices adjacent to vertex v_i.
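The CLCL pseudo-code itself is not reproduced here, so the Python sketch below is a generic greedy stand-in for the same idea rather than the authors' algorithm. It assumes that an edge joins two RNA elements whenever their normalised alignment p-value is at or below a cut-off. Each stage grows a clique from every seed vertex by repeatedly adding the candidate adjacent to the most other candidates, keeps the largest clique found, and removes its vertices as one cluster; extraction stops when no clique reaches `min_size`. The names `build_graph`, `greedy_clique` and `extract_cliques`, and the default `min_size=2`, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # unit-weighted, undirected adjacency sets


def build_graph(n: int, pvalues: Dict[Tuple[int, int], float], cutoff: float) -> Graph:
    """Hypothetical edge rule: connect i and j when their p-value is <= cutoff."""
    graph: Graph = {v: set() for v in range(n)}
    for (i, j), p in pvalues.items():
        if p <= cutoff and i != j:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def greedy_clique(graph: Graph) -> Set[int]:
    """One stage: return a large (not necessarily maximum) clique."""
    best: Set[int] = set()
    for seed in graph:
        clique = {seed}
        candidates = set(graph[seed])  # vertices adjacent to every clique member
        while candidates:
            # Prefer the candidate adjacent to the most other candidates; ties go to the smaller id.
            v = max(candidates, key=lambda u: (len(graph[u] & candidates), -u))
            clique.add(v)
            candidates &= graph[v]
        if len(clique) > len(best):
            best = clique
    return best


def extract_cliques(graph: Graph, min_size: int = 2) -> List[Set[int]]:
    """Repeat single stages, removing each extracted clique from a copy of the graph."""
    g = {v: set(nbrs) for v, nbrs in graph.items()}
    clusters: List[Set[int]] = []
    while g:
        clique = greedy_clique(g)
        if len(clique) < min_size:
            break
        clusters.append(clique)
        for v in clique:
            del g[v]
        for nbrs in g.values():
            nbrs -= clique
    return clusters


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
    pvals = {e: 1e-4 for e in edges}
    pvals[(0, 7)] = 0.5  # too weak to become an edge
    print(extract_cliques(build_graph(8, pvals, cutoff=1e-3)))
```

On this toy graph the first stage extracts the 4-clique {0, 1, 2, 3}, the second extracts the triangle {4, 5, 6}, and the isolated vertex 7 is left unclustered. Unlike hierarchical clustering, a single bridging edge (3, 4) does not merge the two groups, because every member of an extracted cluster must be linked to every other member.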
Figure 3
F-measure and ROC curves for the clique (C) and hierarchical (H) clustering pipelines at different p-value cut-offs. Red series: hierarchical clustering on the Rfam data set of Will et al. (2007). Green series: clique clustering pipeline on the Rfam data set. Blue series: clique clustering pipeline on the Rfam_LowID data set. (a) F-measure of clustering performance on the different data sets. The peak performances of the three series are 64.8%, 74.9% and 86.4%, respectively (denoted by broken lines). Note that Will et al. (2007) used a recall-rate cut-off, for which the corresponding p-value cut-off is difficult to estimate; therefore, only their peak performance is shown. (b) ROC curves of the clique and hierarchical clustering pipelines on the different data sets. ‘Before cluster’ refers to performance before clique extraction (only score normalisation applied); ‘after cluster’ refers to performance after clique extraction (both score normalisation and clique extraction applied). At the best overall performance (FPR 8 × 10⁻³), score normalisation accounts for ~70% of the performance gain and clique extraction for the remaining ~30%.
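Figure 3 reports F-measure and ROC (false positive rate) values, but the caption does not say how they are counted. One common convention, assumed here rather than taken from the paper, scores every pair of sequences: a pair is a true positive when it shares both a predicted cluster and a reference family (for example, an Rfam family). The Python sketch below computes pairwise precision, recall, F-measure and FPR under that assumption. Items absent from `predicted` are treated as unclustered singletons, and `pairwise_scores` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, Hashable, Tuple


def pairwise_scores(
    predicted: Dict[str, Hashable], reference: Dict[str, Hashable]
) -> Tuple[float, float, float, float]:
    """Return (precision, recall, F-measure, FPR) over all pairs of reference items."""
    tp = fp = fn = tn = 0
    for a, b in combinations(sorted(reference), 2):
        same_family = reference[a] == reference[b]
        # Items missing from `predicted` are unclustered, so they share no cluster.
        same_cluster = a in predicted and b in predicted and predicted[a] == predicted[b]
        if same_cluster and same_family:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_family:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return precision, recall, f_measure, fpr


if __name__ == "__main__":
    reference = {"a": "5S", "b": "5S", "c": "5S", "d": "tRNA", "e": "tRNA"}
    predicted = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 2}
    print(pairwise_scores(predicted, reference))
```

Re-clustering at a series of p-value cut-offs and plotting each (FPR, recall) pair would trace a ROC-style curve in the spirit of panel (b), although the paper's exact counting scheme may differ.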
Figure 4
The expression profile of genes clustered in C19, and the consensus structure and multiple alignment of their conserved 3′-UTR RNA elements. (a) FlyAtlas expression levels of the genes clustered in C19 across tissues. (This panel was generated by searching FlyMine with all genes clustered in C19.) A majority (11) of these genes are highly expressed in fly testis, while no similar pattern is observed in the other tissues. (b) The ‘cup’ or ‘comet’ localisation patterns in fly testes of four genes identified by 3′-UTR RNA clustering. These four images were created in the laboratory of Dr. Helen White-Cooper, are copyright © Helen White-Cooper and were first published in FlyTED, the Drosophila Testis gene Expression Database (http://flyted.zoo.ox.ac.uk/), from which these copies were obtained. (c) The consensus secondary structure and multiple alignment of the 3′-UTR RNA elements of the four genes shown in (b), plus two high-scoring hits identified by searching the secondary structure profile against the 3′-UTRs of the Drosophila melanogaster genome using cmsearch.
Figure 5
The consensus secondary structure and multiple alignment of the 3′-UTR RNA elements of all six genes clustered in C37.

References

1. Altschul SF, Erickson BW. Significance of nucleotide sequence alignments: a method for random sequence permutation that preserves dinucleotide and codon usage. Molecular Biology and Evolution. 1985;2:526–538.
2. Auld VJ, Fetter RD, Broadie K, Goodman CS. Gliotactin, a novel transmembrane protein on peripheral glia, is required to form the blood-nerve barrier in Drosophila. Cell. 1995;81:757–767.
3. Backofen R, Tsur D, Zakov S, Ziv-Ukelson M. Sparse RNA folding: time and space efficient algorithms. Journal of Discrete Algorithms. 2011;9:12–31.
4. Barabasi AL, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512.
5. Barreau C, Benson E, Gudmannsdottir E, Newton F, White-Cooper H. Post-meiotic transcription in Drosophila testes. Development. 2008;135:1897–1902.
