PLoS One. 2008 Sep 12;3(9):e3197.
doi: 10.1371/journal.pone.0003197.

High-throughput, kingdom-wide prediction and annotation of bacterial non-coding RNAs

Jonathan Livny et al. PLoS One. 2008.

Erratum in

  • PLoS ONE. 2008;3(11). doi: 10.1371/annotation/a03e1870-1dd7-4c16-8c46-2268eeb2a50a

Abstract

Background: Diverse bacterial genomes encode numerous small non-coding RNAs (sRNAs) that regulate myriad biological processes. While bioinformatic algorithms have proven effective in identifying sRNA-encoding loci, the lack of tools and infrastructure with which to execute these computationally demanding algorithms has limited their use. Genome-wide predictions of sRNA-encoding genes have been conducted in fewer than 3% of all sequenced bacterial strains. This paucity represents a critical gap in current annotations of bacterial genomes and has limited examination of larger issues in sRNA biology, such as sRNA evolution.

Methodology/principal findings: We have developed and deployed SIPHT, a high-throughput computational tool that uses workflow management and distributed computing to conduct kingdom-wide predictions and annotations of intergenic sRNA-encoding genes. Candidate sRNA-encoding loci are identified based on the presence of putative Rho-independent terminators downstream of conserved intergenic sequences, and each locus is annotated for several features, including conservation in other species, association with one of several transcription factor binding sites, and homology to any of over 300 previously identified sRNAs and cis-regulatory RNA elements. Using SIPHT, we conducted searches for putative sRNA-encoding genes in all 932 bacterial replicons in the NCBI database. These searches yielded nearly 60% of previously confirmed sRNAs, hundreds of previously annotated cis-encoded regulatory RNA elements such as riboswitches, and over 45,000 novel candidate intergenic loci.
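The core selection rule described above (a putative Rho-independent terminator lying downstream of a conserved intergenic sequence) can be illustrated with a minimal Python sketch. This is not SIPHT's actual implementation: the `Interval` type, the `find_candidate_loci` function, the coordinate convention, and the `max_gap` threshold of 50 nt are all hypothetical choices made for illustration, and the sketch assumes that conserved regions and terminators have already been predicted by upstream tools.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Interval:
    """A stranded genomic interval (hypothetical type for this sketch)."""
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def find_candidate_loci(
    conserved: List[Interval],
    terminators: List[Interval],
    max_gap: int = 50,  # assumed threshold, not taken from the paper
) -> List[Interval]:
    """Pair each conserved intergenic region with the nearest same-strand
    terminator located at most max_gap nt downstream of it."""
    loci: List[Interval] = []
    for region in conserved:
        best: Optional[Tuple[int, Interval]] = None
        for term in terminators:
            if term.strand != region.strand:
                continue
            # "Downstream" means higher coordinates on "+", lower on "-".
            if region.strand == "+":
                gap = term.start - region.end
            else:
                gap = region.start - term.end
            if 0 <= gap <= max_gap and (best is None or gap < best[0]):
                best = (gap, term)
        if best is not None:
            term = best[1]
            # The candidate locus spans the conserved region and its terminator.
            loci.append(
                Interval(
                    min(region.start, term.start),
                    max(region.end, term.end),
                    region.strand,
                )
            )
    return loci


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    example_conserved = [Interval(100, 200, "+"), Interval(900, 950, "+")]
    example_terminators = [Interval(210, 240, "+"), Interval(1100, 1130, "+")]
    for locus in find_candidate_loci(example_conserved, example_terminators):
        print(locus)
```

In this sketch a conserved region with no terminator within `max_gap` is simply dropped; the annotation steps mentioned in the abstract (homology searches, transcription factor binding sites) would operate on the returned loci and are not modeled here.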

Conclusions/significance: Candidate loci were identified across all branches of the bacterial evolutionary tree, suggesting a central and ubiquitous role for RNA-mediated regulation among bacterial species. Annotation of candidate loci by SIPHT provides clues to the potential biological function of thousands of previously confirmed and candidate regulatory RNAs and affords new insights into the evolution of bacterial riboregulation.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Schematic of SIPHT.
The two main stages of the SIPHT protocol are shown on the left. The two sets of non-interdependent programs in the workflow that are executed in parallel are denoted by shaded ovals. The steps in the workflow surrounded by dotted lines are not executed in every search but rather periodically to update local databases.
Figure 2. Influence of variations in search parameters on SIPHT predictions.
The search parameters in searches A–D become increasingly stringent and are listed in the methods section.
Figure 3. Phylogram showing variations in the densities of predicted loci and in the conservation of known and candidate loci among diverse bacterial genera.
The phylogram is based on the 16S rRNA sequences of a representative species in each genus. Gram-positive (Gm+) and Gram-negative (Gm−) genera are colored blue and black, respectively. BLAST analyses were performed for known and candidate loci from E. coli (Ec), V. cholerae (VcI for chromosome I, VcII for chromosome II), and B. subtilis (Bs), which are colored red, green, and blue, respectively. Filled boxes denote that the locus was predicted based on intergenic conservation in the indicated genera. Columns shaded gray and unshaded columns show results with the BLAST E-value threshold set to 1e-15 and 1e-3, respectively.
Figure 4. Examples of conserved synteny between candidate loci and previously identified sRNAs.
Predicted loci are colored black; the previously annotated name or candidate number for each locus is indicated. ORF names are based on NCBI annotations and dashed lines connect homologous ORFs (BLAST E<1e-3). Additional candidate loci with conserved synteny are shown in Table S3.
