PLoS One. 2010 Aug 5;5(8):e11970.
doi: 10.1371/journal.pone.0011970.

sRNAscanner: a computational tool for intergenic small RNA detection in bacterial genomes


Jayavel Sridhar et al. PLoS One. 2010.

Erratum in

  • PLoS One. 2010;5(9). doi: 10.1371/annotation/71408e55-e1d3-4950-9c3b-d3a3ad66a1ff. Narmada, Suryanarayanan Ramkumar [corrected to Sambaturu, Narmada]

Abstract

Background: Bacterial non-coding small RNAs (sRNAs) have attracted considerable attention due to their ubiquitous nature and contribution to numerous cellular processes including survival, adaptation and pathogenesis. Existing computational approaches for identifying bacterial sRNAs demonstrate varying levels of success and there remains considerable room for improvement.

Methodology/principal findings: Here we have proposed a transcriptional signal-based computational method to identify intergenic sRNA transcriptional units (TUs) in completely sequenced bacterial genomes. Our sRNAscanner tool uses position weight matrices derived from experimentally defined E. coli K-12 MG1655 sRNA promoter and rho-independent terminator signals to identify intergenic sRNA TUs through sliding window based genome scans. Analysis of genomes representative of twelve species suggested that sRNAscanner demonstrated equivalent sensitivity to sRNAPredict2, the best performing bioinformatics tool available presently. However, each algorithm yielded substantial numbers of known and uncharacterized hits that were unique to one or the other tool only. sRNAscanner identified 118 novel putative intergenic sRNA genes in Salmonella enterica Typhimurium LT2, none of which were flagged by sRNAPredict2. Candidate sRNA locations were compared with available deep sequencing libraries derived from Hfq-co-immunoprecipitated RNA purified from a second Typhimurium strain (Sittka et al. (2008) PLoS Genetics 4: e1000163). Sixteen potential novel sRNAs computationally predicted and detected in deep sequencing libraries were selected for experimental validation by Northern analysis using total RNA isolated from bacteria grown under eleven different growth conditions. RNA bands of expected sizes were detected in Northern blots for six of the examined candidates. Furthermore, the 5'-ends of these six Northern-supported sRNA candidates were successfully mapped using 5'-RACE analysis.

Conclusions/significance: We have developed, computationally examined and experimentally validated the sRNAscanner algorithm. Data derived from this study have successfully identified six novel S. Typhimurium sRNA genes. In addition, the computational specificity analysis we have undertaken suggests that approximately 40% of sRNAscanner hits with a high cumulative sum of scores represent genuine, undiscovered sRNA genes. Collectively, these data strongly support the utility of sRNAscanner and offer a glimpse of its potential to reveal large numbers of sRNA genes that have to date defied identification. sRNAscanner is available from: http://bicmku.in:8081/sRNAscanner or http://cluster.physics.iisc.ernet.in/sRNAscanner/.
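The sliding-window scan with position weight matrices (PWMs) described in the abstract can be illustrated with a minimal Python sketch. This is not the sRNAscanner implementation: the toy training sites, the pseudocount, the uniform background, the threshold, and the function names `build_pwm` and `scan` are all assumptions made for illustration. The abstract does not specify how scores from the individual matrices are combined into the cumulative sum of scores, so that step is left out.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length sites (A/C/G/T only)."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2((counts[base] / total) / background)
                    for base in BASES})
    return pwm

def scan(sequence, pwm, threshold):
    """Slide the PWM along the sequence; return (start, score) for windows >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical -10-box-like training sites; not taken from the paper.
toy_sites = ["TATAAT", "TATAAT", "TACAAT", "TATACT"]
pwm = build_pwm(toy_sites)
hits = scan("GGCGTATAATGCGC", pwm, threshold=5.0)
for start, score in hits:
    print(start, round(score, 3))
```

In a real scan, one matrix of this kind per signal (promoter elements, terminator) would be run over each intergenic region, and hits would then be paired and ranked; the pairing rules and cut-offs would need to come from the paper's Methods rather than from this sketch.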


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Flowchart illustrating an overview of the sRNAscanner algorithm.
The final step was performed using the web-based TargetRNA utility and/or by comparison of sRNAscanner hits with RNA deep sequencing datasets. The output dataset obtained is shown as the red outlined box at the bottom of the figure. sRNAscanner hits supported by TargetRNA only are classed as possible sRNA candidates, whilst those supported by deep sequencing are considered probable sRNA candidates. Details of parameter values used in this study are as indicated in the text.
Figure 2. Venn diagram showing the set of known E. coli K-12 MG1655 sRNA genes detected or missed by sRNAscanner.
The program was run using the training set-derived PWMs and parameters described in the text. The pale green ellipse shown in dotted outline highlights the set of 66 known sRNA genes detected when the program was run without a CSS cut-off threshold. The darker green vertical oval indicates the set of 22 known sRNAs and a further 170 potentially novel intergenic sRNAs detected using a CSS>14 cut-off. The sets of known E. coli K-12 MG1655 sRNA genes predicted bioinformatically by Wassarman et al., Argaman et al. and Chen et al. are shown in blue-, red- and green-outline ovals, respectively. A further 61 sRNA genes identified through diverse experimental and bioinformatic means are shown in the yellow-outline oval.
Figure 3. Distribution of sRNAscanner cumulative sum of scores (CSS) for known sRNA and uncharacterized hits in E. coli K-12 MG1655.
The program was run using the default parameters mentioned in the text. (A) The lower and upper boundaries of the whisker plot boxes represent the 25th and 75th percentiles, respectively. The vertical lines extending from the boxes indicate the full range of the remaining CSS values with the exception of a single outlier, indicated as a cross, for the uncharacterized hits plot. (B) Histogram showing the CSS distributions of the two sets of sRNAscanner hits.
Figure 4. The three approaches used to estimate the specificity of sRNAscanner.
Conventional ROC (A) and normalized frequency distribution (B) plots were generated following analysis of the E. coli K-12 genome. The brown line in (A) denotes the point on the ROC curve which corresponds to CSS = 14. For these analyses, the set of 92 known sRNAs was defined as the true positive set. Random matrices-based specificity analysis data are shown in panels (C), (D) and (E). (C) Histogram indicating the occurrence frequencies or predictions per nucleotide of intergenic hits with each of the three training set-derived matrices and the matching R1, R2 and R3 randomly shuffled versions of these matrices. The test genome sequence analysed was that of E. coli K-12 MG1655. (D) Graph showing the numbers of known MG1655 sRNA TUs predicted by sRNAscanner within each of five CSS ranges plotted against the mid-point CSS value for the CSS range when the program was run with the training set-derived PWM or each of the three matching sets of random PWM in turn. (E) Bar graph showing the total numbers of hits (known and uncharacterized) predicted by sRNAscanner when the program was run with the training set-derived PWM and each of the matching random PWM. (F) Histogram showing the distribution of candidate ‘sRNA TUs’ predicted by length of sRNA within a composite sequence comprising concatenated intergenic sequences from E. coli K-12 (VIGS) and ten randomly shuffled variants of this sequence (RIGS-1 – RIGS-10).
Figure 5. Venn diagram showing the numbers of known sRNAs in Salmonella Typhimurium LT2 that have been identified or reported by Pfeiffer et al., Papenfort et al. and Rfam, Padalon-Brauch et al. and Sittka et al.
The circles shown in red dotted outline and green solid outline, excluding the central pale green curve-sided triangular area, indicate the numbers of known sRNAs predicted by sRNAscanner without and with the use of a CSS cut-off (CSS>14), respectively. The central pale green curve-sided triangular area, including the innermost circle outlined in purple, represents the 118 novel, intergenic, non-overlapping candidate sRNAs predicted in this study; the innermost circle outlined in purple represents the 16-member subset comprising sRNA candidates found to have likely mRNA transcripts by comparison with RNA deep sequencing datasets. The $ superscript symbol indicates the five candidates belonging to both the Pfeiffer et al. and Sittka et al. sets; the asterisk symbol denotes the one sRNA candidate mapping to the Padalon-Brauch et al., Papenfort et al. and Sittka et al. sets.
Figure 6. Total RNA was isolated from Salmonella Typhimurium SL1344 grown under eleven different conditions and subjected to Northern blotting using candidate sRNA-specific oligonucleotide probes.
Details of growth conditions examined are outlined in the Materials and Methods section. The curved arrows indicate the six putative Northern-detected transcripts mapping to loci predicted by sRNAscanner. Additional bands seen for sRNA3, sRNA6 and sRNA8 are believed to represent degraded and/or processed forms of cognate sRNAs or overlapping mRNA transcripts. The to-scale schematics shown below each gel image indicate sRNAscanner-predicted TUs (red/black/blue), deep sequencing identified transcripts (orange line) and 5′RACE-defined transcript start-sites (vertical black arrow). The yellow boxes indicate the probes used to detect transcripts by Northern blot experiments. Red boxes represent putative promoter sequences; blue boxes indicate putative terminator sequences.
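The shuffled-sequence control mentioned in the Figure 4 caption (a real intergenic sequence compared with ten randomly shuffled variants) can be sketched in Python as follows. This is an illustrative assumption rather than the authors' procedure: the names `shuffled_control_counts` and `excess_fraction`, the simple motif-counting detector, and the toy sequence are all hypothetical, and the summary statistic is only one simple way to express how many hits exceed the shuffled background.

```python
import random

def shuffled_control_counts(sequence, count_hits, n_shuffles=10, seed=0):
    """Count hits in the real sequence and in composition-preserving shuffles."""
    rng = random.Random(seed)  # fixed seed so the control is reproducible
    real = count_hits(sequence)
    shuffled = []
    for _ in range(n_shuffles):
        letters = list(sequence)
        rng.shuffle(letters)  # keeps base composition, destroys motif order
        shuffled.append(count_hits("".join(letters)))
    return real, shuffled

def excess_fraction(real, shuffled):
    """Fraction of real hits not accounted for by the mean shuffled background."""
    if real == 0:
        return 0.0
    mean_background = sum(shuffled) / len(shuffled)
    return max(0.0, 1.0 - mean_background / real)

# Hypothetical detector: count non-overlapping occurrences of a toy motif.
def count_toy_motif(seq):
    return seq.count("TATAAT")

toy_sequence = "TATAAT" * 5 + "GC" * 20
real, shuffled = shuffled_control_counts(toy_sequence, count_toy_motif)
print(real, shuffled, round(excess_fraction(real, shuffled), 2))
```

Any hit-counting function can be passed as `count_hits`, so the same control could wrap a PWM-based scan; whether shuffling single nucleotides is the right null model depends on the analysis, and the paper's text should be consulted for the shuffling scheme actually used.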


