RNA. 2013 Jun;19(6):740-51.
doi: 10.1261/rna.035279.112. Epub 2013 Apr 22.

ShortStack: comprehensive annotation and quantification of small RNA genes


Michael J Axtell. RNA. 2013 Jun.

Abstract

Small RNA sequencing allows genome-wide discovery, categorization, and quantification of genes producing regulatory small RNAs. Many tools have been described for annotation and quantification of microRNA loci (MIRNAs) from small RNA-seq data. However, in many organisms and tissue types, MIRNA genes comprise only a small fraction of all small RNA-producing genes. ShortStack is a stand-alone application that analyzes reference-aligned small RNA-seq data and performs comprehensive de novo annotation and quantification of the inferred small RNA genes. ShortStack's output reports multiple parameters of direct relevance to small RNA gene annotation, including RNA size distributions, repetitiveness, strandedness, hairpin-association, MIRNA annotation, and phasing. In this study, ShortStack is demonstrated to perform accurate annotations and useful descriptions of diverse small RNA genes from four plants (Arabidopsis, tomato, rice, and maize) and three animals (Drosophila, mice, and humans). ShortStack efficiently processes very large small RNA-seq data sets using modest computational resources, and its performance compares favorably to previously described tools. Annotation of MIRNA loci by ShortStack is highly specific in both plants and animals. ShortStack is freely available under a GNU General Public License.

Keywords: bioinformatics; microRNA; next-generation sequencing; siRNA; small RNA; software.


Figures

FIGURE 1.
Overview of ShortStack. (A) Flow chart describing the inputs and six phases of analysis performed by ShortStack. (B) Illustration of ShortStack’s cluster definition method with a minimum depth of four alignments. Aligned small RNAs in dark shading are tallied within the final cluster.
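The cluster definition illustrated in panel B can be sketched as follows. This is a minimal illustration under stated assumptions, not ShortStack's own code. It assumes that clusters begin as islands of consecutive positions whose alignment depth reaches the minimum depth (--mindepth), that each island is extended by --pad nucleotides on both sides with overlapping islands merged, and that every alignment overlapping a final cluster is tallied. Whether only fully contained alignments are counted is not stated here, so the overlap rule is also an assumption. The function name define_clusters and the (start, end) coordinate convention are hypothetical.

```python
from typing import List, Tuple

# Hypothetical convention: (start, end), 1-based inclusive, on a single chromosome;
# strand is ignored in this sketch (an assumption).
Alignment = Tuple[int, int]

def define_clusters(alignments: List[Alignment], mindepth: int, pad: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_alignments) for each cluster (illustrative sketch only)."""
    if mindepth < 1:
        raise ValueError("mindepth must be at least 1")
    if not alignments:
        return []
    max_end = max(end for _, end in alignments)
    # Per-position coverage via a difference array.
    diff = [0] * (max_end + 2)
    for start, end in alignments:
        diff[start] += 1
        diff[end + 1] -= 1
    # Islands: maximal runs of positions whose coverage is >= mindepth.
    islands: List[Tuple[int, int]] = []
    depth, island_start = 0, None
    for pos in range(1, max_end + 2):
        depth += diff[pos]
        if depth >= mindepth and island_start is None:
            island_start = pos
        elif depth < mindepth and island_start is not None:
            islands.append((island_start, pos - 1))
            island_start = None
    # Pad each island on both sides and merge any islands that now overlap.
    merged: List[Tuple[int, int]] = []
    for s, e in islands:
        s, e = max(1, s - pad), e + pad
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    # Tally every alignment that overlaps a final cluster (overlap rule is an assumption).
    return [(s, e, sum(1 for a, b in alignments if a <= e and b >= s)) for s, e in merged]

if __name__ == "__main__":
    reads = [(10, 30), (12, 32), (15, 35), (18, 38), (200, 220)]
    print(define_clusters(reads, mindepth=4, pad=5))  # [(13, 35, 4)]
```

Here the four overlapping reads reach depth 4 only at positions 18 to 30, which padding widens to 13 to 35; the isolated read at 200 to 220 never reaches the depth cutoff and is not clustered.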
FIGURE 2.
Secondary structural characteristics of known MIRNAs from miRBase (version 19). (A) Cumulative fraction of the number of base pairs in the hairpins of miRBase MIRNAs, separated by kingdom. Dotted lines and text indicate default parameter settings for ShortStack 0.4.1 for the indicated miRTypes. (B) As in A for the fraction of stem nucleotides paired. (C) As in A for the ΔG per stem nt. (D) As in A for loop lengths. (E) As in A for the number of unpaired mature miRNA nt in the miRNA/miRNA* duplex. (F) Fraction of miRBase hairpins for the indicated kingdoms that pass the default ShortStack structural criteria for miRType animal (Metazoa, Viruses) or plant (Viridiplantae).
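Panels A, B, and D summarize simple properties of each hairpin's secondary structure. Assuming a single, unbranched hairpin written in dot-bracket notation (as produced by RNA folding programs), those three quantities could be computed as below. The definitions used here are assumptions for illustration and may differ from ShortStack's: the stem is taken as the 5' arm (outermost to innermost paired base) plus the 3' arm (innermost to outermost paired base), and the loop as the unpaired nucleotides between the innermost pair. The helper name hairpin_metrics is hypothetical.

```python
from typing import Tuple

def hairpin_metrics(dot_bracket: str) -> Tuple[int, int, float]:
    """Return (base pairs, loop length, fraction of stem nt paired).

    Assumes a single unbranched hairpin, e.g. '((.((....))))'.
    """
    opens = [i for i, c in enumerate(dot_bracket) if c == "("]
    closes = [i for i, c in enumerate(dot_bracket) if c == ")"]
    # A single hairpin has equal numbers of '(' and ')', and every '(' precedes every ')'.
    if not opens or len(opens) != len(closes) or opens[-1] > closes[0]:
        raise ValueError("expected a single unbranched hairpin")
    n_pairs = len(opens)
    # Terminal loop: unpaired nt between the innermost '(' and the innermost ')'.
    loop_length = closes[0] - opens[-1] - 1
    # Stem (assumed definition): both arms, including any bulges inside them.
    stem_nt = (opens[-1] - opens[0] + 1) + (closes[-1] - closes[0] + 1)
    fraction_paired = 2 * n_pairs / stem_nt
    return n_pairs, loop_length, fraction_paired

if __name__ == "__main__":
    # One-nt bulge on the 5' arm: 4 pairs, 4-nt loop, 8 of 9 stem nt paired.
    print(hairpin_metrics("((.((....))))"))
```

Branched structures are rejected rather than guessed at, since a multi-stem fold has no single loop or stem under these definitions.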
FIGURE 3.
Performance analysis of small RNA clustering by ShortStack. (A) Receiver operating characteristic (ROC) curves based on ShortStack analyses of the col_leaf data set in --nohp mode, using the indicated values of options --pad and --mindepth. (TPR) True positive rate, (FPR) false positive rate. Filled triangle indicates default settings. (B) Number of small RNA loci from the col_leaf data set annotated by ShortStack with the indicated settings of --pad and --mindepth. Filled circle indicates default settings.
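Each point on a ROC curve in panel A pairs a false positive rate with a true positive rate for one combination of --pad and --mindepth. The sketch below shows how one such point could be computed, assuming the analysis has already been reduced to a set of scorable units (here, hypothetical genomic bin identifiers) with a known truth set; the paper's own definition of those units is not given here, and roc_point is a hypothetical name.

```python
from typing import Set, Tuple

def roc_point(universe: Set[str], truth: Set[str], predicted: Set[str]) -> Tuple[float, float]:
    """Return (FPR, TPR) for one parameter setting.

    universe:  all scorable units (hypothetical, e.g. genomic bins)
    truth:     units assumed to be genuine small RNA-producing regions
    predicted: units covered by clusters from this setting
    """
    positives = truth & universe
    negatives = universe - truth
    tp = len(predicted & positives)  # true positives
    fp = len(predicted & negatives)  # false positives
    tpr = tp / len(positives) if positives else 0.0  # TP / (TP + FN)
    fpr = fp / len(negatives) if negatives else 0.0  # FP / (FP + TN)
    return fpr, tpr

if __name__ == "__main__":
    universe = {f"bin{i}" for i in range(10)}
    truth = {"bin0", "bin1", "bin2", "bin3"}
    predicted = {"bin0", "bin1", "bin2", "bin4"}
    print(roc_point(universe, truth, predicted))  # FPR = 1/6, TPR = 3/4
```

Repeating this for every (--pad, --mindepth) pair and plotting the resulting points gives curves of the kind shown in panel A.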
FIGURE 4.
General features of small RNA-producing genes from seven species. (A) Numbers of ShortStack-annotated small RNA genes by data set and DicerCall. Data sets are described in Table 2. The DicerCall indicates the predominant size of small RNA produced by a gene. DicerCalls of “N” are given to loci where fewer than 80% of the mapped small RNAs were within the allowable size range. (B) Small RNA abundances by data set and DicerCall. (RPM) Reads per million. (C) Fractions of genes of the indicated DicerCalls annotated as hairpin-associated (HP), MIRNAs, or nonstructured loci (None). (D) Fractions of small RNA abundance values for genes of the indicated DicerCalls derived from HP, MIRNA, or nonstructured loci (None).
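The DicerCall rule in panel A and the RPM normalization in panel B can be sketched as below. The allowable size range of 20 to 24 nt is an assumed example value, not taken from this page, and breaking ties between equally abundant sizes by first occurrence is likewise an assumption; dicer_call and rpm are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, Union

def dicer_call(read_lengths: Iterable[int], size_min: int = 20, size_max: int = 24,
               min_fraction: float = 0.8) -> Union[int, str]:
    """Return the most abundant in-range read size, or "N".

    Assumed rule: if fewer than min_fraction of the reads fall within
    [size_min, size_max] (an example range), report "N".
    """
    counts = Counter(read_lengths)
    total = sum(counts.values())
    in_range = {size: n for size, n in counts.items() if size_min <= size <= size_max}
    if total == 0 or sum(in_range.values()) / total < min_fraction:
        return "N"
    # Ties go to the size seen first in read_lengths (also an assumption).
    return max(in_range, key=in_range.get)

def rpm(locus_reads: int, total_mapped_reads: int) -> float:
    """Reads per million: locus reads scaled to one million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return locus_reads * 1_000_000 / total_mapped_reads

if __name__ == "__main__":
    print(dicer_call([21] * 8 + [24] + [30]))  # 21 (90% of reads in range)
    print(dicer_call([21] * 7 + [30] * 3))     # N (only 70% in range)
    print(rpm(50, 2_000_000))                  # 25.0
```

Normalizing to RPM lets abundances from libraries of different sequencing depth be compared on one axis, as in panel B.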
FIGURE 5.
ShortStack annotates diverse types of small RNA genes. (A) Cumulative fraction of all col_leaf small RNA genes with a DicerCall of 21 that were subjected to phasing analysis, ranked by adjusted P-value. The ShortStack-generated P-values were adjusted using the Benjamini-Hochberg procedure to control for the false discovery rate (FDR) during multiple testing; the dotted line indicates an FDR of 0.05. Points in red represent ShortStack clusters that overlap a list of Arabidopsis loci previously described to produce phased small RNAs. (B) As in A, for the col_aerial data set. (C) As in A, for the tomato data set, using a list of tomato loci previously described as producing phased small RNAs. (D) Scatterplot of hairpin length as a function of small RNA abundance for all loci annotated as hairpin-associated (HP) from the col_leaf data set. Dotted lines indicate cutoffs for long hairpins (>500 nt) and highly abundant (>100 reads per million). Two known loci, IR71 and IR2039, are labeled. For this analysis, option --pad was increased from its default value of 100 to 300. (E) As in D, except for the col_aerial data set. (F) ShortStack-annotated small RNA genes from the fly_ovary analysis that overlap a set of known endo-siRNA loci from Drosophila ovaries, grouped by DicerCall and secondary structure. (G) As in F, except for a set of known Drosophila piRNA loci. (H) As in F, except for the mouse_testes analysis with respect to a set of previously described murine piRNA loci. (I) ShortStack-generated polarity assignments for small RNA genes overlapping previously annotated sets of piRNA loci from Drosophila and mouse. (ds) Double-stranded.
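Panels A to C rank loci by phasing P-values adjusted with the Benjamini-Hochberg procedure. A compact version of that standard procedure is sketched below: sort the m P-values, scale the one at rank i by m/i, then enforce monotonicity from the largest rank downward and cap the result at 1. The function name benjamini_hochberg and the input P-values are made up for illustration.

```python
from typing import List

def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest P-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

if __name__ == "__main__":
    q = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])  # made-up phasing P-values
    print([round(v, 4) for v in q])  # [0.04, 0.0533, 0.0533, 0.2]
    print(sum(v <= 0.05 for v in q))  # 1 locus passes FDR 0.05
```

For production use, statsmodels offers the same adjustment through statsmodels.stats.multitest.multipletests with method="fdr_bh".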
FIGURE 6.
ShortStack’s MIRNA annotations are highly specific in both plants and animals. (A–H) Area-proportional elliptical Euler diagrams depicting overlaps between loci annotated as MIRNAs by miRBase (version 19), ShortStack, and miRDeep2 for the indicated data sets. Numbers indicate the MIRNA locus counts for each sector. miRBase loci are restricted to those for which at least one small RNA corresponding to the known mature miRNA was mapped to its corresponding locus in the data set; these are the “miRBase mature present” loci. Diagrams were rendered by eulerAPE (http://www.eulerdiagrams.org/eulerAPE/) version 2.0.3. (I) Sensitivity of MIRNA annotation by ShortStack and miRDeep2 by data set. (J) Numbers of false positive MIRNA annotations by ShortStack and miRDeep2 by data set.
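The sector counts in panels A to H and the summaries in panels I and J can be sketched with plain set operations, assuming each locus has already been matched across miRBase, ShortStack, and miRDeep2 to a shared identifier (in practice this requires comparing genomic coordinates). Sensitivity is taken here as the fraction of reference loci a tool recovers. Counting every tool annotation absent from the reference as a false positive is a simplifying assumption: a genuine MIRNA missing from miRBase would be miscounted, so the paper's own false-positive criteria may differ. All locus identifiers and helper names are hypothetical.

```python
from typing import Dict, FrozenSet, Set

def euler_sectors(sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], int]:
    """Count loci in each exclusive region of a Venn/Euler diagram."""
    membership: Dict[str, Set[str]] = {}
    for name, loci in sets.items():
        for locus in loci:
            membership.setdefault(locus, set()).add(name)
    sectors: Dict[FrozenSet[str], int] = {}
    for names in membership.values():
        key = frozenset(names)  # the exact combination of sets containing this locus
        sectors[key] = sectors.get(key, 0) + 1
    return sectors

def sensitivity(predicted: Set[str], reference: Set[str]) -> float:
    """Fraction of reference loci recovered by a tool."""
    return len(predicted & reference) / len(reference) if reference else 0.0

def reference_absent(predicted: Set[str], reference: Set[str]) -> int:
    """Tool annotations missing from the reference (simplified false-positive count)."""
    return len(predicted - reference)

if __name__ == "__main__":
    # Hypothetical locus IDs, assumed already matched across sources by coordinates.
    mirbase = {"m1", "m2", "m3", "m4"}  # stands in for "miRBase mature present" loci
    shortstack = {"m1", "m2", "x1"}
    mirdeep2 = {"m1", "m2", "m3", "y1", "y2"}
    sets = {"miRBase": mirbase, "ShortStack": shortstack, "miRDeep2": mirdeep2}
    for combo, n in sorted(euler_sectors(sets).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(combo), n)
    print(sensitivity(shortstack, mirbase), sensitivity(mirdeep2, mirbase))  # 0.5 0.75
    print(reference_absent(shortstack, mirbase), reference_absent(mirdeep2, mirbase))  # 1 2
```

Keying each sector by the frozenset of sources that contain a locus keeps the counts exclusive, which is what an area-proportional Euler diagram needs.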
