Nucleic Acids Res. 2011 Nov;39(21):e141.

doi: 10.1093/nar/gkr693. Epub 2011 Sep 2.

Identification and remediation of biases in the activity of RNA ligases in small-RNA deep sequencing

Anitha D Jayaprakash¹, Omar Jabado, Brian D Brown, Ravi Sachidanandam

PMID: 21890899
PMCID: PMC3241666
DOI: 10.1093/nar/gkr693

Anitha D Jayaprakash et al. Nucleic Acids Res. 2011 Nov.

Affiliation

¹ Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, 1425 Madison Avenue, New York, NY 10029, USA.

Abstract

Deep sequencing of small RNAs (sRNA-seq) is now the gold standard for small RNA profiling and discovery. Biases in sRNA-seq have been reported, but their etiology remains unidentified. Through a comprehensive series of sRNA-seq experiments, we establish that the predominant cause of the bias is the RNA ligases. We further demonstrate that RNA ligases have strong sequence-specific biases which distort the small RNA profiles considerably. We have devised a pooled adapter strategy to overcome this bias, and validated the method through data derived from microarray and qPCR. In light of our findings, published small RNA profiles, as well as barcoding strategies using adapter-end modifications, may need to be revisited. Importantly, by providing a wide spectrum of substrate for the ligase, the pooled-adapter strategy developed here provides a means to overcome issues of bias, and generate more accurate small RNA profiles.

Figures

**Figure 1.**
The protocol for preparing samples for small RNA sequencing. Total RNA is size fractionated by denaturing polyacrylamide gel electrophoresis (PAGE), and miRNAs are excised from the gel using radiolabeled markers as guides. Purified small RNAs are ligated, using a truncated T4 RNA ligase 2 (Rnl2) in an ATP-free buffer, to a 17-nt modified 3′ DNA adapter with a dideoxy nucleotide at the 3′-end and activated at the 5′-end by adenylation. The dideoxy prevents self-ligation of the adapter, while the truncated ligase prevents circularization of the small RNA inserts. The ligated fragment of 36–41 nt is then PAGE purified to remove the unligated 3′ adapters. A 32-nt RNA adapter is ligated to the 5′ side of the product using T4 RNA ligase 1 (Rnl1). The 72–78 nt ligated fragment is PAGE purified again to remove the unligated 5′ adapters. The product is reverse transcribed using a specific primer, and the resulting cDNA is amplified by PCR with primers that incorporate sequences compatible with a deep-sequencing platform.

**Figure 2.**
Choice of 5′ adapter ends, not the number of PCR cycles, determines miRNA abundance and ranking. Sequencing libraries were constructed from total RNA derived from 293-T cells, using a pooled set of twelve 5′ adapters that had different 4-mer 3′-ends, shown on the x-axis. There is great diversity in the capture of individual miRNAs by different 5′ adapters (A, B and C show data for miR-18a, miR-20a and miR-106b, respectively). Panel C shows an extreme case in which miR-106b is captured well by only one of the 12 adapters, the one ending in AGCA. These data are consistently reproduced in the other experiments shown in Figure 4. To isolate the effect of PCR cycles, we prepared the samples twice, using 25 (y-axis) and 18 (x-axis) cycles of PCR (D). Each point represents a miRNA. The correlation between the two sets is high (coefficient of 0.95) and the best linear fit to the points is a line of slope 1, suggesting that the data are reproducible and that PCR is not responsible for the biases.
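The comparison in panel D rests on two summary numbers: a correlation coefficient and the slope of the best linear fit. A minimal sketch of how both can be computed for paired per-miRNA values follows. The function name `pearson_and_slope` and the example values are hypothetical, and the use of log-scaled counts is an assumption, since the caption does not state the scale.

```python
import math


def pearson_and_slope(x, y):
    """Return (Pearson r, least-squares slope of y on x) for paired values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    return r, slope


# Hypothetical log10 read counts for five miRNAs (not data from the paper).
cycles_18 = [1.0, 2.0, 3.0, 4.0, 5.0]
cycles_25 = [1.1, 1.9, 3.2, 3.9, 5.0]
r, slope = pearson_and_slope(cycles_18, cycles_25)
print(f"r = {r:.3f}, slope = {slope:.2f}")
```

A slope near 1 together with a high coefficient is what the caption reads as evidence that extra PCR cycles do not reorder the miRNAs.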

**Figure 3.**
Measured miRNA abundance by the fNN strategy depends on both the adapter and the miRNA sequences. (A) and (C) show the fraction of miRNA in each adapter type, calculated by adding the total number of miRNA sequences (irrespective of identity) captured by each adapter type as a fraction of the total amount of miRNA captured by all the adapters combined. (B) and (D) show, for each adapter type (only 5 out of the 16 are shown here for clarity), the fraction occupied by the top miRNAs. The rankings of the miRNAs by relative abundance are dependent on the adapter. The A and C panels show differences in adapter efficiencies in capturing miRNAs, and the B and D panels show that these differences arise from variations in the efficiencies that depend on the miRNA–adapter combination.
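The two quantities described in this caption can be sketched in a few lines: the share of all miRNA reads captured by each adapter type (panels A and C), and the share of an adapter's reads taken by its top miRNAs (panels B and D). The data layout, function names and read counts below are hypothetical and serve only to illustrate the calculation.

```python
from collections import defaultdict


def adapter_fractions(counts):
    """counts maps (adapter_end, mirna) -> reads.

    Returns adapter_end -> fraction of all miRNA reads captured by that adapter.
    """
    per_adapter = defaultdict(int)
    for (adapter, _mirna), reads in counts.items():
        per_adapter[adapter] += reads
    total = sum(per_adapter.values())
    return {adapter: n / total for adapter, n in per_adapter.items()}


def top_mirna_fractions(counts, adapter, top=3):
    """Return [(mirna, fraction of this adapter's reads)] for its top miRNAs."""
    within = {m: n for (a, m), n in counts.items() if a == adapter}
    total = sum(within.values())
    ranked = sorted(within.items(), key=lambda kv: kv[1], reverse=True)[:top]
    return [(mirna, n / total) for mirna, n in ranked]


# Hypothetical read counts (not data from the paper).
counts = {
    ("TC", "miR-20a"): 60, ("TC", "miR-18a"): 40,
    ("AG", "miR-20a"): 250, ("AG", "miR-18a"): 50,
}
print(adapter_fractions(counts))
print(top_mirna_fractions(counts, "AG", top=1))
```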

**Figure 4.**
The two terminal 3′ bases of the 5′ adapter are the primary determinants of T4 RNA ligase 1 (Rnl1) ligation efficiency. Two distinct sets of 5′ adapters, one with mixed bases in the last four positions (fNNNN) and another with mixed bases in the last two positions (fNN), were used to generate miRNA-derived cDNA libraries for (A) human 293T and (B) mouse embryonic stem cell lines. miRNA abundance in read counts (dots) was plotted; the fNNNN data were compressed to NN by combining the values for AANN through TTNN for each NN. The high correlation between the compressed fNNNN and the fNN datasets indicates that the two terminal bases are the dominant determinants of ligation efficiency. The exceptions, shown in red, are systematic differences (106b and 181 in 293T cells) that we also detected in the independent experiment described in Figure 2, suggesting that this is not a stochastic effect. The naming convention in all our figures is to show the beginning and end of the sequence followed by an m (for a canonical mature) or n (for a non-canonical) miRNA sequence, followed by the name of the miRNA. Thus, on the left we have a canonical mature hsa-miR-106b and a non-canonical hsa-miR-218. The high abundance for hsa-miR-106b suggested by the fNNNN strategy (in contrast to the low values suggested by fNN and other strategies) seems real, as the microarray and RT–PCR results (Figure 9) are in concordance with the fNNNN values.
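The compression of fNNNN data to NN amounts to summing the read counts of all 4-mer adapter ends that share the same last two bases. A minimal sketch, assuming counts for one miRNA are held in a dictionary keyed by the 4-mer end (the function name and the numbers are hypothetical):

```python
from collections import defaultdict


def compress_to_last_two(counts_4mer):
    """Sum read counts over the first two bases of each 4-mer adapter end."""
    collapsed = defaultdict(int)
    for end, reads in counts_4mer.items():
        if len(end) != 4:
            raise ValueError(f"expected a 4-mer adapter end, got {end!r}")
        # Keep only the two terminal 3' bases as the key.
        collapsed[end[-2:]] += reads
    return dict(collapsed)


# Hypothetical read counts for one miRNA (not data from the paper).
fnnnn = {"AATC": 10, "CGTC": 5, "TTTC": 7, "AAAG": 30, "GCAG": 12}
compressed = compress_to_last_two(fnnnn)
print(compressed)
```

The compressed values can then be compared directly with counts from an fNN library, which is the comparison the caption describes.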

**Figure 5.**
Synthetic RNA ligation to the 3′ adapter is enhanced by using a pool of 3′ adapters with random NN at the 5′-end. Two RNA marker strands, 19 and 24 nt long, were synthesized. The 19-mer ends in UCGA, while the 24-mer has an extra 5 nt (AAUGU) on the 3′-end. The RNA markers were 5′-end-labeled with ³²P and then ligated in duplicate to one of two sets of adenylated 3′ DNA adapters: one set consisting of the standard adapter with a 5′ CTGT, and the second set consisting of a mixture of adapters that differ from the standard adapter in having two extra mixed-base positions on the 5′ side, so that the start becomes 5′ NNCTGT. After ligation, the RNA-DNA products were size fractionated on a 12% polyacrylamide gel. The 19-nt marker ligates efficiently irrespective of the adapters used (lanes 5–8), while the ligated 24-mer product is low in abundance when the standard adapter is used (lanes 1–2) but is efficiently ligated (with abundant products) when the mixed-base adapters are used (lanes 3–4).
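The mixed-base pool described here contains every combination of two bases placed in front of the standard CTGT start. The sketch below enumerates those 16 starts; only the 5′ end of the adapter is modelled, because the caption does not give the full adapter sequence, and the function name is hypothetical.

```python
from itertools import product


def pooled_adapter_starts(core="CTGT", n_mixed=2, alphabet="ACGT"):
    """List the 5' starts of a pooled adapter set with n_mixed mixed positions."""
    return ["".join(prefix) + core for prefix in product(alphabet, repeat=n_mixed)]


starts = pooled_adapter_starts()
print(len(starts), starts[:4])
```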

**Figure 6.**
Fluctuation plots showing ligation efficiency for different fNN (A and C) and eNN (B and D) adapters against the most abundant miRNAs from 293T (A and B) and mES (C and D) cells. The naming convention in all our figures is to show the beginning and end of the sequence followed by an m (for a canonical mature) or n (for a non-canonical) miRNA sequence, followed by the name of the miRNA. The area of the dark rectangles depicts the value for each combination of miRNA and adapter. The standard adapter ends (TC in fNN and CT in eNN, highlighted in gray boxes) are not very efficient in ligation to the most abundant miRNAs. Even the most efficient adapters show variability, suggesting that no single adapter can work well across all possible sequences. For the top miRNAs, most of the variability comes from the 3′ adapter ligation (the eNN adapters, B and D). In mES cells, there are two isomirs of mmu-miR-292-3p: the GT-ending 3′ adapter captures the GAGT-ending isomir more efficiently, while the GA-ending 3′ adapter captures the GAGTG-ending isomir more efficiently.

**Figure 7.**
Comparison of parameters inferred from fNN (A) and eNN (B) against fNN_eNN data. The rows are miRNAs captured by different methods; alternate rows are data from fNN_eNN. In the figure, fTC_eNN means that the f end was the standard (TC) and the e end was varied, while fNN_eCT means that the e end had a CT and the f end was varied. In the data for fNN_eCT versus fNN, the ratio to the AG–CT combination is depicted for each row. For the comparison of fNN_eNN against eNN, the ratio to the values for the TC–GT combination is considered. The pairs are highlighted (either light or dark shaded rectangles), and the numbers between members of a pair are expected to be similar, as explained in the text. There is a striking similarity between pairs of rows, suggesting that the fNN_eNN parameters are in concordance with separate measurements of parameters with fNN and eNN. The results section explains the model on which the calculations are based.

**Figure 8.**
A radar plot showing the performance of different adapter termini combinations (fNN_eNN), shown outside the circle in blue. The inner circles represent percent contribution of each adapter combination to a particular miRNA that was sequenced. This plot shows data for the top miRNA (hsa-miR-20a) in 293T cells and two top miRNAs (mmu-miR-292-3p and mmu-miR-294) from mouse embryonic stem cells. There is large variation in the efficiency of capture between various combinations of 5′ and 3′ adapter end modifications. This emphasizes the need for a pooled strategy in sequencing.

**Figure 9.**
Comparison of sequencing against microarray (A and B) and RT–PCR (C and D) for mES (B and D) and 293T (A and C). There are outliers, such as miR-106b, which are only captured by the fNNNN strategy, but overall, there is significant correlation between the fNN_eNN strategy and the microarray data (A) and the fNN_eNN strategy and the RT–PCR data (C), while the fNN sequencing strategy does not give a good correlation to RT–PCR and array data (B and D).

**Figure 10.**
Comparison of rankings between the standard adapters (noNN, ranks along the x-axis) and fNN_eNN (ranks along the y-axis) for 293T (A) and mES samples (B). A point above the diagonal represents a sequence that is overrepresented in noNN, while points below the diagonal are underrepresented in noNN. hsa-miR-18a is overrepresented in the noNN case, where it is ranked 3; the array and qPCR data agree better with the fNN_eNN results, which rank it much lower [this skew is also seen in the mES samples, where the rank is 22 in noNN but much lower (135) in fNN_eNN]. In the mES sample, mmu-miR-294 is first and a non-canonical form of mmu-miR-292-3p is second for noNN, while they switch ranks in the fNN_eNN case. The difference is very significant, because the abundances of the first and the second ranks are about 2-fold apart, suggesting a strong bias. mmu-miR-290-5p ranks very high (rank 5) in noNN but falls outside the range of the graph in fNN_eNN, in accordance with the qPCR data. Thus, in every case in which we can detect a difference between noNN and fNN_eNN, fNN_eNN seems to reflect the profiles more accurately.
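The rank comparison in this figure can be reproduced in outline by ranking miRNAs by read count in each library and checking on which side of the diagonal each one falls. This is a sketch under stated assumptions: the function names, the miRNA labels and the counts are hypothetical, and ties are broken alphabetically, which the caption does not specify.

```python
def rank_by_abundance(counts):
    """Map name -> rank, where rank 1 is the most abundant (ties broken by name)."""
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return {name: i + 1 for i, name in enumerate(ordered)}


def rank_shifts(nonn_counts, pooled_counts):
    """Return name -> (noNN rank, pooled rank, label) for shared miRNAs."""
    nonn = rank_by_abundance(nonn_counts)
    pooled = rank_by_abundance(pooled_counts)
    shifts = {}
    for name in nonn.keys() & pooled.keys():
        x, y = nonn[name], pooled[name]
        if y > x:
            # Above the diagonal: ranked higher by noNN than by the pooled library.
            label = "overrepresented in noNN"
        elif y < x:
            label = "underrepresented in noNN"
        else:
            label = "on diagonal"
        shifts[name] = (x, y, label)
    return shifts


# Hypothetical read counts (not data from the paper).
nonn = {"miR-A": 900, "miR-B": 500, "miR-C": 100}
pooled = {"miR-A": 200, "miR-B": 800, "miR-C": 300}
shifts = rank_shifts(nonn, pooled)
for name in sorted(shifts):
    print(name, shifts[name])
```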

Grants and funding

DP2DK083052-01/DK/NIDDK NIH HHS/United States
