Genome Res. 2010 Feb;20(2):257-64.
doi: 10.1101/gr.095273.109. Epub 2010 Jan 5.

Cross-mapping and the identification of editing sites in mature microRNAs in high-throughput sequencing libraries

Michiel J L de Hoon et al. Genome Res. 2010 Feb.

Abstract

MicroRNAs (miRNAs) are short (20-23 nt) RNAs that are sequence-specific mediators of transcriptional and post-transcriptional regulation of gene expression. Modern high-throughput technologies enable deep sequencing of such RNA species on an unprecedented scale. We find that the analysis of small RNA deep-sequencing libraries can be affected by cross-mapping, in which RNA sequences originating from one locus are inadvertently mapped to another. Similar to cross-hybridization on microarrays, cross-mapping is prevalent among miRNAs, as they tend to occur in families, are similar to or derived from repeat or structural RNAs, or are post-transcriptionally modified. Here, we develop a strategy to correct for cross-mapping, and apply it to the analysis of RNA editing in mature miRNAs. In contrast to previous reports, our analysis suggests that RNA editing in mature miRNAs is rare in animals.

Figures

Figure 1.
Number of mapping locations. The number of mapping locations for all FANTOM4 THP-1 short RNA sequences. More than half of the short RNAs in these libraries map to more than one genome location.
Figure 2.
Cross-mapping in short RNA sequencing libraries. Sequences in green and blue represent the genome sequence and the RNA sequence, respectively. Mismatches between the genome sequence and the RNA sequence are shown in red. A miRNA sequence with an additional 3′ adenosine, either due to post-transcriptional addition (as discussed in the text) or a sequencing error, maps equally well to the genome loci encoding the human miRNAs let-7b and let-7c. This miRNA sequence was read 20 times in the FANTOM4 time course short RNA libraries. Dividing the sequence counts equally between these two genome loci leads to a spurious RNA editing site in let-7c. Alternatively, cross-mapping to unannotated genome regions may give rise to spurious novel noncoding RNA loci.
Figure 3.
Cross-mapping correction strategy. For each short RNA that can be aligned to multiple genome regions with an equal number of errors, our strategy to correct for cross-mapping assigns a weight to each candidate mapping location based on the local expression level as well as the alignment errors. The latter is based on the error profile, which describes the probability of an alignment error as a function of the position along the alignment. Both the local expression level and the error profile are calculated from the mapped RNAs themselves using an expectation-maximization algorithm. First, we assign equal weights to all candidate mapping locations. We then calculate the error profile and the expression level of each genome location from the complete set of mapped RNA sequences. These estimates allow us to update the weights, and in turn to recalculate the error profile and the expression levels. This process is iterated until convergence.
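The iteration described in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: it is a simplified variant that reweights candidate locations by local expression only and omits the position-dependent error profile. The function name `em_weights`, the input layout, and the example counts are assumptions made for illustration.

```python
from collections import defaultdict

def em_weights(reads, n_iter=50, tol=1e-9):
    """Simplified expectation-maximization reweighting (illustrative only).

    reads: list of (count, [locus, ...]) pairs, one per distinct short RNA.
    Returns one {locus: weight} dict per read; weights for a read sum to 1.
    Assumption: the error-profile term from the caption is left out, so the
    weights depend on local expression alone.
    """
    # Start with equal weights over all candidate mapping locations.
    weights = [{loc: 1.0 / len(locs) for loc in locs} for _, locs in reads]
    for _ in range(n_iter):
        # Estimate the expression of each locus from the weighted counts.
        expr = defaultdict(float)
        for (count, _), w in zip(reads, weights):
            for loc, wt in w.items():
                expr[loc] += count * wt
        # Update each read's weights in proportion to local expression.
        new_weights, delta = [], 0.0
        for (_, locs), w in zip(reads, weights):
            total = sum(expr[loc] for loc in locs)
            nw = {loc: expr[loc] / total for loc in locs}
            delta = max(delta, max(abs(nw[loc] - w[loc]) for loc in locs))
            new_weights.append(nw)
        weights = new_weights
        if delta < tol:  # stop once the weights no longer change
            break
    return weights

# Hypothetical counts: two uniquely mapping RNAs and one ambiguous RNA.
reads = [(100, ["let-7b"]), (5, ["let-7c"]), (20, ["let-7b", "let-7c"])]
result = em_weights(reads)
print(result[2])
```

In this toy input the ambiguous RNA ends up assigned mostly to the more highly expressed locus, rather than being split equally between the two.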
Figure 4.
Effect of the cross-mapping correction on mapping weights. The weight ratio is defined as the weight calculated by the cross-mapping correction strategy divided by the corresponding weight under an equal-weight strategy (see text for more detail). The more a weight ratio differs from unity, the larger the cross-mapping correction. (A) The cumulative distribution of the weight ratios, revealing that most weights are reduced in comparison to the equal-weight strategy, while a few weights are greatly increased. (B) A two-dimensional binning plot of the count of each short RNA sequence and the weight ratio, using a logarithmic color scheme to represent the number of mapping events in each bin.
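As a small worked example of the weight ratio defined in this caption, the sketch below assumes that the equal-weight strategy gives each of the n candidate locations a weight of 1/n. The function name `weight_ratio` and the example values are hypothetical.

```python
def weight_ratio(corrected_weight, n_locations):
    """Corrected weight divided by the equal-weight value.

    Assumption: under the equal-weight strategy each of the n_locations
    candidate mapping locations receives a weight of 1 / n_locations.
    """
    equal_weight = 1.0 / n_locations
    return corrected_weight / equal_weight

# Hypothetical short RNA with two candidate locations whose corrected
# weights are 0.95 and 0.05.
up = weight_ratio(0.95, 2)    # above unity: weight increased by the correction
down = weight_ratio(0.05, 2)  # below unity: weight reduced by the correction
print(up, down)
```

A ratio of exactly 1 would mean the correction left the equal-weight assignment unchanged for that mapping.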
Figure 5.
Comparison of miRNA expression before and after cross-mapping correction. This scatter plot shows the expression of miRNAs before and after correcting for cross-mapping. With a Spearman correlation coefficient of 0.99, the cross-mapping correction has a minor effect on the estimated expression of most miRNAs. However, for 14 miRNAs the relative difference between the estimated expression with and without the cross-mapping correction was larger than 50%. These miRNAs are circled and shown together with their short RNA counts before and after the cross-mapping correction.
Figure 6.
Verification of the cross-mapping correction at spurious editing sites. Our analysis showed that eight out of 10 sites with overrepresented mismatches in mature miRNAs are due to cross-mapping rather than true RNA editing sites (Table 1). For each of these miRNAs, we identified the most abundant RNA that mapped exactly (i.e., without mismatches) to the miRNA locus and the most abundant RNA that mapped to the putative origin of cross-mapping. We also identified the most abundant RNA mapping to the miRNA locus with one mismatch at the site of overrepresented mismatches. For this RNA, we calculated the Spearman correlation along the time course of its counts with the counts of the RNA derived from the miRNA locus (light gray), and the Spearman correlation along the time course of the RNA derived from the putative origin of cross-mapping (dark gray). In all cases, we find that the putative origin of cross-mapping yields a stronger correlation than the miRNA site, supporting our conclusion that the mismatched RNA originates from the cross-mapping locus rather than the miRNA locus.
