Gigascience. 2018 Aug 1;7(8):giy093.
doi: 10.1093/gigascience/giy093.

Leveraging multiple transcriptome assembly methods for improved gene structure annotation

Luca Venturini et al. Gigascience.

Abstract

Background: The performance of RNA sequencing (RNA-seq) aligners and assemblers varies greatly across different organisms and experiments, and often the optimal approach is not known beforehand.

Results: Here, we show that the accuracy of transcript reconstruction can be boosted by combining multiple methods, and we present a novel algorithm to integrate multiple RNA-seq assemblies into a coherent transcript annotation. Our algorithm can remove redundancies and select the best transcript models according to user-specified metrics, while solving common artifacts such as erroneous transcript chimerisms.

Conclusions: We have implemented this method in an open-source Python3 and Cython program, Mikado, available on GitHub.
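
The selection step summarised in the Results (remove redundant models, then keep the best model according to user-specified metrics) can be illustrated with a minimal Python sketch. This is not Mikado's implementation or API: the `Transcript` class, the metric names (`cds_fraction`, `exon_count`), the weights, and the rule that two models are redundant when they share an intron chain are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    tid: str                 # transcript identifier (hypothetical naming)
    locus: str               # hypothetical locus label
    introns: tuple           # ((start, end), ...); empty if monoexonic
    metrics: dict = field(default_factory=dict)


def score(transcript, weights):
    """Weighted sum of user-specified metrics; missing metrics count as 0."""
    return sum(w * transcript.metrics.get(name, 0.0) for name, w in weights.items())


def remove_redundant(transcripts, weights):
    """Keep one model per (locus, intron chain), preferring the higher score.

    Simplification: all monoexonic models in a locus share the empty chain.
    """
    unique = {}
    for t in transcripts:
        key = (t.locus, t.introns)
        if key not in unique or score(t, weights) > score(unique[key], weights):
            unique[key] = t
    return list(unique.values())


def pick_best(transcripts, weights):
    """Return the top-scoring non-redundant model for each locus."""
    best = {}
    for t in remove_redundant(transcripts, weights):
        if t.locus not in best or score(t, weights) > score(best[t.locus], weights):
            best[t.locus] = t
    return best


# Illustrative input: two assemblers propose the same intron chain at locusA.
weights = {"cds_fraction": 1.0, "exon_count": 0.1}
models = [
    Transcript("cufflinks.1", "locusA", ((100, 200), (300, 400)),
               {"cds_fraction": 0.6, "exon_count": 3}),
    Transcript("stringtie.1", "locusA", ((100, 200), (300, 400)),
               {"cds_fraction": 0.7, "exon_count": 3}),
    Transcript("trinity.1", "locusB", (),
               {"cds_fraction": 0.9, "exon_count": 1}),
]
best = pick_best(models, weights)
for locus, t in sorted(best.items()):
    print(locus, t.tid)
```

Because the weights are passed in as plain data, changing the selection criteria only requires editing the `weights` dictionary, which mirrors the idea of user-specified metrics without claiming anything about how Mikado configures them.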

Figures

Figure 1:
The algorithm employed by Mikado is capable of solving complex loci with multiple potential assemblies. This locus in A. thaliana is particularly challenging as an ancestral gene in the locus tandemly duplicated into the current AT5G66610, AT5G66620, and AT5G66630 genes. Due to these difficulties, no single assembler was capable of reconstructing all loci correctly. For instance, Trinity was the only method that correctly assembled AT5G66631, but it failed to reconstruct any other transcript correctly. The reverse was true for Cufflinks, which correctly assembled the three duplicated genes but completely missed the monoexonic AT5G66631. By choosing between different alternative assemblies, Mikado was able to provide an evidence-based annotation congruent with the TAIR10 models.
Figure 2:
Schematic representation of the Mikado workflow.
Figure 3:
Performance of Mikado on simulated and real data. (A) We evaluated the performance of Mikado using both simulated data and the original real data. The method with the best transcript-level F1 is marked by a circle. (B) Number of reconstructed, missed, and chimeric genes in each assembly. Notice the lower level of chimeric events in simulated data.
Figure 4:
Performance of Mikado while varying the MIF parameter. Precision/recall plot at the gene and transcript levels for CLASS and StringTie at varying minimum isoform fraction thresholds in A. thaliana, with and without applying Mikado. Dashed lines mark the F1 levels at different precision and recall values. CLASS sets MIF to 5% by default (red), while StringTie uses a slightly more stringent default value of 10% (cyan).
Figure 5:
Integrating assemblies coming from multiple samples. (A) Mikado performs consistently better than other merging tools. StringTie-merge and TACO are not compatible with Trinity results and as such have not been included in the comparison. (B) Rate of recovered, missed, and fused genes for all the assembler and combiner combinations.
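
The captions for Figures 3 and 4 rely on two quantities: the F1 score derived from precision and recall, and a minimum isoform fraction (MIF) threshold. The sketch below shows how each could be computed. It assumes that MIF is interpreted as a fraction of the abundance of the most abundant isoform at a locus; the function names, counts, and abundance values are hypothetical and are not taken from the paper.

```python
def f1_score(true_positives, predicted, reference):
    """F1 from counts: harmonic mean of precision and recall."""
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / reference if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def filter_by_mif(isoform_abundance, mif):
    """Keep isoforms with abundance >= mif * (abundance of the top isoform).

    Assumption: MIF is relative to the most abundant isoform at the locus.
    """
    if not isoform_abundance:
        return {}
    threshold = mif * max(isoform_abundance.values())
    return {tid: a for tid, a in isoform_abundance.items() if a >= threshold}


# Illustrative counts: 8 correct transcripts out of 10 predicted, 16 in the reference.
print(round(f1_score(8, 10, 16), 3))

# Illustrative abundances for one locus (arbitrary units).
abundance = {"t1": 100.0, "t2": 12.0, "t3": 6.0}
print(sorted(filter_by_mif(abundance, 0.05)))  # threshold at 5% of the top isoform
print(sorted(filter_by_mif(abundance, 0.10)))  # threshold at 10% of the top isoform
```

Raising the threshold removes low-abundance isoforms, which typically trades recall for precision; this is the trade-off that the precision/recall plot in Figure 4 explores.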
