BMC Genomics. 2016 Nov 11;17(Suppl 10):786.
doi: 10.1186/s12864-016-3103-6.

Assisted transcriptome reconstruction and splicing orthology

Samuel Blanquart et al. BMC Genomics.

Abstract

Background: Transcriptome reconstruction, defined as the identification of all protein isoforms that may be expressed by a gene, is a notably difficult computational task. With real data, the best methods based on RNA-seq data identify barely 21 % of the expressed transcripts. While waiting for algorithms and sequencing techniques to improve, as has been strongly suggested in the literature, it is important to evaluate assisted transcriptome prediction, that is, how well alternative transcription in one species predicts protein isoforms in another, relatively close species. Most evidence-based gene predictors use transcripts from other species to annotate a genome, but the predictive power of procedures that use exclusively transcripts from external species has never been quantified. The cornerstone of such an evaluation is the correct identification of pairs of transcripts with the same splicing patterns, called splicing orthologs.

Results: We propose a rigorous procedural definition of splicing orthologs, based on the identification of all ortholog pairs of splicing sites in the nucleotide sequences, and alignments at the protein level. Using our definition, we compared 24 382 human transcripts and 17 909 mouse transcripts from the highly curated CCDS database, and identified 11 122 splicing orthologs. In prediction mode, we show that human transcripts can be used to infer over 62 % of mouse protein isoforms. When restricting the predictions to transcripts known eight years ago, the percentage grows to 74 %. Using CCDS timestamped releases, we also analyze the evolution of the number of splicing orthologs over the last decade.
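As a quick consistency check on the figures above, the following sketch recomputes the share of transcripts covered by the identified splicing orthologs. It assumes that the reported "over 62 %" is the ratio of splicing orthologs to mouse transcripts; the abstract does not state the denominator explicitly, and the helper name `share` is ours.

```python
# Counts quoted in the Results paragraph (CCDS database).
HUMAN_TRANSCRIPTS = 24_382
MOUSE_TRANSCRIPTS = 17_909
SPLICING_ORTHOLOGS = 11_122


def share(part: int, whole: int) -> float:
    """Return part / whole as a fraction between 0 and 1."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Assumption: each splicing ortholog pairs one human and one mouse transcript,
# so the same count can be compared with either total.
mouse_share = share(SPLICING_ORTHOLOGS, MOUSE_TRANSCRIPTS)
human_share = share(SPLICING_ORTHOLOGS, HUMAN_TRANSCRIPTS)

print(f"mouse transcripts with a human splicing ortholog: {mouse_share:.1%}")
print(f"human transcripts with a mouse splicing ortholog: {human_share:.1%}")
```

Under that assumption the mouse share comes out at about 62.1 %, in line with the percentage reported in the abstract.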

Conclusions: Alternative splicing is now recognized to play a major role in the protein diversity of eukaryotic organisms, but definitions of spliced isoform orthologs are still approximate. Here we propose a definition adapted to the subtle variations of conserved alternative splicing sites, and use it to validate numerous accurate orthologous isoform predictions.

Keywords: Eukaryotes; Splicing orthologs; Transcriptome prediction.

Figures

Fig. 1
Gene models and transcripts. In the CCDS Database [18], as of June 2016, there were 18 CREM human and 12 Crem mouse transcripts with unique splicing patterns. The common reference sequence has blocks labeled from A to U. The human gene model is at the top of the figure, followed by the sub-sequence of blocks and signals found in the mouse. The block representation and CCDS number are given for each transcript. Note that block O does not exist in the mouse gene, and block F is not in the human model, since block F is not found in any known human transcript. Of the 18 human transcripts, 15 are executable, meaning that they could be expressed, and three are not (in red) because they use block O. The executable transcripts are further classified as either found, that is, paired with a mouse transcript (5 of them, in black), or yet-to-be-found (10 of them, in green). All these predictions are confirmed by the controller. Since 5 mouse transcripts are correctly identified, and 12 mouse transcripts are currently known, the predictor successfully identifies 5/12, or 42 % of the mouse transcripts. As more mouse transcripts are discovered, this proportion may increase with future releases of the CCDS database.
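The classification described in this caption can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes each transcript is reduced to a string of block labels and that two transcripts are paired when their block strings are identical, whereas the paper's definition relies on orthologous splicing sites and protein-level alignments. The `Transcript` class, the function names and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    blocks: str  # ordered block labels, e.g. "ABC" (hypothetical encoding)


def classify(source, target_blocks, target_transcripts):
    """Sort source-species transcripts into the three classes of the caption.

    not_executable:  uses a block that is absent from the target gene
    found:           executable and identical to a known target transcript
    yet_to_be_found: executable but not (yet) known in the target species
    """
    known = {t.blocks for t in target_transcripts}
    classes = {"not_executable": [], "found": [], "yet_to_be_found": []}
    for t in source:
        if not set(t.blocks) <= set(target_blocks):
            classes["not_executable"].append(t.name)
        elif t.blocks in known:
            classes["found"].append(t.name)
        else:
            classes["yet_to_be_found"].append(t.name)
    return classes


def identified_fraction(found_count, known_target_count):
    """Fraction of known target transcripts recovered by the predictor."""
    return found_count / known_target_count


# Toy data (invented for illustration, not the CREM gene model).
human = [Transcript("h1", "ABC"), Transcript("h2", "ABO"), Transcript("h3", "AC")]
mouse = [Transcript("m1", "ABC"), Transcript("m2", "AFC")]
result = classify(human, target_blocks="ABCF", target_transcripts=mouse)
print(result)

# Figures from the caption: 5 found out of 12 known mouse transcripts.
print(f"{identified_fraction(5, 12):.0%}")
```

In the toy data, `h2` is not executable because it uses block O, `h1` is found, and `h3` is yet-to-be-found. The last line reproduces the 5/12 ratio of the caption.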
Fig. 2
Alignment validation. Requirements for candidate protein sequences to be included in the controller list.
Fig. 3
Evolution of the number of splicing orthologs. These curves show the growth, over the years, of the number of known splicing orthologs among the subset of orthologous genes that have at least two different isoforms for human and for mouse in CCDS Release 19. Each data point corresponds to a CCDS release of mouse transcripts: releases 2, 4, 7, 10, 13, 16 and 19. The black curve shows the growth of the whole subset; the blue curve shows the growth of splicing orthologs whose human transcript was known in 2006; the red curve shows the growth of splicing orthologs whose human transcript was discovered between 2006 and 2011; and the green curve shows the growth of splicing orthologs whose human transcript has been discovered since 2011.
