Sci Rep. 2020 Jan 23;10(1):1069.
doi: 10.1038/s41598-020-57974-z.

ISOGO: Functional annotation of protein-coding splice variants



Juan A Ferrer-Bonsoms et al. Sci Rep. 2020.

Abstract

The advent of RNA-seq technologies has shifted the paradigm of genetic analysis from a genome-based to a transcriptome-based perspective. Alternative splicing generates functional diversity in genes, but the precise functions of many individual isoforms have yet to be elucidated. Gene Ontology (GO) was developed to annotate gene products according to their biological processes, molecular functions and cellular components. Although a single gene may have several gene products, most annotations are not isoform-specific and do not distinguish between the functions of the different proteins originating from a single gene. Several approaches have tried to automatically annotate ontologies at the isoform level, but this has proven to be a daunting task. We have developed ISOGO (ISOform + GO function imputation), a novel algorithm that predicts the function of coding isoforms based on their protein domains and their correlation of expression across 11,373 cancer patients. Combining these two sources of information outperforms previous approaches: it yields an area under the precision-recall curve (AUPRC) five times larger than previous attempts, and the median AUROC of the functions assigned to genes is 0.82. We tested ISOGO predictions on several genes with isoform-specific functions (BRCA1, MADD, VAMP7 and ITSN1), and they were consistent with the literature. In addition, we examined whether the main isoform of each gene, as predicted by APPRIS, was the isoform most likely to have the annotated gene functions; this was the case for 99.4% of the genes. We also evaluated the predictions for the isoform-specific functions provided by the CAFA3 challenge, and the results were also convincing. To make these results available to the scientific community, we have deployed a web application to consult ISOGO predictions (https://biotecnun.unav.es/app/isogo). The initial data, website link, isoform-specific GO function predictions and R code are available at https://gitlab.com/icassol/isogo.
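The abstract names expression correlation as one of ISOGO's two information sources. The sketch below is a rough illustration only. It scores an isoform against a GO term by the mean Pearson correlation between its expression profile and the profiles of isoforms already linked to that term. It then blends that score with a domain-based score. The guilt-by-association scoring, the linear blend and the `weight` parameter are all assumptions made for this illustration. They are not ISOGO's published model, which is available in the authors' R code.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles.

    Constant profiles have zero variance and would raise ZeroDivisionError.
    """
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_score(target, annotated_profiles):
    """Hypothetical guilt-by-association score: mean correlation between the
    target isoform and isoforms whose genes carry the GO term."""
    return fmean(pearson(target, p) for p in annotated_profiles)


def combined_score(domain_score, corr_score, weight=0.5):
    """Hypothetical linear blend of a domain-based and a correlation-based score."""
    return weight * domain_score + (1 - weight) * corr_score


# Toy profiles across four samples (made-up numbers, not ISOGO data).
target = [1.0, 2.0, 3.0, 4.0]
annotated = [[2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
corr = correlation_score(target, annotated)   # about 0.0 (+1 and -1 cancel)
print(round(combined_score(0.8, corr), 3))    # 0.4
```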


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Overall proposal. Training and validation are performed with a training set and a test set of genes, respectively. The complete prediction model is then built with the full set of genes and applied to the isoform data, yielding the final ISOGO matrix of [79,864 isoforms × 5,777 GO terms].
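Figure 1 describes validation on held-out genes, followed by a final refit on all genes. The sketch below shows a reproducible gene-level split. The 30% test fraction, the seed and the gene IDs are arbitrary choices for this illustration, not values from the paper.

```python
import random


def split_genes(genes, test_fraction=0.3, seed=0):
    """Split gene IDs into disjoint train and test sets (reproducible via seed)."""
    unique = sorted(set(genes))          # sort first so the shuffle is deterministic
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = round(len(unique) * test_fraction)
    return set(unique[n_test:]), set(unique[:n_test])


genes = [f"GENE{i}" for i in range(10)]   # hypothetical gene IDs
train, test = split_genes(genes)
print(len(train), len(test))              # 7 3
# After validation on `test`, the caption's final step refits on all genes
# and applies the model to every isoform (79,864 isoforms x 5,777 GO terms).
```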
Figure 2
AUROC comparison, depending on the number of genes per GO term. Blue boxplots correspond to the Combination method, yellow ones to the Correlation method and grey ones to the Domain-based regression. Black boxplots correspond to the result in Panwar et al. A dotted black line is included to show the baseline for a random classifier (AUROC = 0.5).
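Each AUROC in Figure 2 summarizes how well one GO term's scores rank annotated genes above non-annotated ones. The sketch below uses the pairwise (Mann-Whitney) formulation, in which ties count as half a win. It shows why a random or constant classifier sits at the 0.5 baseline. The quadratic loop is chosen for clarity and is not meant to reflect the authors' implementation.

```python
def auroc(scores, labels):
    """Probability that a random positive scores higher than a random negative.

    Ties count as half a win, so constant scores give exactly 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
print(auroc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.5
```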
Figure 3
AUPRC comparison, depending on the number of genes per GO term. Legend as for Fig. 2 (blue boxplots correspond to the Combination method, yellow ones to the Correlation method, grey ones to the Domain-based regression and black ones to the result from Panwar et al.). The dotted black line represents the AUPRC of a random classifier, which depends on the number of genes per category.
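Unlike AUROC, the AUPRC of a random classifier equals the fraction of positives, which is why the baseline in Figure 3 changes with the number of genes per GO term. The sketch below estimates AUPRC with average precision, one common estimator (other interpolations exist, so this choice is an assumption). It also computes the prevalence baseline. Tied scores are kept in their input order.

```python
def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(1 for _, y in ranked if y)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank   # precision at this cutoff
    return total / n_pos


def random_auprc_baseline(n_genes_in_term, n_genes_total):
    """Expected AUPRC of a random classifier: the prevalence of the GO term."""
    return n_genes_in_term / n_genes_total


print(round(average_precision([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]), 4))  # 0.8333
print(random_auprc_baseline(50, 5000))                                  # 0.01
```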
Figure 4
Panels (A,C,E) show heatmaps of the difference between the ISOGO logits and the expected logits of an isoform having a function; larger values are shown in blue and smaller values in red. The x-axis of each heatmap displays the functions studied for each gene (Table S2). These functions are related to apoptosis for BRCA1 and MADD (panels A and C) and to exocytosis and the SNARE machinery for VAMP7 (panel E). Annotated and non-annotated functions are marked in green and orange, respectively, at the top of each heatmap. Panels (B,D,F) show the isoform structure and the positions of protein domains for BRCA1, MADD and VAMP7, respectively. Coding regions are marked in black, while the 5′ UTR and 3′ UTR are colored in grey. Panels (C,D) show that isoforms that include both exons 13 and 16 (shaded blue) have larger logits for the GO functions. Panels (E,F) show that alternative splicing of VAMP7 changes its functions related to the SNARE machinery and exocytosis.
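Figure 4's heatmaps plot, for each isoform and GO term, the ISOGO logit minus an expected logit. The caption does not define how the expected value is obtained. The sketch below therefore assumes a hypothetical per-term baseline probability (for example, the fraction of genes annotated to the term) and converts it to a logit before subtracting. Positive entries would appear blue and negative ones red.

```python
import math


def logit(p):
    """Log-odds of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logit_difference(isogo_logits, baseline_probs):
    """Rows are isoforms, columns are GO terms.

    `baseline_probs` holds one hypothetical expected probability per GO term;
    each entry of the result is the ISOGO logit minus the expected logit.
    """
    expected = [logit(p) for p in baseline_probs]
    return [[value - e for value, e in zip(row, expected)] for row in isogo_logits]


isogo = [[0.0, 2.0], [-1.0, 1.5]]          # toy logits: 2 isoforms x 2 GO terms
print(logit_difference(isogo, [0.5, 0.5]))  # [[0.0, 2.0], [-1.0, 1.5]]
```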
Figure 5
(A) Estimated logits for APPRIS and non-APPRIS isoforms. The y-axis displays the logits of all the isoforms of genes annotated to maturation of LSU-rRNA. APPRIS transcripts are shown as red diamonds and other transcripts as blue circles. (B) Estimated logits of CAFA3 isoforms. In this case, the y-axis displays the logits for the genes annotated to GO:0004629 (phospholipase C activity). CAFA3-annotated isoforms are shown as red diamonds and other transcripts as blue circles.
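The abstract reports that the APPRIS principal isoform was the one most likely to carry the gene's annotated functions in 99.4% of genes. The sketch below performs that kind of check for a single GO term. The dictionary layout and the toy values are assumptions, and ties are counted in favor of the principal isoform.

```python
def fraction_principal_on_top(logits, isoform_gene, principal):
    """Fraction of genes whose principal isoform has the highest logit.

    logits:       isoform ID -> ISOGO logit for one GO term
    isoform_gene: isoform ID -> gene ID
    principal:    gene ID -> principal (e.g. APPRIS) isoform ID
    """
    best = {}
    for iso, value in logits.items():
        gene = isoform_gene[iso]
        best[gene] = max(best.get(gene, float("-inf")), value)
    genes = [g for g in principal if g in best]
    hits = sum(1 for g in genes if logits[principal[g]] >= best[g])
    return hits / len(genes)


logits = {"a": 2.0, "b": 1.0, "c": 3.0, "d": 0.0}     # toy values
isoform_gene = {"a": "G1", "b": "G1", "c": "G2", "d": "G2"}
principal = {"G1": "a", "G2": "d"}
print(fraction_principal_on_top(logits, isoform_gene, principal))  # 0.5
```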
Figure 6
Screenshot of the ISOGO web application main page. (A) Gene input and a brief description of the gene. (B) Checkbox list of the isoforms of the selected gene. (C) List of the GO terms annotated to the selected gene. (D) Option to manually add any GO term to the analysis. (E) Setup of the upper and lower thresholds. (F) Hide/show list of filtered GO terms. (G) Description of each GO term and the isoforms that gain or lose it. (H) ISOGO table. (I) Heatmap of the difference between the ISOGO values and the expected logits. (J) Splice variant structure and protein domain positions.
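Panels (E) and (G) of the web application pair upper and lower thresholds with lists of isoforms that gain or lose a GO term. The sketch below shows one plausible reading, offered as an assumption rather than the application's actual logic: compare each isoform's logit difference against the two thresholds.

```python
def gained_and_lost(differences, lower, upper):
    """Split isoforms by their logit difference for one GO term.

    Isoforms at or above `upper` are treated as gaining the function and
    those at or below `lower` as losing it (hypothetical rule).
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    gained = [iso for iso, d in differences.items() if d >= upper]
    lost = [iso for iso, d in differences.items() if d <= lower]
    return gained, lost


diffs = {"iso1": 2.5, "iso2": -3.0, "iso3": 0.1}   # toy values
print(gained_and_lost(diffs, lower=-2.0, upper=2.0))  # (['iso1'], ['iso2'])
```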
