Genome-Wide Functional Annotation of Human Protein-Coding Splice Variants Using Multiple Instance Learning
- PMID: 27142340
- DOI: 10.1021/acs.jproteome.5b00883
Abstract
The vast majority of human multiexon genes undergo alternative splicing and produce a variety of splice variant transcripts and proteins, which can perform different functions. These protein-coding splice variants (PCSVs) greatly increase the functional diversity of proteins. Most functional annotation algorithms have been developed at the gene level; the lack of isoform-level gold standards is a major limitation for currently available machine learning algorithms. The accumulation of a large amount of RNA-seq data in the public domain greatly increases our ability to examine the functional annotation of genes at the isoform level. In the present study, we used a multiple instance learning (MIL)-based approach to predict the function of PCSVs. As input, we used transcript-level expression values and gene-level functional associations from the Gene Ontology database. Support vector machine (SVM) models were evaluated with 5-fold cross-validation. Prediction performance was better for genes with multiple PCSVs than for single-PCSV genes, and it also improved when more examples were available to train the models. We illustrated our predictions with literature evidence for the ADAM15, LMNA/C, and DMXL2 genes. All predictions have been implemented in a web resource called "IsoFunc", which is freely available to the global scientific community at http://guanlab.ccmb.med.umich.edu/isofunc.
Keywords: ADAM15; DMXL2; IsoFunc; LMNA/C; RNA-seq; alternative splicing; functional annotation; gene ontology (GO); multiple instance learning (MIL); protein-coding splice variant (PCSV); support vector machine (SVM).
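The abstract does not give implementation details, so the following is only a minimal sketch of the general MIL idea it describes: each gene is treated as a bag of isoform instances, the label is known only at the gene level, and a gene is scored by its best-scoring isoform. All gene and isoform names, expression values, weights, and helper functions here are hypothetical, and a fixed linear scorer stands in for a trained SVM.

```python
import random

# Hypothetical toy data: each gene (bag) holds isoforms (instances),
# each with a made-up expression vector across three samples.
bags = {
    "GENE_A": {"A-201": [5.1, 4.8, 5.3], "A-202": [0.2, 0.1, 0.4]},
    "GENE_B": {"B-201": [0.3, 0.2, 0.1]},
    "GENE_C": {"C-201": [0.1, 0.3, 0.2], "C-202": [4.9, 5.2, 5.0]},
    "GENE_D": {"D-201": [0.4, 0.2, 0.3], "D-202": [0.1, 0.1, 0.2]},
    "GENE_E": {"E-201": [5.5, 5.0, 4.7]},
}

# Hypothetical gene-level labels for a single GO term (1 = annotated).
gene_labels = {"GENE_A": 1, "GENE_B": 0, "GENE_C": 1, "GENE_D": 0, "GENE_E": 1}


def instance_score(features, weights, bias):
    """Linear decision value for one isoform (stand-in for an SVM output)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def bag_score(isoforms, weights, bias):
    """Score a gene by its highest-scoring isoform.

    This encodes the standard MIL assumption: a bag is positive if at
    least one of its instances is positive. Returns (score, isoform name).
    """
    scores = {name: instance_score(f, weights, bias) for name, f in isoforms.items()}
    best = max(scores, key=scores.get)
    return scores[best], best


def gene_level_folds(genes, k=5, seed=0):
    """Split genes into k folds so all isoforms of a gene stay together."""
    genes = sorted(genes)
    random.Random(seed).shuffle(genes)
    return [genes[i::k] for i in range(k)]


# Assumed, untrained parameters chosen only to make the toy example work.
weights, bias = [1.0, 1.0, 1.0], -6.0

predictions = {}
for gene, isoforms in bags.items():
    score, best_isoform = bag_score(isoforms, weights, bias)
    predictions[gene] = (1 if score > 0 else 0, best_isoform)

folds = gene_level_folds(bags, k=5)
```

In this toy setup the isoform that determines a positive gene's score is the candidate carrier of the function, which is how a gene-level annotation can be pushed down to the isoform level. Splitting folds by gene rather than by isoform keeps isoforms of the same gene from appearing in both training and test sets.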