J Proteome Res. 2016 Jun 3;15(6):1747-53.
doi: 10.1021/acs.jproteome.5b00883. Epub 2016 May 9.

Genome-Wide Functional Annotation of Human Protein-Coding Splice Variants Using Multiple Instance Learning

Bharat Panwar et al. J Proteome Res.

Abstract

The vast majority of human multiexon genes undergo alternative splicing and produce a variety of splice variant transcripts and proteins, which can perform different functions. These protein-coding splice variants (PCSVs) greatly increase the functional diversity of proteins. Most functional annotation algorithms have been developed at the gene level; the lack of isoform-level gold standards is an important limitation for currently available machine learning algorithms. The accumulation of a large amount of RNA-seq data in the public domain greatly increases our ability to examine the functional annotation of genes at the isoform level. In the present study, we used a multiple instance learning (MIL)-based approach to predict the function of PCSVs. We used transcript-level expression values and gene-level functional associations from the Gene Ontology database. Support vector machine (SVM) models were evaluated with 5-fold cross-validation. Performance was better for genes with multiple PCSVs than for single-PCSV genes, and it also improved when more examples were available to train the models. We illustrated our predictions using literature evidence for the ADAM15, LMNA/C, and DMXL2 genes. All predictions have been implemented in a web resource called "IsoFunc", which is freely available to the global scientific community at http://guanlab.ccmb.med.umich.edu/isofunc.
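The MIL setting described above can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: each gene is treated as a "bag" of isoforms, a gene-level Gene Ontology label is taken to mean that at least one isoform carries the function, and training alternates between fitting a scorer and keeping the top-scoring isoform of each positive gene as its "witness". A simple difference-of-centroids scorer stands in for the SVM so the sketch has no dependencies; the gene names, isoform names, and expression values are hypothetical toy data, and this is not the authors' implementation.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_scorer(pos, neg):
    """Stand-in linear scorer (difference of class centroids), NOT an SVM."""
    cp, cn = centroid(pos), centroid(neg)
    w = [a - b for a, b in zip(cp, cn)]
    mid = [(a + b) / 2 for a, b in zip(cp, cn)]
    bias = -sum(wi * mi for wi, mi in zip(w, mid))
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + bias


def train_mil(bags, labels, n_iter=10):
    """bags: {gene: {isoform: expression vector}}; labels: {gene: 0 or 1}."""
    # Isoforms of negative genes are all treated as negative instances.
    neg = [x for g, iso in bags.items() if labels[g] == 0 for x in iso.values()]
    # Start by letting every isoform of a positive gene inherit the gene label.
    witnesses = {g: list(iso) for g, iso in bags.items() if labels[g] == 1}
    score = None
    for _ in range(n_iter):
        pos = [bags[g][i] for g, isos in witnesses.items() for i in isos]
        score = fit_scorer(pos, neg)
        # Keep only the top-scoring isoform of each positive gene.
        new = {g: [max(bags[g], key=lambda i: score(bags[g][i]))]
               for g in witnesses}
        if new == witnesses:  # assignments stopped changing
            break
        witnesses = new
    return score, witnesses


# Hypothetical toy data: two expression features per isoform.
bags = {
    "G1": {"G1-a": [5.0, 5.0], "G1-b": [0.0, 0.0]},
    "G2": {"G2-a": [4.0, 6.0], "G2-b": [1.0, 0.0]},
    "G3": {"G3-a": [0.0, 1.0]},
    "G4": {"G4-a": [1.0, 1.0], "G4-b": [0.0, 0.5]},
}
labels = {"G1": 1, "G2": 1, "G3": 0, "G4": 0}  # gene-level annotation only

score, witnesses = train_mil(bags, labels)
for gene, isoforms in witnesses.items():
    print(gene, "-> predicted functional isoform:", isoforms[0])
```

In this toy run only gene-level labels are supplied, yet the procedure singles out one isoform per annotated gene. In a cross-validated setup, folds would presumably be split by gene rather than by isoform so that isoforms of one gene never appear in both training and test sets; that detail is also an assumption here.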

Keywords: ADAM15; DMXL2; IsoFunc; LMNA/C; RNA-seq; alternative splicing; functional annotation; gene ontology (GO); multiple instance learning (MIL); protein-coding splice variant (PCSV); support vector machine (SVM).
