Review. Trends Genet. 2014 Aug;30(8):340-7.
doi: 10.1016/j.tig.2014.05.005. Epub 2014 Jun 17.

The emerging era of genomic data integration for analyzing splice isoform function

Hong-Dong Li et al. Trends Genet. 2014 Aug.

Abstract

The vast majority of multi-exon genes in humans undergo alternative splicing, which greatly increases the functional diversity of protein species. Predicting functions at the isoform level is essential to further our understanding of developmental abnormalities and cancers, which frequently exhibit aberrant splicing and dysregulation of isoform expression. However, determination of isoform function is very difficult, and efforts to predict isoform function have been limited in the functional genomics field. Deep sequencing of RNA now provides an unprecedented amount of expression data at the transcript level. We describe here emerging computational approaches that integrate such large-scale whole-transcriptome sequencing (RNA-seq) data for predicting the functions of alternatively spliced isoforms, and we discuss their applications in developmental and cancer biology. We outline future directions for isoform function prediction, emphasizing the need for heterogeneous genomic data integration and tissue-specific, dynamic isoform-level network modeling, which will allow the field to realize its full potential.

Keywords: cancers; development; function prediction; genomic data integration; splice isoforms.

Figures

Figure 1
(A) Five basic mechanisms of alternative splicing. (B) Current counts of human genes and transcripts based on GENCODE annotation (version 19). lnc genes: long non-coding genes; snc genes: small non-coding genes; lnc transcripts: long non-coding transcripts; NMD transcripts: nonsense-mediated decay transcripts. The number of genes with a given number of transcripts is plotted; on average, a gene has three transcripts annotated in GENCODE.
Figure 2
Methods for isoform function prediction and future perspectives. Current protein structure-based methods take amino acid sequence or secondary or tertiary structure as input for predicting isoform functions. The modeling methods include, but are not limited to, domain-based methods and homology modeling. Multiple instance learning (MIL) is a recently developed method for predicting isoform functions by integrating isoform-level feature data (e.g. RNA-seq) and gene-level functional annotation data. For a given function under study, green circles represent positive genes associated with this function; blue rectangles represent negative genes not having this function; each small element within a gene represents an isoform. For a positive gene, at least one of the isoforms must carry out the function (red). For a negative gene, none of its isoforms should carry out the function under study (white). In general MIL terminology, genes and their isoforms are called bags and instances, respectively. Future perspectives for isoform-level functional analyses include integrating heterogeneous genomic data with additional predictive value, building networks for better understanding of isoform-isoform interactions or isoform-phenotype associations, and making tissue-specific and developmental stage-specific predictions.
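The bag and instance relationship described in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustration of the standard MIL assumption (a gene is positive if at least one of its isoforms is), not the method used in the review or in the works it cites. The gene names, isoform names, scores, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, List

# Hypothetical isoform-level scores in [0, 1] for one function under study.
# Gene names, isoform names, and values are made up for illustration only.
ISOFORM_SCORES: Dict[str, Dict[str, float]] = {
    "geneA": {"geneA.1": 0.91, "geneA.2": 0.12},                    # positive bag
    "geneB": {"geneB.1": 0.08, "geneB.2": 0.20, "geneB.3": 0.05},   # negative bag
}


def bag_score(instance_scores: List[float]) -> float:
    """Gene-level (bag) score: the maximum over its isoform (instance) scores."""
    if not instance_scores:
        raise ValueError("a gene (bag) needs at least one isoform (instance)")
    return max(instance_scores)


def gene_is_positive(gene: str, threshold: float = 0.5) -> bool:
    """A gene is positive if at least one isoform reaches the threshold."""
    return bag_score(list(ISOFORM_SCORES[gene].values())) >= threshold


def candidate_isoforms(gene: str, threshold: float = 0.5) -> List[str]:
    """Isoforms predicted to carry out the function (empty for a negative gene)."""
    return sorted(
        name for name, score in ISOFORM_SCORES[gene].items() if score >= threshold
    )


if __name__ == "__main__":
    for gene in ISOFORM_SCORES:
        print(gene, gene_is_positive(gene), candidate_isoforms(gene))
```

Taking the maximum over instances is one common way to encode "at least one isoform carries out the function". In practice the isoform scores would come from a classifier trained on isoform-level features such as RNA-seq expression, which this sketch does not attempt to model.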
Figure 3
Structural differences of functionally different isoforms of the same gene. A. The upper part shows the gene models of the two isoforms NM_001040654.1 and NM_009877.2 of the Cdkn2a gene in mouse. The computationally modeled 3D structure of NM_001040654.1 in mouse is characterized by ankyrin repeats. In contrast, the 3D structure of NM_009877.2 has a CDKN2a N-terminus domain. Adapted from [16]. B. Superimposed, predicted 3D structures of the two isoforms of the Anxa6 gene. Because the “VAAEIL” residues (present in NM_013472.4) are absent from the NM_001110211.1 variant (red), the loop region is smaller. The structural alignment shows that the “TPS” residues are not aligned between the Anxa6 variants (inset box), which could affect the post-translational modification of Thr and Ser residues (shown as green and red spheres in the inset figure). Adapted from [18].

References

    1. Black DL. Mechanisms of alternative pre-messenger RNA splicing. Annu Rev Biochem. 2003;72:291–336.
    2. Barash Y, et al. Deciphering the splicing code. Nature. 2010;465:53–59.
    3. Ferreira E, et al. Alternative splicing enriched cDNA libraries identify breast cancer-associated transcripts. BMC Genomics. 2010;11:S4.
    4. Modrek B, et al. Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res. 2001;29:2850–2859.
    5. Griffith M, et al. Alternative expression analysis by RNA sequencing. Nat Methods. 2010;7:843–847.
