Mol Biol Evol. 2017 Oct 1;34(10):2453-2468.
doi: 10.1093/molbev/msx212.

Isoform Evolution in Primates through Independent Combination of Alternative RNA Processing Events

Shi-Jian Zhang et al. Mol Biol Evol. 2017.

Abstract

Recent RNA-seq studies have revealed thousands of splicing events that are under rapid evolution in primates, but the reliability of these events, as well as their combination at the isoform level, has not been adequately addressed because of the limited read length of RNA-seq. Here, we performed comparative transcriptome analyses in human and rhesus macaque cerebellum using single molecule long-read sequencing (Iso-seq) and matched RNA-seq. Besides 359 million RNA-seq reads, 4,165,527 Iso-seq reads were generated with a mean length of 14,875 bp, covering 11,466 human genes and 10,159 macaque genes. With Iso-seq data, we substantially expanded the repertoire of alternative RNA processing events in primates, and found that intron retention and alternative polyadenylation are surprisingly more prevalent in primates than previously estimated. We then investigated the combinatorial mode of these alternative events at the whole-transcript level, and found that the combination of these events is largely independent along the transcript, leading to thousands of novel isoforms missed by current annotations. Notably, these novel isoforms are selectively constrained in general, and 1,119 isoforms have even higher expression than the previously annotated major isoforms in human, indicating that the complexity of the human transcriptome is still significantly underestimated. Comparative transcriptome analysis further revealed 502 genes encoding selectively constrained, lineage-specific isoforms in human but not in rhesus macaque, linking them to some lineage-specific functions. Overall, we propose that the independent combination of alternative RNA processing events has contributed to complex isoform evolution in primates, which provides a new foundation for the study of phenotypic differences among primates.

Keywords: PacBio sequencing; alternative RNA processing event; comparative transcriptome; independent combination; isoform evolution; primate evolution.

Figures

Fig. 1.
PacBio Iso-seq in human. (A) Distribution of the lengths of PA-trimmed PacBio reads. (B) Percentages of detectable genes (Discovery Rate) increased as the PacBio sequencing depth increased. (C) Count of genes at different read coverages. (D) Diagram illustrating the advantage of PacBio Iso-seq in deciphering transcript structure. With Illumina RNA-seq reads, alternative RNA processing events can only be inspected individually in a local view, whereas the association of distant events is indefinable. Iso-seq with single molecule, long-read sequencing provides a global view to investigate the combinatorial mode of distant alternative RNA processing events at the whole transcript level.
Fig. 2.
Genome-wide identification of alternative RNA processing events in human with PacBio Iso-seq. (A) Left panel: statistics for the five categories of alternative RNA processing events (SE, A5SS, A3SS, IR, and APA) and meta-data for PA sites contributing to the APA events. Annotated: alternative RNA processing events annotated by RefSeq; Novel: alternative RNA processing events not annotated by RefSeq. For SE, A5SS, A3SS, and IR, the frequencies of different splicing motifs are shown in different colors according to the legend below. For APA events, the distance of the PA site from the transcription termination site (TTS) annotated in RefSeq is summarized in the boxplot. Right panel: frequencies of the top six Poly(A) signals located upstream of PA sites. (B) For each PacBio Iso-seq read, the number of splicing junctions was counted (PacBio Junction Count). The number of junctions supported by RNA-seq reads was also counted (Junction Count Supported by RNA-seq). The density distribution of reads was then summarized and shown as tiles (PacBio Junction Count on X axis, Junction Count Supported by RNA-seq on Y axis), with the density indicated by color ranging from gray to red. (C) Two schemes to demonstrate the principles of PacBio Iso-seq in the identification of intron retention and alternative polyadenylation events.
Fig. 3.
Independent combination of alternative RNA processing events in human. (A) Upper panel: as proof-of-concept, the ADSL gene is shown to demonstrate the procedures used to study the combination of alternative RNA processing events. Five alternative RNA processing events were located on the ADSL gene: four AS events and one APA. The inclusion ratio for each AS event, as well as the PA frequency ratio for APA, is shown in brackets. Middle panel: structures of PacBio isoforms with PacBio read coverage highlighted on the left of the isoform structure. Lower left panel: distribution of expected isoform numbers generated by 10,000 iterations of Monte Carlo simulations; green bar indicates the number of isoforms observed in the real PacBio Iso-seq data. Lower right panel: frequency of expected isoforms versus real isoforms in PacBio Iso-seq. (B) Distribution of isoform numbers generated by 10,000 iterations of Monte Carlo simulation shown as a heat-map across each gene (X axis). Red curve: the observed isoform numbers from PacBio Iso-seq. (C) Distribution of Monte Carlo P values for the 2,242 human genes showing whether the observed isoform number was significantly lower than expectation. The distributions of corrected P values from the Exact Multinomial Test, Spearman correlation coefficients, and the Spearman correlation P values are also shown to indicate whether the frequencies of the expected isoforms are consistent with those for the observed isoforms in PacBio sequencing.
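The Monte Carlo comparison described in the Fig. 3 caption can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes each event is a binary choice drawn independently per read with the stated ratio, and the event ratios, read depth, and observed isoform count below are hypothetical values chosen only for illustration. The function names `simulate_isoform_counts` and `lower_tail_p` are likewise invented here.

```python
import random

def simulate_isoform_counts(ratios, n_reads, n_iter=10_000, seed=0):
    """Number of distinct isoforms per iteration if events combine independently.

    ratios  -- per-event inclusion (or PA usage) ratios, each in [0, 1]
    n_reads -- number of full-length reads sampled for the gene
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_iter):
        isoforms = set()
        for _ in range(n_reads):
            # One simulated read: an independent Bernoulli draw per event.
            isoforms.add(tuple(rng.random() < r for r in ratios))
        counts.append(len(isoforms))
    return counts

def lower_tail_p(counts, observed):
    """Fraction of simulations with no more distinct isoforms than observed."""
    return sum(c <= observed for c in counts) / len(counts)

# Hypothetical gene with four AS events and one APA event (ratios are made up).
ratios = [0.9, 0.5, 0.2, 0.7, 0.6]
counts = simulate_isoform_counts(ratios, n_reads=50, n_iter=1000)
p_value = lower_tail_p(counts, observed=10)
print(f"mean expected isoforms: {sum(counts) / len(counts):.1f}, P = {p_value:.3f}")
```

A small lower-tail P value would indicate fewer isoforms than independence predicts, that is, coupling between events; a large one is consistent with independent combination. The multinomial and Spearman comparisons of isoform frequencies mentioned in the caption are not covered by this sketch.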
Fig. 4.
Validation of newly identified major novel isoforms. (A) 624 major isoforms annotated by RefSeq. Red curve, coverage of these isoforms from the PacBio Iso-seq data; blue points, coverage of 1,119 newly identified isoforms from the PacBio Iso-seq data. (B, C) Violin plots of the proportions of the newly identified isoforms, as well as the major isoforms as annotated by RefSeq. The proportions of isoforms were estimated on the basis of the inclusion ratio of alternative splicing events (B), or the sequencing coverage of PA sites (C). P values were calculated on the basis of the Wilcoxon signed-rank test. (D) Left: example of agarose electrophoresis showing ten newly identified major isoforms (dots) and ten RefSeq-annotated major isoforms (stars). Right: the coverage of these isoforms in the initial genome-wide Iso-seq and the targeted PacBio sequencing are summarized in the table. (E) The ratio of nucleotide diversity between nonsynonymous (Nsyn) sites and synonymous (Syn) sites. The ratios were calculated and shown for human annotated coding genes (Coding Genes), pseudogenes (Pseudogenes), as well as coding regions unique to novel isoforms (Novel Isoforms), as indicated.
Fig. 5.
Identification and verification of genes encoding isoforms specific to human. (A) Diagram showing the comparative transcriptome study in human on the basis of both PacBio Iso-seq and RNA-seq data. (B) Heatmap showing the status of the candidate human lineage-specific events in multiple human individuals and macaque animals. (C) Heatmap showing the status of the candidate human lineage-specific events in multiple tissues between human and rhesus macaque. Red: observed events; Gray: events not detected. (D) Cis-features for lineage-specific alternative RNA processing events in human. The splice site score of each lineage-specific splice site in human and rhesus macaque was calculated and shown. For lineage-specific A5SS and A3SS events detected in human only, the splice scores in human (H_ANSS_Donor and H_ANSS_Acceptor) are significantly higher than those in rhesus macaque (R_ANSS_Donor and R_ANSS_Acceptor). For lineage-specific exon skipping events detected in human only, the splice scores for 5′ and 3′ splice sites in human (H_SE_Donor and H_SE_Acceptor) are significantly higher than those in rhesus macaque (R_SE_Donor and R_SE_Acceptor). For lineage nonspecific exon skipping events detected in both human and rhesus macaque, the splice scores are generally comparable in the two species. (E) Validation of isoforms specific to human in the PIGU gene using ultradeep targeted Iso-seq sequencing. Isoforms with relatively high sequencing depth (≥5) are shown. For each type of isoform identified by Iso-seq, the structure of the isoform is shown with the number of Iso-seq reads supporting it. The human lineage-specific exon is highlighted in red. Junction reads identified by RNA-seq are also shown to indicate the splicing junctions. (F) Functional implications of isoforms specific to human. (G) Boxplots of nucleotide diversity for nonsynonymous (Nsyn) and synonymous (Syn) sites in coding regions unique to human-specific novel isoforms.

