Review
Am J Hum Genet. 2018 Jan 4;102(1):11-26.
doi: 10.1016/j.ajhg.2017.11.002.

The Expanding Landscape of Alternative Splicing Variation in Human Populations

Eddie Park et al. Am J Hum Genet.

Abstract

Alternative splicing is a tightly regulated biological process by which the number of gene products for any given gene can be greatly expanded. Genomic variants in splicing regulatory sequences can disrupt splicing and cause disease. Recent developments in sequencing technologies and computational biology have allowed researchers to investigate alternative splicing at an unprecedented scale and resolution. Population-scale transcriptome studies have revealed many naturally occurring genetic variants that modulate alternative splicing and consequently influence phenotypic variability and disease susceptibility in human populations. Innovations in experimental and computational tools such as massively parallel reporter assays and deep learning have enabled the rapid screening of genomic variants for their causal impacts on splicing. In this review, we describe technological advances that have greatly increased the speed and scale at which discoveries are made about the genetic variation of alternative splicing. We summarize major findings from population transcriptomic studies of alternative splicing and discuss the implications of these findings for human genetics and medicine.

Figures

Figure 1
A Primer on Alternative Splicing (A and B) Basic (A) and complex (B) patterns of alternative splicing. Dark-blue boxes represent constitutively spliced exons. Red, light-blue, and green boxes represent alternatively spliced exons. (C) Alternative splicing is regulated by an extensive protein-RNA interaction network involving cis elements within the pre-mRNA and trans-acting factors that bind to these cis elements. The most essential splicing signals within the pre-mRNA are the 5′ splice site (5′SS), 3′ splice site (3′SS), branch site (A), and polypyrimidine tract (Y(n)). The 5′ and 3′ splice sites have highly conserved GU and AG dinucleotides as the first and last two nucleotides of the intron, respectively. The U1 snRNP complex recognizes the 5′ splice site, and the U2 snRNP complex recognizes the branch site. The U2AF proteins recognize the 3′ splice site and polypyrimidine tract. Exonic splicing enhancers (ESEs), exonic splicing silencers (ESSs), intronic splicing enhancers (ISEs), and intronic splicing silencers (ISSs) are pre-mRNA cis regulatory motifs that recruit various RNA-binding proteins (e.g., SR and hnRNP proteins) to regulate alternative splicing.
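The GU/AG rule described in this legend can be made concrete with a short Python sketch. It assumes introns are supplied as DNA strings (so GU appears as GT), that exon coordinates are 0-based, half-open and sorted, and that the function names are illustrative rather than taken from the review. It checks only the terminal dinucleotides; the branch site, polypyrimidine tract and minor non-canonical introns (e.g., GC-AG) are ignored.

```python
def has_canonical_splice_sites(intron: str) -> bool:
    """True if the intron starts with GT (GU in RNA) and ends with AG."""
    seq = intron.upper().replace("U", "T")  # treat RNA and DNA input alike
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def introns_between(pre_mrna: str, exon_spans: list[tuple[int, int]]) -> list[str]:
    """Return the intron sequences lying between consecutive exons.

    exon_spans: 0-based, half-open (start, end) pairs sorted by start.
    """
    return [pre_mrna[end:next_start]
            for (_, end), (next_start, _) in zip(exon_spans, exon_spans[1:])]


# Hypothetical toy pre-mRNA: exon AAA, intron GTAAGTTTTCAG, exon CCC.
toy = "AAAGTAAGTTTTCAGCCC"
introns = introns_between(toy, [(0, 3), (15, 18)])
print(introns, [has_canonical_splice_sites(i) for i in introns])
```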
Figure 2
Strengths and Weaknesses of Short-Read and Long-Read RNA-Seq (A) Schematic diagram of an alternatively spliced gene that generates two distinct mRNA isoforms. The first, middle, and last exons are constitutive exons. The second and fourth exons are alternative exons. The two alternative exons are co-spliced such that the long isoform contains all five exons and the short isoform contains only the first, middle, and last exons. (B) Short-read RNA-seq generates many reads, enabling the accurate quantitation of individual alternative exons, but the long-range coupling between the two alternative exons is lost. (C) Long-read RNA-seq captures the long-range coupling between alternative exons and identifies the correct full-length mRNA isoforms, but the limited number of reads reduces the precision of isoform quantitation.
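The loss of long-range coupling in panel (B) can be illustrated with a small Python sketch. It compares two hypothetical isoform mixtures for the five-exon gene in panel (A): mixture A uses the long and short isoforms from the figure, while mixture B is invented and contains two isoforms that each carry only one alternative exon. Both mixtures give identical per-exon inclusion levels, which is roughly what short reads measure, yet they differ in how often exons 2 and 4 occur together, which only full-length reads observe directly.

```python
from fractions import Fraction

# Each isoform is the set of exons it contains; values are isoform fractions.
# Mixture A matches panel (A): long isoform (all five exons) and short isoform (1, 3, 5).
mixture_a = {frozenset({1, 2, 3, 4, 5}): Fraction(1, 2),
             frozenset({1, 3, 5}): Fraction(1, 2)}
# Mixture B is hypothetical: each isoform carries only one of the two alternative exons.
mixture_b = {frozenset({1, 2, 3, 5}): Fraction(1, 2),
             frozenset({1, 3, 4, 5}): Fraction(1, 2)}


def exon_inclusion(mixture, exon):
    """Fraction of transcripts containing `exon` (a per-exon, short-read-style view)."""
    return sum((f for iso, f in mixture.items() if exon in iso), Fraction(0))


def co_inclusion(mixture, exon_a, exon_b):
    """Fraction of transcripts containing both exons (needs full-length reads)."""
    return sum((f for iso, f in mixture.items() if exon_a in iso and exon_b in iso),
               Fraction(0))


for name, mix in [("A", mixture_a), ("B", mixture_b)]:
    print(name, exon_inclusion(mix, 2), exon_inclusion(mix, 4), co_inclusion(mix, 2, 4))
```

Exact fractions are used so the equal per-exon values compare exactly rather than up to floating-point rounding.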
Figure 3
Strategies for Discovering Genetic Associations of Alternative Splicing (A) A population of individuals is genotyped, and their transcriptomes are profiled by RNA-seq. (B) Splicing quantitative trait locus (sQTL) analysis. For a given exon, the splicing level (percent spliced in, or PSI value) is measured for each individual on the basis of RNA-seq reads aligned to distinct mRNA isoforms. The PSI values are treated as quantitative traits and tested for association with genotypes across all individuals to identify significant sQTLs. (C) Allele-specific alternative splicing (ASAS) analysis. Splicing levels (PSI values) are measured in an allele-specific manner for individuals who are heterozygous for a given SNP. For each individual, a PSI measurement can be obtained for each allele on the basis of allele-specific reads aligned to distinct mRNA isoforms. Reproducible allelic differences in PSI values across multiple heterozygous individuals provide evidence for significant ASAS events.
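The sQTL test in panel (B) can be sketched in Python under simplifying assumptions that are ours, not the review's. PSI for an exon-skipping event is estimated only from junction reads, with inclusion counts halved because two junctions support inclusion and one supports skipping (tools such as rMATS normalize by effective lengths instead). Association is tested by ordinary least-squares regression of PSI on alternative-allele dosage. The function names and read counts are hypothetical, and real sQTL pipelines typically add covariates, normalization and multiple-testing correction.

```python
import math


def psi_from_junctions(inclusion_reads: int, skipping_reads: int) -> float:
    """Simplified exon-skipping PSI from junction read counts.

    Inclusion reads come from two junctions (upstream->exon, exon->downstream)
    and skipping reads from one, so inclusion counts are halved first.
    """
    inc = inclusion_reads / 2
    total = inc + skipping_reads
    return inc / total if total > 0 else float("nan")  # no informative reads


def sqtl_test(genotypes, psis):
    """OLS regression of PSI on genotype dosage (0, 1 or 2 alternative alleles).

    Returns the slope, Pearson r and the t statistic for slope != 0
    (n - 2 degrees of freedom). Assumes both genotype and PSI vary.
    """
    n = len(genotypes)
    mx, my = sum(genotypes) / n, sum(psis) / n
    sxx = sum((x - mx) ** 2 for x in genotypes)
    syy = sum((y - my) ** 2 for y in psis)
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, psis))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r * r)) if abs(r) < 1 else math.copysign(math.inf, r)
    return slope, r, t


# Hypothetical (inclusion, skipping) junction counts for six individuals.
counts = [(90, 5), (80, 10), (60, 20), (50, 25), (20, 40), (10, 45)]
genotypes = [0, 0, 1, 1, 2, 2]  # alternative-allele dosage per individual
psis = [psi_from_junctions(i, s) for i, s in counts]
slope, r, t = sqtl_test(genotypes, psis)
print(f"slope={slope:.3f} r={r:.3f} t={t:.2f}")
```

A p-value would come from a t distribution with n - 2 degrees of freedom (for example, `scipy.stats.t.sf`), which is left out to keep the sketch dependency-free.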
Figure 4
Two Examples of sQTLs Associated with GWAS Signals for Complex Diseases (A–C) Alternative splicing of SP140 exon 7 is associated with chronic lymphocytic leukemia, Crohn disease, inflammatory bowel disease, and multiple sclerosis. The alternative splicing event is an exon-skipping event. The C allele is associated with a higher level of exon inclusion, whereas the T allele is associated with a higher level of exon skipping. (A) Boxplot showing the significant association between SNP rs28445040 and the splicing level (PSI value) of SP140 exon 7 within the Geuvadis CEU (Utah residents with ancestry from northern and western Europe) population. Each dot represents the PSI value from a particular individual, and the size of each dot is proportional to the RNA-seq read coverage for the alternative splicing event in that individual. (B) Sashimi plot indicating the average RNA-seq read density and splice junction counts for each genotype. Exons and introns are not drawn to scale, and the relative width of exons is increased for clarity. (C) LD plot showing multiple GWAS SNPs (green boxes) linked with the sQTL SNP (purple box). (D–F) Alternative splicing of ERAP2 exon 10 is associated with Crohn disease, ulcerative colitis, inflammatory bowel disease, and birdshot chorioretinopathy. The alternative splicing event is an alternative 5′ splice site event. The A allele is associated with higher usage of the upstream canonical 5′ splice site, whereas the G allele is associated with higher usage of the downstream cryptic 5′ splice site. Usage of the downstream cryptic 5′ splice site introduces a premature stop codon and results in nonsense-mediated mRNA decay. (D) Boxplot showing the significant association between SNP rs2248374 and the splicing level (PSI value) of ERAP2 exon 10 (i.e., usage of the downstream cryptic 5′ splice site) within the Geuvadis CEU population. Each dot represents the PSI value from a particular individual, and the size of each dot is proportional to the RNA-seq read coverage for the alternative splicing event in that individual. (E) Sashimi plot indicating the average RNA-seq read density and splice junction counts for each genotype. Exons and introns are not drawn to scale, and the relative width of exons is increased for clarity. (F) LD plot showing multiple GWAS SNPs (green boxes) linked with the sQTL SNP (purple box). RNA-seq data of 89 CEU individuals are from the Geuvadis project. Sashimi plots were drawn with rmats2sashimiplot (see Web Resources). LD plots were drawn with Haploview 4.2 and include CEU individuals from the 1000 Genomes Project (phase 3). For each boxplot, the top and bottom of the box represent the third and first quartiles, respectively. The band in the middle of the box represents the median. The whiskers of each boxplot extend to the most extreme data points within 1.5 times the interquartile range from each box.
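The boxplot convention stated in this legend (box from the first to the third quartile, a median band, and whiskers reaching the most extreme points within 1.5 times the IQR) can be reproduced with Python's standard library. The legend does not say which quartile definition was used, so this sketch assumes linear interpolation (`statistics.quantiles` with `method="inclusive"`, which matches the defaults of R's `quantile()` and `numpy.percentile`). The function name and PSI values are hypothetical.

```python
import statistics


def boxplot_stats(values):
    """Box, median and 1.5 x IQR whisker positions for one group of PSI values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme observed points still inside the fences.
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical PSI values for one genotype group.
stats = boxplot_stats([0.1, 0.2, 0.3, 0.4, 0.5, 0.95])
print(stats)
```

Here the IQR is 0.25, so the upper fence sits at 0.85: the 0.95 point is drawn beyond the whisker, and the upper whisker ends at 0.5.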
Figure 5
Experimental and Computational Tools for Characterizing the Causal Impacts of Genomic Variants on Alternative Splicing (A) Schematic diagram of a minigene splicing reporter. An exon of interest, along with its flanking intronic sequences, is inserted into a splicing reporter construct, where it is flanked by upstream and downstream exons containing a promoter and a polyA site. The splicing profile of the minigene splicing reporter can be determined by RT-PCR or RNA-seq. (B) Use of minigene splicing reporters for characterizing the effects of disease-causing variants or exonic and intronic splicing regulatory elements on splicing. (C) Minigene splicing reporters can be used in massively parallel reporter assays (MPRAs) for determining the consequences of many sequence variants on splicing in a high-throughput manner. A library of minigenes is transfected into a cell line, and splicing levels are measured for all variants simultaneously by RNA-seq. (D) Deep learning framework for analyzing alternative splicing. Starting with input data, including the genome sequence and RNA-seq data, the framework extracts genomic and RNA features. These features include diverse types of quantitative or qualitative features, such as conservation score, sequence motifs, secondary structure, and epigenetic marks. A computational model is trained to predict splicing patterns and levels by using the extracted features. The predictions can be evaluated with experimental validation (e.g., by RNA-seq, RT-PCR, or minigene).
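Panel (D) describes extracting genomic features from sequence before training a model. As a minimal Python sketch of two such sequence features, the code below one-hot encodes a sequence (a common input representation for convolutional splicing models, assumed here rather than taken from the review) and counts occurrences of a short motif. The GAA-repeat motif and the example fragment are illustrative only, and conservation scores, secondary structure and epigenetic marks would come from separate data sources not shown here.

```python
BASES = "ACGT"


def one_hot_encode(seq: str) -> list[list[int]]:
    """One row per position with indicator columns for A, C, G, T.

    U is read as T so RNA and DNA input behave the same; ambiguous bases
    such as N become all-zero rows.
    """
    return [[1 if base == b else 0 for b in BASES]
            for base in seq.upper().replace("U", "T")]


def count_motif(seq: str, motif: str) -> int:
    """Count possibly overlapping occurrences of `motif` in `seq`."""
    seq = seq.upper().replace("U", "T")
    motif = motif.upper().replace("U", "T")
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


example = "GAAGAAGAA"  # hypothetical exon fragment
features = {
    "one_hot": one_hot_encode(example),
    "GAAGAA_count": count_motif(example, "GAAGAA"),
}
print(len(features["one_hot"]), features["GAAGAA_count"])
```

Overlapping matches are counted deliberately, since repeated motifs can overlap within a short exonic region.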
