Nat Genet. 2024 Sep;56(9):1851-1861.
doi: 10.1038/s41588-024-01872-x. Epub 2024 Sep 2.

Global impact of unproductive splicing on human gene expression

Benjamin Fair et al. Nat Genet. 2024 Sep.

Abstract

Alternative splicing (AS) in human genes is widely viewed as a mechanism for enhancing proteomic diversity. AS can also impact gene expression levels without increasing protein diversity by producing 'unproductive' transcripts that are targeted for rapid degradation by nonsense-mediated decay (NMD). However, the relative importance of this regulatory mechanism remains underexplored. To better understand the impact of AS-NMD relative to other regulatory mechanisms, we analyzed population-scale genomic data across eight molecular assays, covering various stages from transcription to cytoplasmic decay. We report threefold more unproductive splicing compared with prior estimates using steady-state RNA. This unproductive splicing compounds across multi-intronic genes, resulting in 15% of transcript molecules from protein-coding genes being unproductive. Leveraging genetic variation across cell lines, we find that GWAS trait-associated loci explained by AS are as often associated with NMD-induced expression level differences as with differences in protein isoform usage. Our findings suggest that much of the impact of AS is mediated by NMD-induced changes in gene expression rather than diversification of the proteome.

Conflict of interest statement

The authors declare no competing interests. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Figures

Fig. 1. Genomic data captured before, during, and after transcription reveal an abundance of NMD isoforms.
a, Subset of the population-scale datasets we analyzed, covering stages of mRNA biogenesis, from activation of enhancers/promoters (that is, H3K27ac and H3K4me3 ChIP-seq) to steady-state RNA (polyA RNA-seq). The gene expression correlation matrix (using promoter peak coverage for H3K27ac and H3K4me3, gene body coverage for H3K36me3 and exonic read coverage for RNA-seq) relative to steady-state RNA samples is shown as a heat map. b, Left: fraction of splice junction reads in each RNA-seq sample (columns, grouped by dataset) that are in Gencode-annotated productive transcript structures (blue) versus unannotated or annotated unproductive transcript structures (gray). The dashed lines indicate the median for each dataset. Right: for the 2.4% of splice junctions in naRNA-seq data that are not in annotated productive transcript structures, we checked for their unique presence in annotated unproductive transcript structures (for example, transcripts tagged by Gencode as ‘retained_intron’), or if unannotated, we attempted to translate sequences surrounding the splice sites (Supplementary Methods). The stacked bars indicate the fraction of naRNA-seq splice junctions in each subcategory. c, Similar to b, comparing steady-state RNA from shRNA scramble control (n = 6) and shRNA dKD (n = 3) of SMG6 and SMG7 in HeLa cells. d, Fraction of cassette exons that are symmetric (that is, length divisible by three) as a function of their usage, estimated as percent spliced in (PSI). The error bars represent standard error of values for LCL lines treated as replicates (circular markers, same dataset as b) and standard error across replicate shRNA knockdown experiments (triangular markers, same dataset as c). e, Cumulative distribution of log fold differences in steady-state gene expression versus gene transcription (measured by H3K4me3 promoter activity, naRNA, H3K36me3 or 30 min 4sU-labeled RNA), a proxy for degradation rate.
Genes are grouped by quintiles based on percent of unproductive junction reads. The quintile of genes with the most unproductive splicing (darkest) shows the strongest signature of mRNA degradation. The correlation between unproductive splicing and 30 min labeled 4sU RNA/steady-state RNA is weaker than in comparisons using the other degradation rate proxies, consistent with rapid decay of unproductive transcripts.
Fig. 2. Unproductive mis-splicing accumulates across transcripts.
a, Correlation between percent of unproductive junction reads in LeafCutter clusters versus length of the most used productive intron in the cluster (Spearman’s rho coefficient of 0.19; two-sided correlation test P value = 1.3 × 10^−243). Correlation presented as cumulative distribution of percent unproductive splice junctions, for increasing groups of intron length. b, Percent unproductive splice junctions for each gene that are attributed to the most common (rank 1) unproductive junction, the second most common unproductive junction and so on. c, Model of unproductive splicing compounding across multi-intronic transcripts. A low unproductive rate at many independently spliced introns produces a high rate of unproductive molecules at the transcript level. d, Nanopore long-read sequencing quantifies the percent of full-length reads that are targeted by NMD, defined as containing at least one unproductive junction, as a function of the number of splice junctions in the read. The vertical dashed line marks eight splice junctions, corresponding to a typical full-length human transcript. Total RNA was isolated from shRNA-mediated dKD of NMD factors SMG6 and SMG7 or shRNA scramble control in HeLa cells. The naRNA data were from K562 cells. Multiple points of the same color indicate replicate experiments. The blue and orange shaded areas represent the binomial expectation when assuming 1.5–2.5% and 0.2–0.7% of unproductive junction reads at each independent junction.
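The compounding model in panels c and d can be illustrated with a short calculation. The sketch below assumes that each junction is spliced independently with the same per-junction unproductive rate, so the chance that a read carries at least one unproductive junction is 1 − (1 − p)^n. The function name and argument names are illustrative and do not come from the paper.

```python
def fraction_unproductive(per_junction_rate: float, n_junctions: int) -> float:
    """Binomial expectation for a transcript carrying >= 1 unproductive junction.

    Assumes every junction is spliced independently with the same
    unproductive rate (an idealisation of the model in panel c).
    """
    if not 0.0 <= per_junction_rate <= 1.0:
        raise ValueError("per_junction_rate must lie in [0, 1]")
    if n_junctions < 0:
        raise ValueError("n_junctions must be non-negative")
    # Probability that all n junctions are productive is (1 - p)^n.
    return 1.0 - (1.0 - per_junction_rate) ** n_junctions


# Eight junctions (the dashed line in panel d) at an assumed 2% rate.
example = fraction_unproductive(0.02, 8)
```

Under these assumptions, a 2% per-junction rate across eight junctions gives about 15% of molecules with at least one unproductive junction, in line with the transcript-level figure reported in the abstract.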
Fig. 3. Risdiplam-induced splicing alterations mediate expression changes at hundreds of genes.
a, Overview of risdiplam-based approach to assess pervasiveness of NMD after splicing perturbations. LCLs were treated with eight doses of risdiplam. Splicing changes at cryptic exons were assessed for NMD potential, and gene expression changes were estimated. b, Genome-wide mean splicing dose-dependent changes at various classes of 5′ss in naRNA and steady-state RNA. Bootstrapped 95% confidence intervals shaded around the mean activation level across n introns in each group. c, Left: dose-dependent splicing response at a risdiplam-induced exon in MYB. Right: dose-dependent expression response of MYB. d, Predicted translation result of 316 risdiplam-induced exons. Unproductive exons (expected to induce NMD) versus those that maintain transcript stability are red and blue, respectively. Annotated and unannotated exons are dark and light colors, respectively. e, Empirically measured effect on host gene expression (log2 fold-change) as measured in steady-state RNA and naRNA in the presence of risdiplam at 3.16 µM. Each point is a gene hosting a risdiplam-induced exon colored by the same exon classifications used in d. P values for two-sided Mann–Whitney U test, comparing productive (n = 85 genes hosting annotated or unannotated risdiplam-induced exons) and NMD sets (n = 219 genes) are shown. The box represents the median and interquartile range. The whiskers extend from the hinge to the most extreme value no greater than 1.5× IQR from the hinge. f, Conventional small molecule drugs usually operate at the level of protein binding and are disproportionately skewed for particular classes of ‘druggable’ genes (for example, G-protein-coupled receptors, GPCRs) and against other classes (for example, transcription factors). Risdiplam-induced NMD targets are more representative of all genes. Disease genes with therapeutic potential by downregulation (Online Mendelian Inheritance in Man dominant negative genes) are similarly distributed across categories of previously ‘druggable’ genes.
g, Cumulative distribution of gene length for genes with predicted NMD-induced exons, a similarly sized set of expression-matched genes, all genes or a set of gene targets for Food and Drug Administration (FDA)-approved small molecules.
Fig. 4. Genetic variants that alter expression post-transcriptionally are enriched in splice-altering variants that are predicted to induce NMD.
a, Approach for identifying mechanisms of gene regulation, using data from the CCHCR1 locus as an illustrative example. Genetic variants that associate with chromatin peak height, RNA expression, splicing and so on (molQTL) are identified. The boxplots grouped by genotype at lead SNP depict the interquartile range of phenotype values with whiskers extending to the most extreme value no greater than 1.5× IQR from the hinge. Multitrait colocalization (Methods) compares alignment of molQTL signals (scatter plots) to identify molQTLs that probably share a causal SNP. b, Effect size (β) of H3K27ac hQTLs at promoter versus eQTL effect sizes estimated in each RNA-seq dataset. Correlation summarized with Spearman’s rho coefficient and two-sided significance test. c, Tally of steady-state eGenes by their colocalizing molQTLs. A total of 831 eGenes colocalize (coloc) with an hQTL (*, H3K27ac, H3K4me1, H3K4me3 or H3K36me3), and 518 ‘post-transcriptional’ (post-txn) eGenes (**) do not colocalize with any hQTL but do colocalize with other molQTL, some of which may be sQTL or apaQTL. d, Example eQTL functioning through AS–NMD from poison exon (red). H3K27ac ChIP-seq and RNA-seq coverage grouped by genotype of lead eQTL SNP. The pink inset region (chr22:46,289,343-46,294,094) shows effect on splicing of poison exon, depicted as sashimi plots with relative usage (intronic PSI) of splice junction arcs. e, QQ plot shows inflation of eQTL signal among groups of SNPs (lead SNPs for p-sQTLs and u-sQTLs within the host gene, H3K27ac QTLs within 100 kb of test gene or random test SNPs). f, Similar to b. Top: effect size of p-sQTLs (only significantly affecting productive splice junctions) versus host gene eQTL β. Bottom: effect size of u-sQTLs (significantly affecting at least one unproductive splice junction) versus effect on host gene expression. g, eQTLs consistent with transcriptional regulation (eQTL/hQTLs, purple) in discovery dataset (LCLs) compared with eQTLs consistent with AS–NMD (eQTL/u-sQTLs, red).
Left: each eQTL (SNP:gene pair, rows) assessed for eQTL effects in GTEx tissues (columns). Right: cumulative distribution of the number of tissues with significant effects. The P value for a two-sided Mann–Whitney test is shown.
Fig. 5. Splicing-mediated NMD contributes to complex trait biology.
a, QQ plot of multiple sclerosis GWAS signal, grouped by categories of SNPs. p-sQTLs that impact the balance of protein-coding isoforms and u-sQTLs that impact usage of unproductive splice junctions are similarly inflated for GWAS signals. b, Fraction of GWAS loci that colocalize with various sets of molQTLs in each of 45 blood or immune-related traits. Number of loci for which colocalization was attempted is indicated at the top of each column. ‘Other combinations’ includes loci that colocalize with alternative polyadenylation QTLs, hQTLs and sQTLs or other combinations that may include sQTLs and other molQTLs and are difficult to interpret mechanistically. c, Histogram of usage of unique sQTL junctions that colocalize with a GWAS signal, grouped by sQTL type. Intronic PSI (junction read count divided by most abundant junction in LeafCutter cluster) for each junction was summarized as the median from steady-state RNA samples which are homozygous for the PSI-increasing allele. Many GWAS sQTLs, especially u-sQTLs, have low PSI, even in samples with genotypes that favor higher usage. d, Effect size (β) of sQTLs and eQTLs for distinct u-sQTLs that colocalize with a GWAS signal. Correlation was summarized with Spearman’s rho coefficient and two-sided significance test.
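Several captions summarize effect-size relationships with Spearman’s rho. As a reminder of what that statistic measures, the following sketch computes it as the Pearson correlation of average ranks, with ties sharing their mean rank. It is a generic textbook implementation rather than the authors’ analysis code, the helper names are illustrative, and it does not compute the accompanying P value.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, any monotonic relationship between sQTL and eQTL effect sizes yields a rho of magnitude 1, whether or not it is linear.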
Extended Data Fig. 1. Overview of nascent RNA-seq.
(a) Nascent RNA-seq (naRNA-seq) captures nuclear-retained, non-polyadenylated, and rapidly decayed RNAs (snoRNAs, lncRNAs), that are absent from labeled and steady-state RNA-seq datasets. Each column represents an RNA-seq sample, grouped by the dataset type, each row a different gene. (b) naRNA-seq transcripts are only partially spliced. The splicing efficiency metric is based on the ratio of spliced and unspliced (intron:exon junction) reads, and varies between 0 and 1, with 1 indicating all reads are spliced. The cumulative distribution of splicing efficiency across all introns in expressed genes, for each RNA-seq sample from naRNA, recently transcribed RNA (30 min 4sU), and steady-state RNA. (c) Meta-intron coverage plot in LCL naRNA-seq sample NA18486 confirms the expected 5′ bias in intronic coverage genome-wide, consistent with the nascent nature of naRNA transcripts. Longer introns are naturally expected to have steeper slopes than short introns when intron lengths are rescaled for metaplot. (d) naRNA-seq transcripts are only partially spliced. Example sawtooth pattern in nascent RNA in the gene XYLT1. The nascent nature of transcripts in naRNA creates a 5′ bias in coverage, and in combination with co-transcriptional splicing, creates a sawtooth pattern of coverage. (e) Number of exon-exon splice junction reads in RNA-seq samples. The median in each dataset is marked with a labeled dashed line. Though naRNA-seq is only partially spliced, our deeper sequencing of naRNA-seq results in high coverage of splice junctions, allowing measurements of splicing before cytoplasmic decay.
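Panel (b) describes a splicing efficiency metric built from spliced and unspliced (intron:exon junction) read counts and bounded between 0 and 1. The caption does not give the exact formula, so the sketch below assumes the simplest form consistent with that description: spliced reads divided by the sum of spliced and unspliced reads. The function name and the choice to return None for introns without reads are assumptions.

```python
from typing import Optional


def splicing_efficiency(spliced_reads: int, unspliced_reads: int) -> Optional[float]:
    """Assumed metric: spliced / (spliced + unspliced), in [0, 1].

    A value of 1 means every read spanning the intron boundary is spliced.
    Returns None when the intron has no informative reads.
    """
    if spliced_reads < 0 or unspliced_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = spliced_reads + unspliced_reads
    if total == 0:
        return None
    return spliced_reads / total
```

With this definition, partially spliced naRNA would produce values well below 1, while steady-state RNA would sit close to 1.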
Extended Data Fig. 2. Classification and quantification of splice junction classes across datasets.
(a) The percent of splice junctions in each sample that are uniquely attributable to transcripts tagged as ‘nonsense_mediated_decay’ (Gencode v37). Box and whiskers show quartiles for LCL samples (individual jittered points) in each RNA-seq data-type (n=86, 66, 66, and 462 for naRNA, 4sU 30min, 4sU 60min, and steady-state RNA-seq, respectively). Median for each data-type is labeled. (b) Splice junctions (arcs) overlapping the NUP42 gene illustrate approach (Supplemental Methods) for classifying splice junctions. Annotated splice donors and splice acceptors are marked with vertical dashed lines in dark and light gray, respectively. Annotated productive junctions are defined by their presence in at least one transcript with the value of ‘protein_coding’ in the Gencode transcript type tag. Unannotated productive junctions are not in any Gencode transcripts, and skip exons in the principal isoform such that the reading frame is maintained (that is, splice junction marked with 1). Annotated unproductive junctions are unique to Gencode transcripts not tagged with ‘protein coding’. Splice junction 2 is unique to NUP42-207, a ‘retained_intron’ tagged transcript. This splice junction uses a deep intronic 5′ss, creating a premature termination codon. Junctions 3 and 5 are unique to transcripts tagged as ‘nonsense_mediated_decay’, and junction 4 is unique to a transcript tagged with ‘processed_transcript’. All other junctions are classified as Unannotated unproductive. We attempted to translate the resulting transcripts that use these junctions, finding that they overwhelmingly introduce frameshift or in-frame stop codons (Supplemental Methods), such as the splice junction 6 which we predict to introduce a frameshift. 
(c) Similar to (b), where sample is represented as a column, grouped by dataset type, and the fraction of splice junction reads that are either productive (annotated or unannotated, classified as in (b), blue) or unproductive (annotated or unannotated, classified as in (b), red). The median in each dataset is marked with a dashed line and labeled.
Extended Data Fig. 3. Percentage of unproductively spliced reads upon knockdown of NMD factors.
(a) Fraction of splice junction reads in each short read steady-state RNA-seq sample that are in Gencode-annotated productive transcript structures (blue), versus unannotated or annotated unproductive transcript structures (gray). Biological replicates represented by each column, with dashed lines to indicate median for each group. NMD factors were knocked-down (KD) singly or as double knockdown (dKD) with shRNA in HeLa cells with an shRNA scramble control. (b) Nanopore long-read sequencing quantifies the percent of full-length reads that are targeted by NMD, defined as containing at least one unproductive junction, as a function of the number of splice junctions in the read. Knockdown experiments of similar design as in (a).
Extended Data Fig. 4. Enrichment of symmetric exons among alternatively-spliced exons (AS exons) is largely the result of NMD, rather than optimized splicing.
(a) Two opposing models to explain the observation that AS exons (defined as PSI ~ 50%) are enriched for symmetric exons (for example, length divisible by three) in steady-state RNA: (Model 1) AS exons are strongly enriched for symmetric exons, or (Model 2) AS exons are not more likely to be symmetric than random expectation or constitutive exons (~1/3 symmetric), but NMD efficiently eliminates frame-shifting AS exons such that they appear enriched for symmetric exons in steady-state RNA but not RNA that directly measures splicing outcomes without the influence of degradation, such as naRNA or RNA after knockdown of NMD factors. In both models, we expect that constitutive exons (PSI~100%) are not strongly enriched for symmetric exons since the reading frame can cross exon-boundaries without consequence if exons are truly constitutive. Although constitutive and AS exons cannot be experimentally distinguished by observations of DNA, we included the gene structures at the DNA level because the two models imply differing selection pressures on DNA sequence to maintain (Model 1), or not maintain (Model 2) accurate frame-preserving AS patterns. (b) Fraction of exons that are symmetric as a function of their usage, estimated as percent spliced in (PSI). Error bars represent standard error across LCL lines treated as replicates, and the standard error for replicate shRNA knockdown experiments (triangular markers, data from). Unlike in steady-state RNA, the enrichment for symmetric exons among AS exons is not apparent in naRNA or NMD KD.
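The quantity plotted in panel (b) can be sketched as follows: an exon is symmetric when its length is divisible by three, and the symmetric fraction is computed within bins of PSI. The 10-point bin width, the input layout (pairs of exon length and PSI percentage), and the function names are assumptions made for illustration; the paper's binning may differ.

```python
from typing import Dict, Iterable, Tuple


def is_symmetric(exon_length_nt: int) -> bool:
    """An exon is symmetric when skipping it preserves the reading frame."""
    return exon_length_nt % 3 == 0


def symmetric_fraction_by_psi_bin(
    exons: Iterable[Tuple[int, float]], bin_width: float = 10.0
) -> Dict[float, float]:
    """Map each PSI bin (keyed by its lower edge) to the fraction of symmetric exons.

    `exons` holds (length in nucleotides, PSI in percent) pairs.
    A PSI of exactly 100 is placed in the top bin.
    """
    totals: Dict[float, int] = {}
    symmetric: Dict[float, int] = {}
    for length_nt, psi in exons:
        if not 0.0 <= psi <= 100.0:
            raise ValueError("PSI must be a percentage between 0 and 100")
        bin_start = min(int(psi // bin_width) * bin_width, 100.0 - bin_width)
        totals[bin_start] = totals.get(bin_start, 0) + 1
        symmetric[bin_start] = symmetric.get(bin_start, 0) + int(is_symmetric(length_nt))
    return {b: symmetric[b] / totals[b] for b in sorted(totals)}
```

Under Model 2, running this on naRNA or NMD-knockdown data would give values near one third in the mid-PSI bins, while steady-state RNA would show an elevated fraction there.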
Extended Data Fig. 5. Unproductive splicing is less abundant in highly transcribed and highly constrained genes.
(a) Correlation between gene expression and the maximum junction PSI of any unproductive junction in the gene, a proxy for percent of unproductive transcripts. The PSI of a junction is the number of reads mapping to the junction, divided by the maximum number of reads mapped to any junction in the same gene. The junction with the highest number of reads in a given gene has a junction PSI of 100%. SRSF genes (red) are well-known examples of genes with high gene expression and high unproductive junction PSI. (b) Highly expressed genes have a lower unproductive splicing rate, as measured by the genewise percent of splice junction reads that are unproductive. Correlation summarized with Spearman correlation coefficient and P value. Correlation visually presented as cumulative distribution of percent unproductive splice junctions, grouped by expression quintiles. (c) Similar to (b), showing that genes with a higher Shet score (suggesting more selective constraint) have a lower unproductive splicing rate. Correlations in each panel summarized with Spearman correlation coefficient and two-sided correlation test P value.
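The junction PSI defined in panel (a) is simple to compute from per-junction read counts. The sketch below follows that definition for a single gene and then takes the maximum over junctions flagged as unproductive. The dictionary layout, junction identifiers, and function names are hypothetical.

```python
from typing import Dict, Set


def junction_psi(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Junction PSI (%) = reads at junction / reads at the gene's top junction."""
    top = max(read_counts.values(), default=0)
    if top == 0:
        # No junction reads in this gene: report zero usage everywhere.
        return {junction: 0.0 for junction in read_counts}
    return {junction: 100.0 * n / top for junction, n in read_counts.items()}


def max_unproductive_psi(read_counts: Dict[str, int], unproductive: Set[str]) -> float:
    """Highest junction PSI among the junctions classified as unproductive."""
    psi = junction_psi(read_counts)
    return max((psi[j] for j in unproductive if j in psi), default=0.0)
```

For a gene with counts of 200, 150 and 10 reads at three junctions, the first junction has a PSI of 100% by construction, and if only the third is unproductive the gene's maximum unproductive PSI is 5%.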
Extended Data Fig. 6. Assessing AS-NMD prevalence and diversity using full-length (FL) long reads.
(a) Pipeline to assess NMD status of transcripts (See Supplemental Methods). Aligned Oxford Nanopore cDNA reads of SMG6/SMG7 double knockdown (dKD) and shRNA control filtered for FL reads (anchored at annotated transcript termini). FL reads were translated from the first annotated start codon, classified with decision tree (steps 2-6 are previously established rules regarding NMD-targeting efficiency of transcripts) into seven transcript categories referenced in (b-e). Categories on right are qualitatively considered ‘unproductive’ in (e-h); categories on the left are ‘productive’. Number reads in categories shown. (b) Each splice junction observed in short read data was classified as productive or unproductive (and annotated or unannotated) (Supplemental Methods, Extended Data Fig. 2). FL reads were used to assess accuracy of junction-level classifications by considering most common transcript categories of FL reads containing that junction (requiring >2 FL reads). Fraction unique junctions in each short-read category matching most common context (transcript categories) plotted as bars. Limited number FL reads means only n junctions in each category (% of total in category) were assessed, noted on x-axis. (c) Relative degradation efficiency of each category estimated by comparing the median relative splice junction abundance in control vs dKD, and steady-state vs naRNA, short read data. Consistent with previous reports, categories differ in degradation efficiency. (d) Percent FL reads in each category, as function of number splice junctions in read. (e) Fraction reads belonging in each category across genes (columns). Only genes with >20 reads considered. The dKD sample has fewer reads, and therefore stronger ascertainment bias, with 143 highly expressed genes (TPM, colored rug) passing this filter. 
(f) Isoform structure and relative abundance (read count, and percent of each isoform among unproductive reads in control samples) of unproductive isoforms derived from the PRDX2, where 93.5% of unproductive transcripts derive from most-common PTC-inducing splice junction (blue). (g) Same as (f), for PSMB4, which has greater diversity of unproductive splice junctions. (h) Diversity of unproductive isoforms amongst all genes (columns) with at least 20 unproductive reads in control and dKD samples, respectively.
Extended Data Fig. 7. Risdiplam primarily induces down-regulating-post-transcriptional changes.
(a) MA-plots of differential expression upon treatment with low (100 nM) or high (3160 nM) dose of risdiplam in steady-state RNA or naRNA. Overlapping non-significant tests are reduced to gray hexbin, while significant tests (FDR < 1%) are represented as black dots. Number of significant up- or down-regulated genes is labeled to emphasize that while there are similar numbers of up- and down-regulated genes in naRNA, in steady-state RNA there is a relative overabundance of down-regulated genes. (b) Genes are classified by their effect size and significance (See Methods) of expression changes as measured in naRNA or steady-state RNA-seq after being treated with 3160 nM risdiplam. Transcription (Txn)-based gene expression changes are defined as having similar and significant effects as measured in naRNA and polyA RNA. Genes regulated post-transcriptionally have stronger effects in steady-state RNA. There are more post-transcriptionally down-regulated genes than up-regulated genes, suggesting risdiplam-induced splice sites more often destabilize than stabilize host transcripts. As expected, the 219 NMD targets predicted from annotation of induced cassette exons overlap significantly with the post-txn down-regulated genes (Odds Ratio = 14.0, P < 2 × 10^−16, hypergeometric test for over-representation). (c) Left: Post-txn up-regulated genes may arise by splicing changes (such as risdiplam-induced exons, depicted as black cassette exon) that relieve NMD with a frame-correcting exon at a gene that is originally spliced into an unproductive isoform. The open reading frame is depicted as thick regions of exons. Right: More commonly, risdiplam-induced splicing changes result in post-txn down-regulated genes, consistent with splicing changes that break the reading frame of genes that are originally spliced into productive isoforms.
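The overlap statistic in panel (b) comes from a hypergeometric test for over-representation. A minimal sketch of such a test is given below: it returns the upper-tail probability of seeing at least the observed overlap when one gene set is drawn at random from the background. The argument names are illustrative, and the gene counts needed to reproduce the reported odds ratio and P value are not given in the caption, so none are filled in here.

```python
from math import comb


def hypergeom_overrep_pvalue(
    population: int, successes_in_population: int, draws: int, observed_overlap: int
) -> float:
    """P(X >= observed_overlap) for X ~ Hypergeometric(population, successes, draws).

    population: background genes; successes_in_population: genes in set A;
    draws: genes in set B; observed_overlap: genes in both A and B.
    """
    if not 0 <= successes_in_population <= population:
        raise ValueError("successes_in_population must lie in [0, population]")
    if not 0 <= draws <= population:
        raise ValueError("draws must lie in [0, population]")
    if observed_overlap < 0:
        raise ValueError("observed_overlap must be non-negative")
    upper = min(successes_in_population, draws)
    # math.comb returns 0 for impossible configurations, so no extra guards needed.
    tail = sum(
        comb(successes_in_population, i)
        * comb(population - successes_in_population, draws - i)
        for i in range(observed_overlap, upper + 1)
    )
    return tail / comb(population, draws)
```

As a toy check with made-up numbers, a background of 10 genes, 4 of them in set A, and a draw of 3 genes containing at least 2 from set A has probability 40/120, or one third.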
Extended Data Fig. 8. Transcription-mediated eQTLs (hQTL/eQTLs) are more tissue-specific than splicing-mediated eQTLs (sQTL/eQTLs).
(a) A set of transcription-mediated eQTLs (hQTL/eQTL colocalization) identified in our source dataset was compared to a set of splicing-mediated eQTLs (sQTL/eQTL colocalization without nominal hQTL signal), by estimating the SNP:gene effect across 38 GTEx tissues (columns) for each eGene (rows). Row-wise summary statistics were calculated and plotted as extra columns. (b) Row-wise summary statistics are plotted as cumulative distributions for visual contrast. From left to right, we see that (1) the absolute effect size in the Geuvadis LCL discovery dataset is slightly greater for hQTL/eQTLs than sQTL/eQTLs. Despite this, the sQTL/eQTLs have a (2) larger median effect size across GTEx tissues, and (3) have a smaller standard deviation of effect size across tissues. P values from two-sided Mann-Whitney test.
Extended Data Fig. 9. GWAS/sQTL colocalizations that also colocalize with eQTL have characteristics consistent with AS-NMD.
(a) sQTLs/GWAS colocalizations are more likely to come from u-sQTLs than p-sQTLs if the host gene eQTL also colocalizes in multi-trait colocalization analysis. P value from hypergeometric test for over-representation. (b) PSI distribution of introns as cumulative distribution plot for sQTLs that colocalize with eQTL and GWAS (sQTL+eQTL colocs) versus those that only colocalize with GWAS (sQTL colocs). We estimate PSI by averaging across samples with shared genotypes, either high/high genotypes or low/low genotypes (thus, avoiding confounding PSI estimates with different allele frequencies between datasets). sQTLs in unproductive introns that are sQTL+eQTLs have smaller PSI in steady-state RNA than naRNA, consistent with splicing-mediated decay at these GWAS loci transcripts. P-value for two-sided Mann-Whitney test. (c) Effect sizes of sQTLs that colocalize with GWAS, grouped by whether the GWAS signal also colocalizes with an eQTL (sQTL+eQTL colocs), whether it solely colocalizes with sQTL, or whether it also colocalizes with some other combination of traits (that is, sQTL + hQTL) in multi-trait colocalization analysis. Each junction is plotted once, even if it colocalizes with multiple GWAS loci across multiple traits. Correlation of effect sizes summarized as spearman rho correlation coefficient and two-sided correlation test P-value.
Extended Data Fig. 10. u-sQTL regulates NUDT14 expression, likely contributing to reticulocyte count.
NUDT14 eQTL and sQTL. (a) Gene structure of NUDT14-202, the protein-coding isoform marked as the principal isoform by Gencode. Thick exonic regions mark the open reading frame. Using that isoform as a reference, we predicted the u-sQTL splice junction (labeled arc) that colocalizes with reticulocyte-count GWAS signal to introduce a premature stop codon (red octagon), creating a transcript with a long 3′ UTR and inducing NMD. (b) Pairwise scatter plots depict the association between the GWAS signal, NUDT14 eQTL signal, and chr14:105175987-105176534 junction sQTL signal. Each point is a SNP. All three traits colocalize in a single trait cluster in multi-trait colocalization (posterior probability of full colocalization, PPFC > 0.5, see Supplemental Methods). SNPs colored according to linkage disequilibrium relative to the top fine-mapped SNP (rs3825761). (c) NUDT14 sQTL boxplots of unproductive splicing grouped by genotype show that the up-regulating effect of the C allele on the unproductive splice junction is present in steady-state RNA and naRNA, while the (d) down-regulating effect of the C allele on NUDT14 expression is present in steady-state RNA but not naRNA, consistent with co-transcriptional splicing and post-transcriptional regulation by NMD. Box represents the median and interquartile range. Whiskers extend from the hinge to the most extreme value no greater than 1.5× IQR from the hinge. Beta and P value from linear model to test association between genotype and normalized phenotype.
