Nat Genet. 2024 Sep;56(9):1914-1924.
doi: 10.1038/s41588-024-01878-5. Epub 2024 Aug 22.

Genome-scale quantification and prediction of pathogenic stop codon readthrough by small molecules

Ignasi Toledano et al. Nat Genet. 2024 Sep.

Abstract

Premature termination codons (PTCs) cause ~10-20% of inherited diseases and are a major mechanism of tumor suppressor gene inactivation in cancer. A general strategy to alleviate the effects of PTCs would be to promote translational readthrough. Nonsense suppression by small molecules has proven effective in diverse disease models, but translation into the clinic is hampered by ineffective readthrough of many PTCs. Here we directly tackle the challenge of defining drug efficacy by quantifying the readthrough of ~5,800 human pathogenic stop codons by eight drugs. We find that different drugs promote the readthrough of complementary subsets of PTCs defined by local sequence context. This allows us to build interpretable models that accurately predict drug-induced readthrough genome-wide, and we validate these models by quantifying endogenous stop codon readthrough. Accurate readthrough quantification and prediction will empower clinical trial design and the development of personalized nonsense suppression therapies.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Quantifying readthrough of thousands of pathogenic PTCs.
a, Readthrough drugs stimulate full-length protein synthesis and decrease NMD-mediated transcript degradation. b, Experimental design: ~5,800 nonsense variants found in human genetic diseases and cancer were retrieved from the ClinVar, TCGA and MSK-IMPACT datasets, cloned into a readthrough reporter, integrated into the genome of the HEK293T_LP human cell line and treated with eight readthrough compounds. A readthrough efficiency value was obtained for each variant–drug pair. c, Sort-sequencing overview. Each cell integrates one copy of one variant; cells are sorted into bins based on mCherry fluorescence (x axis), the bins are sequenced and readthrough percentages are calculated from the distribution of each variant's reads across the mCherry bins, normalized to the distribution of a no-nonsense variant. d, Pearson's correlation between deep mutational scanning (DMS) and individual measurements (r = 0.95) for 15 variants spanning the whole readthrough range under SRI treatment (Spearman's correlation (ρ) = 0.86). e, The same 15 variants shown in d were episomally transfected into MCF7 and HeLa cells, and their readthrough percentages were correlated with those measured in HEK293T_LP cells. Pearson's correlations and P values are shown. f, Pearson's correlations (and corresponding P values) between DMS estimates and measurements from previous studies (Spearman's correlation (ρ) = 0.56, 0.93, 0.71, 0.59, 1 and 0.94, from the top-left to the bottom-right plot). Titles indicate the gene for which nonsense variants were tested and the drug used to stimulate readthrough. The bottom-right plot does not show DMS estimates but measurements of individual variants also tested in previous studies, which were used to validate the readthrough reporter. Note that the readthrough scales differ across some of the studies, illustrating how differences in the assay, conditions and reporter influence absolute readthrough. Source data
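The sort-seq readout in panel c can be illustrated with a small calculation. A minimal sketch, assuming each sorting bin is summarized by one representative mCherry value (the hypothetical `bin_values`) and that a variant's signal is the read-weighted mean over bins, divided by the same quantity for the no-nonsense control; the study's exact estimator may differ. The 10-read cutoff mirrors the high-confidence threshold quoted in Fig. 2a, and the function names are illustrative only.

```python
from typing import Optional, Sequence

MIN_READS = 10  # high-confidence cutoff quoted in Fig. 2a (>=10 reads)


def weighted_mcherry(read_counts: Sequence[int], bin_values: Sequence[float]) -> float:
    """Read-weighted mean mCherry signal of one variant across the sorting bins."""
    if len(read_counts) != len(bin_values):
        raise ValueError("one read count is needed per sorting bin")
    total = sum(read_counts)
    if total == 0:
        raise ValueError("variant has no reads")
    return sum(c * v for c, v in zip(read_counts, bin_values)) / total


def readthrough_percent(
    variant_reads: Sequence[int],
    no_nonsense_reads: Sequence[int],
    bin_values: Sequence[float],
) -> Optional[float]:
    """Readthrough (%) relative to the no-nonsense control; None below the read cutoff."""
    if sum(variant_reads) < MIN_READS:
        return None
    control = weighted_mcherry(no_nonsense_reads, bin_values)
    return 100.0 * weighted_mcherry(variant_reads, bin_values) / control


# Hypothetical representative mCherry value (arbitrary units) for four sorting bins.
bins = [0.0, 1.0, 10.0, 100.0]
print(readthrough_percent([40, 10, 0, 0], [0, 0, 0, 50], bins))
```

Weighting by read fractions rather than raw counts keeps variants with different sequencing depths comparable.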
Fig. 2. Sequence features explain the readthrough variability across PTCs and drugs.
a, Readthrough distributions across drugs. The number of high-confidence variants (≥10 reads) recovered for each treatment and for which readthrough percentages were quantified is shown. b, Interdrug correlation. Values for a drug compared with itself represent the inter-replicate correlation. Examples of highly correlated (SRI and SJ6986) and weakly correlated (SRI and clitocine) drug pairs are shown, colored by stop type. c, Readthrough efficiencies for all variant–drug combinations. d–j, Effect of the sequence feature (x axis) on readthrough efficiency (y axis) in HEK293T_LP cells, colored by drug. The top and bottom sides of the box are the upper and lower quartiles, respectively. The box covers the interquartile interval, where 50% of the data are found. The horizontal line that splits the box in two is the median. Only variants where the stop codon is UGA are shown (except for d and g, where all stop codon variants are shown). The sequence features are stop codon identity (n = 22,342, P < 2 × 10⁻¹⁶, Kruskal–Wallis test; d), the nucleotide in position +1 downstream of the PTC (n = 10,602, P < 2 × 10⁻¹⁶; e), the nucleotides in positions +1, +2 and +3 downstream of the PTC (n = 2,589, P < 2 × 10⁻¹⁶; f), the same as e but stratified by stop codon (in clitocine samples, U > G for UAA stops, n = 614, adjusted P < 2 × 10⁻¹⁶; U = G for UGA stops, n = 1,395, adjusted P = 0.3; one-sided Wilcoxon signed-rank test; g), the nucleotides in positions −1, −2 and −3 upstream of the PTC together with the amino acid encoded by that codon (n = 2,589, P < 2 × 10⁻¹⁶, Kruskal–Wallis test; h), the same as h but only for variants with a glutamic acid upstream of the PTC (GAA > GAG for DAP, n = 155, adjusted P = 7 × 10⁻¹¹; GAA = GAG for clitocine, n = 158, adjusted P = 0.6; one-sided Wilcoxon signed-rank test; i), and the effect of amino acids encoded by A-ending codons on readthrough efficiency across drugs, where codons ending in A display higher readthrough than the rest of the codons (n = 7,989, adjusted P < 6 × 10⁻⁵ for DAP, G418 and SRI, one-sided Wilcoxon signed-rank test); the nucleotide immediately upstream of the PTC is indicated by color (j). Source data
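The sequence features compared in Fig. 2 (stop codon identity, the +1 nucleotide, the +1 to +3 downstream triplet, the −3 to −1 upstream codon with its amino acid, and whether that codon ends in A) can all be read directly off a coding sequence. A minimal sketch, assuming an RNA-alphabet CDS (T is converted to U), a 0-based codon index for the PTC and the standard genetic code; `context_features` and its output keys are hypothetical names chosen to echo the captions' `up_123`/`down_123` labels, not the authors' code.

```python
# Standard genetic code in UCAG order; "*" marks stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOPS = {"UAA", "UAG", "UGA"}


def context_features(cds: str, ptc_codon: int) -> dict:
    """Local sequence features around an in-frame PTC at 0-based codon index `ptc_codon`."""
    seq = cds.upper().replace("T", "U")
    start = 3 * ptc_codon
    stop = seq[start:start + 3]
    if stop not in STOPS:
        raise ValueError(f"codon {ptc_codon} is {stop!r}, not a stop codon")
    if start < 3 or start + 6 > len(seq):
        raise ValueError("need one full codon on each side of the PTC")
    up_123 = seq[start - 3:start]        # positions -3, -2, -1
    down_123 = seq[start + 3:start + 6]  # positions +1, +2, +3
    return {
        "stop_type": stop,
        "up_123": up_123,
        "up_aa": CODON_TABLE[up_123],
        "up_a_ending": up_123[-1] == "A",
        "down_1": down_123[0],
        "down_123": down_123,
    }


# Hypothetical CDS fragment: AUG GAA UGA CUG AAA, with the PTC at codon index 2.
print(context_features("ATGGAATGACTGAAA", 2))
```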
Fig. 3. Readthrough-sensitive nonsense variants differ across drugs.
a, Readthrough efficiency of 12 multistop variants across four drugs. Each multistop variant comprises two different nonsense mutations (different stop codon identities) observed at the same genomic locus. b, Percentage of variants with readthrough above different thresholds for each drug separately and when considering all eight drugs together (All_drugs). c, All pairwise overlaps between the top 50 readthrough-sensitive variants of each drug. The number indicates how many variants are shared by the top 50 readthrough-sensitive variant sets of the two compared drugs. d, Readthrough efficiency across drugs for 102 TP53 nonsense mutations, colored by stop codon type. The five most recurrent nonsense mutations in human tumor genomes are highlighted. e, Our observed readthrough efficiencies for the nonsense variants tested in two clinical trials (CTs) (blue), together with the remaining nonsense variants in the same gene tested in our assay (purple). The clinical trial identifier, drug and gene tested are specified in the titles. The top and bottom sides of the box are the upper and lower quartiles, respectively. The box covers the interquartile interval, where 50% of the data are found. The horizontal line that splits the box in two is the median. f, Number of variants for which each drug displays the highest readthrough efficiency, for the three genes most commonly tested in clinical trials of nonsense suppression therapies, considering all variants in our dataset. Source data
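Panels b and c of Fig. 3 reduce a drug × variant readthrough table to threshold percentages and top-k overlaps. A minimal sketch, assuming readthrough is stored as `{drug: {variant: percent}}`, that the default thresholds (1%, 2% and 5%) are hypothetical placeholders for those in the figure, and that "All_drugs" means taking each variant's best drug; all function names are illustrative.

```python
from itertools import combinations


def percent_above(readthrough: dict, thresholds=(1.0, 2.0, 5.0)) -> dict:
    """Percentage of variants whose readthrough exceeds each threshold."""
    values = list(readthrough.values())
    if not values:
        raise ValueError("no variants")
    return {t: 100.0 * sum(v > t for v in values) / len(values) for t in thresholds}


def best_across_drugs(table: dict) -> dict:
    """Assumed All_drugs view: each variant's highest readthrough over all drugs."""
    shared = set.intersection(*(set(per_drug) for per_drug in table.values()))
    return {v: max(table[d][v] for d in table) for v in shared}


def top_k_overlaps(table: dict, k: int = 50) -> dict:
    """Number of variants shared by the top-k readthrough-sensitive sets of each drug pair."""
    tops = {
        drug: set(sorted(vals, key=vals.get, reverse=True)[:k])
        for drug, vals in table.items()
    }
    return {(a, b): len(tops[a] & tops[b]) for a, b in combinations(sorted(tops), 2)}


# Toy values for illustration only.
table = {
    "DAP": {"v1": 0.5, "v2": 3.0, "v3": 6.0},
    "SRI": {"v1": 2.5, "v2": 0.2, "v3": 1.5},
}
print(percent_above(best_across_drugs(table)), top_k_overlaps(table, k=2))
```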
Fig. 4. Interpretable models predict readthrough efficiency from sequence context.
a, Cross-validated predictive performance of the drug-specific models for the CC90009, clitocine, DAP, G418, SJ6986 and SRI conditions. b, Contribution to model performance of the eight nucleotides downstream of the PTC (added one at a time). The fixed predictive variables present in all models are the stop codon type and the three nucleotides upstream of the PTC. A one-sided t test over 20 cross-validation rounds comparing each model (column) with the previous one was used to determine significance (adjusted *P < 0.05, adjusted **P < 0.01). c, Cross-validated predictive performance of the pan-drug models: drug-agnostic (top), drug-aware but agnostic to sequence × drug interactions (middle), and aware of both drug and sequence × drug interactions (bottom). d, Contribution of each sequence feature to the drug-specific models. The y axis shows the percentage drop in r² when each term is removed from the model, normalized to the full model (1 − (r² upon term removal / r² of the full model)). e, Correlation of drug-specific model coefficients (for the sake of coefficient interpretability, the models were run without the stop_type × down_123 nt interaction term, which incurs only a small decrease in r², ranging between 1% and 3% depending on the drug). Coefficients are colored by the model feature they belong to: stop codon type, down_123 nt or up_123 nt. Drugs displaying high correlations respond similarly to the sequence features and, consequently, trigger readthrough of similar subsets of PTCs. Source data
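The feature-contribution measure in panel d is a simple ratio of r² values. A minimal sketch of that formula, 1 − (r² upon term removal / r² of the full model), expressed as a percentage; the r² numbers in the example are hypothetical, and `term_contributions` is an illustrative name.

```python
def term_contributions(r2_full: float, r2_without: dict) -> dict:
    """Percentage drop in r2 when each term is removed, normalized to the full model."""
    if r2_full <= 0:
        raise ValueError("the full model must have positive r2")
    return {term: 100.0 * (1.0 - r2 / r2_full) for term, r2 in r2_without.items()}


# Hypothetical r2 values after dropping each term from a full model with r2 = 0.8.
print(term_contributions(0.8, {"stop_type": 0.4, "down_123 nt": 0.5, "up_123 nt": 0.72}))
```

Normalizing by the full model's r² lets contributions be compared across drugs whose models explain very different amounts of variance.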
Fig. 5. In silico nonsense saturation mutagenesis of the human genome.
a, Generation of the comprehensive in silico dataset containing all possible nonsense mutations in human coding genes. b, Readthrough predictions along the coding sequence (CDS) of TP53 for each stop codon type. Each panel represents a drug-specific readthrough prediction: DAP (top), clitocine (middle) and SRI (bottom). c, Percentage of variants genome-wide with readthrough above a given threshold (color legend) for each drug separately and when considering all eight drugs together (All_drugs). d, Percentage of all possible nonsense variants in the human exome for which each drug is predicted to display the highest readthrough efficiency. e, Cumulative histograms showing the number of variants as a function of readthrough efficiency for the genes DMD (top), PTEN (middle) and TP53 (bottom), stratified by stop codon type as UAA (left), UAG (center) and UGA (right). Source data
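The in silico saturation mutagenesis in panel a amounts to enumerating every single-nucleotide substitution that turns a sense codon into an in-frame stop. A minimal sketch, assuming a clean CDS string that ends with its natural stop codon (which is skipped) and ignoring transcript structure and genomic coordinates; `nonsense_variants` is a hypothetical name, and each returned variant could then be featurized and scored with a readthrough model.

```python
STOPS = {"UAA", "UAG", "UGA"}
NUCLEOTIDES = "ACGU"


def nonsense_variants(cds: str) -> list:
    """All single-nucleotide substitutions creating an in-frame PTC.

    Returns (1-based CDS position, reference base, alternative base, new stop codon).
    """
    seq = cds.upper().replace("T", "U")
    if len(seq) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    variants = []
    # The last codon is the natural termination codon, so it is not mutated.
    for codon_idx in range(len(seq) // 3 - 1):
        codon = seq[3 * codon_idx:3 * codon_idx + 3]
        if codon in STOPS:
            continue
        for pos in range(3):
            for alt in NUCLEOTIDES:
                if alt == codon[pos]:
                    continue
                mutant = codon[:pos] + alt + codon[pos + 1:]
                if mutant in STOPS:
                    variants.append((3 * codon_idx + pos + 1, codon[pos], alt, mutant))
    return variants


# Hypothetical four-codon CDS: AUG UGG CAA UAA.
print(nonsense_variants("ATGTGGCAATAA"))
```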
Fig. 6. Quantifying readthrough for >17,000 natural termination codons (NTCs).
a, Readthrough distributions across drugs for the PTC and NTC libraries (two-sided Wilcoxon test; ***P < 2 × 10⁻¹⁶ with n = 23,459, 23,096, 22,905 and 22,989 for clitocine, DAP, G418 and SRI, respectively; P = NS with n = 23,201 for SJ6986). The top and bottom sides of the box are the upper and lower quartiles, respectively. The box covers the interquartile interval, where 50% of the data are found. The horizontal line that splits the box in two is the median. b, Readthrough distributions across drugs for the PTC and NTC libraries. The threshold indicates the number of amino acids downstream of the NTC considered for the analysis. NTC variants with an in-frame 3′-UTR stop codon more proximal than the threshold are assumed to have a readthrough of 0%, so increasing the threshold increases the number of readthrough-insensitive variants. The numbers of high-confidence NTC variants (≥10 reads) recovered for each treatment and for which readthrough percentages were quantified are 17,812, 17,654, 17,382, 17,661 and 17,587 for clitocine, DAP, G418, SJ6986 and SRI, respectively. c, Predictive performance of drug-specific models on the NTC dataset using NTC-trained tenfold cross-validated models (top) or PTC-trained models (bottom). d, Correlation of the mean readthrough for each sequence context between PTCs and NTCs, colored by sequence feature. NS, not significant. Source data
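The thresholding rule in panel b can be written as a small function. A minimal sketch, assuming `utr3` is the 3′-UTR sequence starting immediately after the NTC and that the threshold counts amino acids (codons) translated before the next in-frame stop; the function names are hypothetical.

```python
STOPS = {"UAA", "UAG", "UGA"}


def codons_to_next_stop(utr3: str):
    """Sense codons read past the NTC before the first in-frame 3'-UTR stop; None if absent."""
    seq = utr3.upper().replace("T", "U")
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return i // 3
    return None


def apply_extension_threshold(readthrough: float, utr3: str, threshold: int) -> float:
    """Set readthrough to 0% when an in-frame stop lies closer than `threshold` amino acids."""
    n = codons_to_next_stop(utr3)
    if n is not None and n < threshold:
        return 0.0
    return readthrough


# Hypothetical 3'-UTR: GCA UGA AAA, so only one amino acid is added before the next stop.
print(apply_extension_threshold(5.0, "GCATGAAAA", threshold=10))
```

Raising `threshold` can only convert more variants to 0%, which is why larger thresholds increase the number of readthrough-insensitive NTCs.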
Extended Data Fig. 1. Experimental setup and overview of the drug datasets.
a, FACS profiles (BD Influx Cell Sorter instrument) of the PTC library under the different treatments, sorted by EGFP (y axis) and mCherry (x axis). Binned populations are indicated together with the control population harboring the no-nonsense TP53 variant. b, Inter-replicate correlations for the nine conditions. c,d, Cell viability (c) and readthrough (d) titration curves for each drug; error bars represent the standard deviation across three biological replicates. Four to six concentrations were tested for each drug, and the concentration displaying the highest readthrough while reducing cell viability by less than 25% (blue) was used for the assay. Very toxic concentrations were not tested for readthrough stimulation. In d, readthrough was calculated as the (mCherry+ and EGFP+)/(EGFP+) cell ratio multiplied by the mean mCherry intensity of the mCherry+ population and normalized to the readthrough of the no-nonsense variant. e, All pairwise interdrug correlations. f, Association of sequence features with readthrough efficiency, showing Pearson correlations (continuous variables) and Kruskal–Wallis chi-squared statistics (discrete variables). g, Percentage of reads mapping to the no-nonsense variant across the sorting populations of the natural stops library under gentamicin treatment. The variant is almost exclusively found in the no-nonsense population, where it represents 30% of the cells. h, Pearson's correlation between DMS and individual measurements (r = 0.77). This panel extends Fig. 1d to specifically test the upper ceiling of the assay by including ten more variants with high readthrough estimates. A loess curve was fit to model the nonlinearities caused by the upper limit of the assay (see Supplementary Note 1 for more information on the assay saturation limit). Variants above the dashed line (1.3% of the library) have >90% of their reads in the highest sorting gate. i, Readthrough distributions across drugs, colored by stop type. Source data
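The flow cytometry readthrough metric in panel d and the concentration-selection rule in panels c and d can be sketched together. A minimal sketch, assuming per-cell EGFP and mCherry intensities with hypothetical gating thresholds, reading "mCherry+ population" literally as all mCherry+ cells, and expressing viability as a fraction of untreated cells; all names are illustrative, not the authors' pipeline.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def flow_readthrough_score(egfp: Sequence[float], mcherry: Sequence[float],
                           egfp_gate: float, mcherry_gate: float) -> float:
    """(mCherry+ and EGFP+)/(EGFP+) cell ratio times the mean mCherry of mCherry+ cells."""
    egfp_pos = [m for g, m in zip(egfp, mcherry) if g > egfp_gate]
    if not egfp_pos:
        raise ValueError("no EGFP+ cells")
    double_pos = [m for m in egfp_pos if m > mcherry_gate]
    mcherry_pos = [m for m in mcherry if m > mcherry_gate]
    if not mcherry_pos:
        return 0.0
    return len(double_pos) / len(egfp_pos) * mean(mcherry_pos)


def normalized_readthrough(variant_score: float, no_nonsense_score: float) -> float:
    """Readthrough (%) relative to the no-nonsense variant."""
    return 100.0 * variant_score / no_nonsense_score


def pick_concentration(titration: Sequence[Tuple[float, float, Optional[float]]],
                       max_viability_loss: float = 0.25) -> float:
    """Highest-readthrough concentration that reduces viability by less than 25%.

    Each entry is (concentration, viability fraction, readthrough or None if not tested).
    """
    eligible = [(conc, rt) for conc, viab, rt in titration
                if rt is not None and 1.0 - viab < max_viability_loss]
    if not eligible:
        raise ValueError("no concentration passes the viability filter")
    return max(eligible, key=lambda pair: pair[1])[0]


variant = flow_readthrough_score([10] * 4, [0, 0, 0, 40], egfp_gate=5, mcherry_gate=20)
control = flow_readthrough_score([10] * 4, [100] * 4, egfp_gate=5, mcherry_gate=20)
print(normalized_readthrough(variant, control))
```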
Extended Data Fig. 2. Sequence features explain the readthrough variability across PTCs and drugs.
a–i, Effect of the sequence feature (x axis) on readthrough efficiency (y axis), colored by drug. The top and bottom sides of the box are the upper and lower quartiles, respectively. The box covers the interquartile interval, where 50% of the data are found. The horizontal line that splits the box in two is the median. Only variants where the stop codon is UGA are shown (except for c and e, where all stop codon variants are shown). The sequence features are the three nucleotides downstream of the PTC (n = 10,645, P < 2 × 10⁻¹⁶, Kruskal–Wallis test; a), the three nucleotides upstream of the PTC (n = 10,645, P < 2 × 10⁻¹⁶; b), the stop type (n = 22,227, P < 2 × 10⁻¹⁶; c), the nucleotide in position +1 downstream of the PTC (n = 10,502, P < 2 × 10⁻¹⁶; d), the same as d but stratified by stop codon (n = 16,753; e), the amino acid upstream of the PTC (n = 10,602, P < 2 × 10⁻¹⁶; f), variants with a glutamic acid upstream of the PTC, stratified by codon (n = 613; g), variants with an arginine upstream of the PTC, stratified by codon (n = 1,040, P = 3 × 10⁻⁶; h), and the effect of amino acids encoded by A-ending codons on readthrough efficiency for FUr, gentamicin, CC90009 and SJ6986, where codons ending in A display higher readthrough efficiencies than the rest of the codons (n = 7,902, adjusted P < 1 × 10⁻³, one-sided Wilcoxon signed-rank test); the nucleotide in position +3 of the codon is indicated by color (i). j, Mean readthrough difference across drugs between pairs of codons with a Hamming distance of 1 (that is, a single nucleotide difference) that encode the same amino acid, and pairs that encode different amino acids. The P value of the two-sided t test between the same-amino-acid and different-amino-acid groups is shown. k,l, Readthrough distributions (k) and pairwise correlations (l) for the three SJ6986 concentrations tested (0.5 μM, 5 μM and 20 μM). Source data
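The comparison in panel j pairs codons that differ at a single position and asks whether synonymous pairs differ less in readthrough than non-synonymous ones. A minimal sketch, assuming a mapping from upstream codon to mean readthrough for one drug, the standard genetic code and absolute differences as the comparison metric; the input values are hypothetical, and the t test shown in the figure is not reproduced here.

```python
from itertools import combinations
from statistics import mean

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def synonymous_vs_nonsynonymous(codon_readthrough: dict):
    """Mean |readthrough difference| for Hamming-1 codon pairs: (same aa, different aa)."""
    same, different = [], []
    for a, b in combinations(sorted(codon_readthrough), 2):
        if hamming(a, b) != 1:
            continue
        delta = abs(codon_readthrough[a] - codon_readthrough[b])
        (same if CODON_TABLE[a] == CODON_TABLE[b] else different).append(delta)
    return (mean(same) if same else None, mean(different) if different else None)


# Hypothetical mean readthrough (%) for four upstream codons under one drug.
print(synonymous_vs_nonsynonymous({"GAA": 3.0, "GAG": 1.0, "GAC": 0.5, "CAA": 2.0}))
```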
Extended Data Fig. 3. Codon-related features, multistop variants and overview of PTEN nonsense mutations and clinical trials.
a–c, Correlation of the tRNA adaptation index (tAI; a), codon adaptation index (CAI; b) and GC% (c) of the five amino acids upstream of the PTC with readthrough efficiency for each drug. d, Correlation of multistop variants across drugs. Each data point represents a mutation at the same genomic position but with a different stop type. e, Drug preferences for the most highly represented genes in our dataset (>20 variants, n = 33). The y axis shows the percentage of mutations for which each drug displays the highest readthrough. f, Readthrough efficiency across drugs for 97 PTEN nonsense mutations, colored by stop codon type. The four most recurrent nonsense mutations in human tumors are highlighted. g, Percentage of IDUA and ATM mutations in our dataset displaying readthrough levels above the phenotypic threshold reported in previous studies, across drugs. h, All past and current (n = 42) phase II–IV clinical trials testing readthrough drugs, obtained from a published source. i, Our readthrough efficiencies for the nonsense variants tested in two clinical trials (CTs) (blue), together with the remaining nonsense variants in the same gene tested in our assay (purple). The clinical trial identifier, drug (ataluren) and gene tested are specified in the titles. Many variants included in clinical trials are unresponsive to the drugs, which likely hinders trial performance. The top and bottom sides of the box are the upper and lower quartiles, respectively. The box covers the interquartile interval, where 50% of the data are found. The horizontal line that splits the box in two is the median. Source data
Extended Data Fig. 4. Predictive models overview and optimization.
a, Cross-validated predictive performance of the drug-specific models for the FUr, gentamicin and untreated conditions. b, Downsampling the number of readthrough variants decreases model performance for the high-performing drug models (Supplementary Note 6). The x axis shows the number of variants with readthrough >1% retained and used to rerun the models (gray). Control models in which the same number of variants was removed at random were used to account for the effect of smaller training sets on model performance (black). Models with 20, 50 and 150 variants retained are intended to represent scenarios similar to the untreated, gentamicin and FUr datasets. The r² values shown are averages over 10 cross-validation rounds. c, Comparison of the r² values of the drug-specific models when using the stop type, the three nucleotides downstream and upstream of the PTC and the interaction between stop type and the three downstream nucleotides versus when using ElasticNet regularization on 47 sequence features (Extended Data Fig. 1f and Supplementary Table 4). d, Performance of three model formulations across drugs (Supplementary Note 6): using only the stop type and the three nucleotides downstream and upstream of the PTC, adding the interaction between stop type and the three upstream nucleotides, or adding the interaction between stop type and the three downstream nucleotides. Only the latter consistently improves model performance across drugs. The r² values shown are averages over 10 cross-validation rounds. e, Performance of three model formulations across drugs: encoding the three nucleotides downstream and upstream of the PTC as nucleotide triplets (m1); encoding the upstream sequence as a nucleotide triplet and the three downstream nucleotides as three separate terms (one for each position, with no interaction among them; m2); and encoding the downstream sequence as a nucleotide triplet and the three upstream nucleotides as three separate terms (one for each position, with no interaction among them; m3). m1 consistently yields higher r² across drugs. The r² values shown are averages over 10 cross-validation rounds. f, Contribution of each sequence feature to the pan-drug model. The y axis shows the relative drop in r² when each term is removed from the model, normalized to the full model (1 − (r² upon term removal / r² of the full model)). g–j, Model coefficients of the following predictive models: CC90009 and clitocine (g), DAP and SJ6986 (h), G418 and SRI (i), and the down_123 nt (top) and up_123 nt (bottom) coefficients of the pan-drug model (j). The mean, 95% confidence intervals and significance (two-sided Student's t test) of the coefficient estimates across 10 cross-validation rounds are shown. Asterisks indicate adjusted P < 0.01. Source data
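The model formulation favored in panels d and e (stop type, down_123 and up_123 as triplet-level categorical terms plus a stop type × down_123 interaction) can be illustrated as ordinary least squares on one-hot features. A minimal sketch, assuming a plain additive linear model fitted with NumPy; the authors' fitting procedure, response scale and 10-round cross-validation are not reproduced, and the function names and synthetic effects are hypothetical.

```python
import numpy as np


def features(v: dict, interaction: bool = True) -> list:
    """One-hot feature names for one variant (m1-style triplet terms)."""
    names = [f"stop={v['stop_type']}", f"down={v['down_123']}", f"up={v['up_123']}"]
    if interaction:
        names.append(f"stop:down={v['stop_type']}|{v['down_123']}")
    return names


def design(variants, columns, interaction=True) -> np.ndarray:
    """Binary design matrix with an intercept; levels missing from `columns` are ignored."""
    index = {c: i + 1 for i, c in enumerate(columns)}
    X = np.zeros((len(variants), len(columns) + 1))
    X[:, 0] = 1.0
    for row, v in enumerate(variants):
        for name in features(v, interaction):
            if name in index:
                X[row, index[name]] = 1.0
    return X


def fit(variants, y, interaction=True):
    columns = sorted({n for v in variants for n in features(v, interaction)})
    X = design(variants, columns, interaction)
    # One-hot levels plus an intercept are collinear; lstsq returns the minimum-norm solution.
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return columns, coef


def predict(columns, coef, variants, interaction=True) -> np.ndarray:
    return design(variants, columns, interaction) @ coef


def r2(y, y_hat) -> float:
    y = np.asarray(y, dtype=float)
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))
```

Encoding each triplet as a single categorical term (m1) lets the model capture dependencies among the three positions, at the cost of many more levels than the per-position encodings (m2, m3).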
Extended Data Fig. 5. In silico PTC saturation mutagenesis.
a, Overview of the readthrough predictions along the coding sequence of TP53 for each stop codon type. Each panel represents a drug-specific prediction: G418 (top), CC90009 (middle) and SJ6986 (bottom). b, Same as a, but drugs are represented as colors and each panel belongs to a different stop type. Source data

References

    1. Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018). doi:10.1093/nar/gkx1153
    2. Stenson, P. D. et al. The Human Gene Mutation Database (HGMD®): optimizing its use in a clinical diagnostic or research setting. Hum. Genet. 139, 1197–1207 (2020). doi:10.1007/s00439-020-02199-3
    3. Supek, F., Lehner, B. & Lindeboom, R. G. H. To NMD or not to NMD: nonsense-mediated mRNA decay in cancer and other genetic diseases. Trends Genet. 37, 657–668 (2021). doi:10.1016/j.tig.2020.11.002
    4. Lykke-Andersen, S. & Jensen, T. H. Nonsense-mediated mRNA decay: an intricate machinery that shapes transcriptomes. Nat. Rev. Mol. Cell Biol. 16, 665–677 (2015). doi:10.1038/nrm4063
    5. Lombardi, S., Testa, M. F., Pinotti, M. & Branchini, A. Molecular insights into determinants of translational readthrough and implications for nonsense suppression approaches. Int. J. Mol. Sci. 21, 9449 (2020). doi:10.3390/ijms21249449