Combining full-length gene assay and SpliceAI to interpret the splicing impact of all possible SPINK1 coding variants

Hum Genomics. 2024 Feb 27;18(1):21. doi: 10.1186/s40246-024-00586-9.

Hao Wu^#^¹˒², Jin-Huan Lin^#^¹˒², Xin-Ying Tang^#^²˒³, Gaëlle Marenne⁴, Wen-Bin Zou¹˒², Sacha Schutz⁴˒⁵, Emmanuelle Masson⁴˒⁵, Emmanuelle Génin⁴, Yann Fichou⁴, Gerald Le Gac⁴˒⁵, Claude Férec⁴, Zhuan Liao⁶˒⁷, Jian-Min Chen⁸

Affiliations

¹ Department of Gastroenterology, Changhai Hospital, Naval Medical University, 168 Changhai Road, Shanghai, 200433, China.
² Shanghai Institute of Pancreatic Diseases, Shanghai, China.
³ Department of Prevention and Health Care, Eastern Hepatobiliary Surgery Hospital, Naval Medical University, Shanghai, China.
⁴ Univ Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France.
⁵ Service de Génétique Médicale et de Biologie de La Reproduction, CHRU Brest, Brest, France.
⁶ Department of Gastroenterology, Changhai Hospital, Naval Medical University, 168 Changhai Road, Shanghai, 200433, China. liaozhuan@smmu.edu.cn.
⁷ Shanghai Institute of Pancreatic Diseases, Shanghai, China. liaozhuan@smmu.edu.cn.
⁸ Univ Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France. jian-min.chen@univ-brest.fr.

^# Contributed equally.

PMID: 38414044
PMCID: PMC10898081
DOI: 10.1186/s40246-024-00586-9

Hao Wu et al. Hum Genomics. 2024.

Abstract

Background: Single-nucleotide variants (SNVs) within gene coding sequences can significantly affect pre-mRNA splicing, with important implications for understanding pathogenic mechanisms and for precision medicine. In this study, we aimed to combine the well-established full-length gene splicing assay (FLGSA) with SpliceAI to prospectively interpret the splicing effects of all possible coding SNVs in SPINK1, a four-exon gene associated with chronic pancreatitis.
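"All possible coding SNVs" means every coding position paired with each of its three alternative nucleotides, each classified by its effect on the encoded amino acid. The sketch below enumerates and classifies them for any coding sequence using the standard genetic code. It is a minimal illustration, not the authors' pipeline: the helper name `classify_all_snvs` and the toy three-codon sequence are hypothetical, and start-codon changes are simply counted as missense, which may differ from the convention behind the paper's 506 missense / 164 synonymous split.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_all_snvs(cds: str) -> Counter:
    """Tally the protein-level consequence of every possible SNV in a CDS.

    Hypothetical helper. Start-codon changes are counted as missense here;
    real annotation tools may label them start-loss instead.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    counts: Counter = Counter()
    for pos, ref in enumerate(cds):
        start = pos - pos % 3          # first base of the codon containing pos
        codon = cds[start:start + 3]
        offset = pos - start
        ref_aa = CODON_TABLE[codon]
        for alt in "ACGT":
            if alt == ref:
                continue               # three alternative alleles per position
            alt_aa = CODON_TABLE[codon[:offset] + alt + codon[offset + 1:]]
            if alt_aa == ref_aa:
                counts["synonymous"] += 1
            elif alt_aa == "*":
                counts["nonsense"] += 1
            elif ref_aa == "*":
                counts["stop-loss"] += 1
            else:
                counts["missense"] += 1
    return counts


toy = classify_all_snvs("ATGAAATGA")   # hypothetical 3-codon CDS, not SPINK1
print(sum(toy.values()), dict(toy))
```

Run on the real SPINK1 coding sequence from NM_001379610.1, the total would be three times the number of coding positions; whether the category split matches the paper depends on how start and stop codon changes are binned.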

Results: We began with a retrospective analysis of 27 SPINK1 coding SNVs previously assessed by FLGSA, continued with a prospective analysis of 35 newly FLGSA-tested SPINK1 coding SNVs, then extrapolated the data, and ended with further validation. In total, we analyzed 67 SPINK1 coding SNVs, accounting for 9.3% of the 720 possible coding SNVs. Of these 67 FLGSA-analyzed SNVs, 12 were found to affect splicing. By comparing FLGSA results with SpliceAI predictions in detail, we inferred that the remaining 653 untested coding SNVs in SPINK1 are unlikely to affect splicing significantly. Of the 12 splice-altering SNVs, nine produced both normally and aberrantly spliced transcripts, while the remaining three generated only aberrantly spliced transcripts. All splice-altering SNVs were located in exons 1 and 2, notably at the first and/or last coding nucleotides of these exons. Eleven of the 12 were missense variants (2.17% of 506 possible missense variants) and one was synonymous (0.61% of 164 possible synonymous variants). Notably, raising the SpliceAI cut-off from the conventional 0.20 to 0.30 would improve specificity without reducing sensitivity.
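The counts in the Results paragraph can be cross-checked with simple arithmetic. The sketch below recomputes the reported percentages; the five validation variants are taken from the Fig. 10 caption, and the variable names are illustrative only. Note that 720 possible SNVs implies 240 coding positions, and that the 50 possible SNVs that are neither missense nor synonymous are not broken down in the abstract.

```python
# Counts reported in the Results paragraph; the 5 validation variants are from Fig. 10.
TOTAL_POSSIBLE_SNVS = 720
RETROSPECTIVE, PROSPECTIVE, VALIDATION = 27, 35, 5
SPLICE_ALTERING, PARTIAL, COMPLETE = 12, 9, 3
MISSENSE_ALTERING, MISSENSE_POSSIBLE = 11, 506
SYNONYMOUS_ALTERING, SYNONYMOUS_POSSIBLE = 1, 164


def percent(part: int, whole: int, digits: int = 2) -> float:
    """Percentage of part over whole, rounded for reporting."""
    return round(100 * part / whole, digits)


tested = RETROSPECTIVE + PROSPECTIVE + VALIDATION
summary = {
    "coding_positions": TOTAL_POSSIBLE_SNVS // 3,   # three alternative alleles per position
    "tested": tested,
    "tested_pct": percent(tested, TOTAL_POSSIBLE_SNVS, 1),
    "untested": TOTAL_POSSIBLE_SNVS - tested,
    "splice_altering_pct_of_all": percent(SPLICE_ALTERING, TOTAL_POSSIBLE_SNVS),
    "missense_pct": percent(MISSENSE_ALTERING, MISSENSE_POSSIBLE),
    "synonymous_pct": percent(SYNONYMOUS_ALTERING, SYNONYMOUS_POSSIBLE),
    "other_possible": TOTAL_POSSIBLE_SNVS - MISSENSE_POSSIBLE - SYNONYMOUS_POSSIBLE,
}
print(summary)
```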

Conclusions: By integrating FLGSA with SpliceAI, we determined that fewer than 2% (1.67%) of all possible coding SNVs in SPINK1 significantly affect splicing. Our findings underscore the importance of performing splicing analysis within the broader genomic sequence context of the gene under study and highlight the uncertainty inherent in intermediate SpliceAI scores (0.20 to 0.80). This study is the first to prospectively interpret all possible coding SNVs in a disease-associated gene with a high degree of accuracy, and represents a meaningful step from retrospective toward prospective variant analysis in the era of exome and genome sequencing.
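The Conclusions treat SpliceAI scores between 0.20 and 0.80 as an uncertain, intermediate range. A minimal sketch of that triage is shown below, assuming the quantity being binned is the highest of the four Δ scores and that the band edges are handled as written in the hypothetical `triage_spliceai` helper (below `low` is excluded, `high` itself counts as intermediate). Passing `low=0.30` mirrors the cut-off the authors propose in place of 0.20.

```python
def triage_spliceai(max_delta: float, low: float = 0.20, high: float = 0.80) -> str:
    """Bin a SpliceAI maximum delta score into coarse interpretive categories.

    Hypothetical helper: the band edges are assumptions based on the
    0.20-0.80 intermediate range described in the abstract.
    """
    if not 0.0 <= max_delta <= 1.0:
        raise ValueError("SpliceAI delta scores lie in [0, 1]")
    if max_delta < low:
        return "below cut-off"
    if max_delta <= high:
        return "intermediate: confirm experimentally"
    return "high"


for score in (0.10, 0.25, 0.50, 0.95):
    print(score, triage_spliceai(score), triage_spliceai(score, low=0.30))
```

The point of a separate intermediate band, rather than a single yes/no threshold, is that such scores are exactly the cases where an experimental assay like FLGSA adds the most information.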

Keywords: SPINK1 gene; Chronic pancreatitis; Full-length gene splicing assay (FLGSA); Pre-mRNA splicing; Precision medicine in genetics; Single-nucleotide variants (SNVs); Splice site; SpliceAI; Splicing prediction algorithms; Variant interpretation.


Conflict of interest statement

Jian-Min Chen serves as an Associate Editor for Human Genomics but was not involved in the editorial review process or the decision to publish this article. All remaining authors declare that they have no competing interests.

Figures

**Fig. 1**
Overview of the FLGSA assay and research strategy. a Representation of the *SPINK1* full-length gene expression vector and the experimental steps of the FLGSA assay for each study variant. The coding sequences of the four-exon *SPINK1* gene are drawn to scale, whereas the intronic and untranslated region sequences are not. The reference *SPINK1* genomic sequence is NG_008356.2, and the reference *SPINK1* mRNA sequence is the MANE (Matched Annotation from the NCBI and EMBL-EBI [45]) Select transcript ENST00000296695/NM_001379610.1. NM_001379610.1 represents the *SPINK1* transcript isoform expressed in the exocrine pancreas [29, 30]. The start and end positions of the coding sequence in each exon, and those of the *SPINK1* genomic sequence cloned into the pcDNA3.1/V5-His-TOPO vector, are indicated according to NM_001379610.1. b Illustration of how the FLGSA assay was integrated with SpliceAI to prospectively evaluate the splicing effects of all possible coding variants in the *SPINK1* gene. *Abbreviations:* FLGSA, full-length gene splicing assay; RT-PCR, reverse transcription-PCR; SNVs, single-nucleotide variants

**Fig. 2**
Graphical illustration of the SpliceAI Δ scores for the three possible single-nucleotide variants at each coding position within *SPINK1* exon 1. The x-axis enumerates the coding positions so that Δ scores can be matched to specific nucleotide changes. Variants analyzed by the full-length gene splicing assay are shown at the bottom of the figure, with arrows indicating their analysis status: black for previously analyzed variants, red for those newly assessed in the initial step of the prospective analysis, and green for those assessed in the subsequent validation phase. Variant labels indicate transcript outcomes: standard font denotes variants producing only normally spliced transcripts, and bold blue denotes variants producing both normally and aberrantly spliced transcripts. *Abbreviations:* DS, Δ score; AG, acceptor gain; AL, acceptor loss; DG, donor gain; DL, donor loss
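Each plotted variant carries four SpliceAI Δ scores (AG, AL, DG, DL). When SpliceAI is run on a VCF, it writes these into an INFO field of the form `ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL`, with one comma-separated entry per alternative allele. The sketch below parses that string and takes the highest of the four Δ scores, as used on the y-axis of Fig. 11. The helper names are hypothetical and the example scores are invented for illustration, not taken from the study.

```python
from typing import NamedTuple


class SpliceAIScores(NamedTuple):
    allele: str
    symbol: str
    ds_ag: float  # acceptor gain
    ds_al: float  # acceptor loss
    ds_dg: float  # donor gain
    ds_dl: float  # donor loss
    dp_ag: int    # positions of each event relative to the variant
    dp_al: int
    dp_dg: int
    dp_dl: int

    @property
    def max_ds(self) -> float:
        """Highest of the four delta scores."""
        return max(self.ds_ag, self.ds_al, self.ds_dg, self.ds_dl)


def parse_spliceai(info_value: str) -> list[SpliceAIScores]:
    """Parse the value of a SpliceAI INFO field (one record per allele)."""
    records = []
    for annotation in info_value.split(","):
        fields = annotation.split("|")
        if len(fields) != 10:
            raise ValueError(f"unexpected SpliceAI annotation: {annotation!r}")
        scores = [float(x) for x in fields[2:6]]
        positions = [int(x) for x in fields[6:10]]
        records.append(SpliceAIScores(fields[0], fields[1], *scores, *positions))
    return records


# Invented example values, for illustration only.
rec = parse_spliceai("A|SPINK1|0.00|0.01|0.12|0.45|-3|1|-40|1")[0]
print(rec.max_ds)
```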

**Fig. 3**
Graphical illustration of the SpliceAI Δ scores for the three possible single-nucleotide variants at each coding position within *SPINK1* exon 2. The x-axis enumerates the coding positions so that Δ scores can be matched to specific nucleotide changes. Variants analyzed by the full-length gene splicing assay are shown at the bottom of the figure, with arrows indicating their analysis status: black for previously analyzed variants, red for those newly assessed in the initial step of the prospective analysis, and green for those assessed in the subsequent validation phase. Variant labels indicate transcript outcomes: standard font denotes variants producing only normally spliced transcripts, bold blue denotes variants producing both normally and aberrantly spliced transcripts, and bold red denotes variants producing only aberrantly spliced transcripts. *Abbreviations:* DS, Δ score; AG, acceptor gain; AL, acceptor loss; DG, donor gain; DL, donor loss

**Fig. 4**
Graphical illustration of the SpliceAI Δ scores for the three possible single-nucleotide variants at each coding position within *SPINK1* exon 3. The x-axis enumerates the coding positions so that Δ scores can be matched to specific nucleotide changes. Variants analyzed by the full-length gene splicing assay are shown at the bottom of the figure, with arrows indicating their analysis status: black for previously analyzed variants, red for those newly assessed in the initial step of the prospective analysis, and green for those assessed in the subsequent validation phase. All of these variants generated only normally spliced transcripts. *Abbreviations:* DS, Δ score; AG, acceptor gain; AL, acceptor loss; DG, donor gain; DL, donor loss

**Fig. 5**
Graphical illustration of the SpliceAI Δ scores for the three possible single-nucleotide variants at each coding position within *SPINK1* exon 4. The x-axis enumerates the coding positions so that Δ scores can be matched to specific nucleotide changes. Variants analyzed by the full-length gene splicing assay are shown at the bottom of the figure, with arrows indicating their analysis status: black for previously analyzed variants and red for those newly assessed in the initial step of the prospective analysis. All of these variants generated only normally spliced transcripts. *Abbreviations:* DS, Δ score; AG, acceptor gain; AL, acceptor loss; DG, donor gain; DL, donor loss

**Fig. 6**
RT-PCR results from the FLGSA analysis of 35 potential *SPINK1* coding variants. Each band that was successfully Sanger sequenced is annotated. Bands were classified as either normally spliced transcripts (indicated by arrows) or aberrantly spliced transcripts, the latter characterized by retention of the first 140 bases of intron 1 or by skipping of exon 2. Note that where a variant produced two successfully sequenced bands, some bands may contain both normally and aberrantly spliced transcript isoforms. See the main text for further details and interpretation. Full-length, unaltered gel images corresponding to this figure are provided in Additional file 2. *Abbreviations:* FLGSA, full-length gene splicing assay; RT-PCR, reverse transcription-PCR

**Fig. 7**
Sanger sequencing results of 'Retention of the first 140 bases of intron 1' RT-PCR bands from the three exon 1 terminal variants (c.55G > A/C/T) in *SPINK1*. Refer to Fig. 6 for the corresponding bands. Each band was found to contain a mixture of aberrantly and normally spliced transcripts; the 5’ and 3’ junctions of the aberrant isoform are delineated by vertical lines. The annotations beneath the electropherograms give the junction-spanning sequences of both isoforms: the upper annotation for the aberrantly spliced transcript and the lower for the normally spliced transcript. In all subpanels, the normally spliced transcripts show exon 1 followed by exon 2, with the introduced exon 1 terminal variants highlighted in red. The aberrantly spliced transcripts share the same 5’ junction, with exon 1 followed by intron 1 sequence (introduced variants in red), and the same 3’ junction, with retained intron 1 sequence (up to c.55 + 140) followed by exon 2. Sequence numbering is based on NM_001379610.1: c.55 and c.56 denote the last nucleotide of exon 1 and the first nucleotide of exon 2 in *SPINK1*, respectively; c.55 + 1 and c.55 + 140 refer to the first and the 140th nucleotides of intron 1 in *SPINK1*. *Abbreviations:* RT-PCR, reverse transcription-PCR

**Fig. 8**
Sanger sequencing of 'Exon 2-skipped' RT-PCR bands from two exon 2 variants (c.56G > C and c.65G > T) in *SPINK1*. For the corresponding bands, refer to Fig. 6. Sanger sequencing revealed that each band contains a mix of aberrantly spliced and normally spliced transcripts, with the junctions of both transcript isoforms being delineated by vertical lines. Annotations beneath the electropherograms specify the junction-spanning sequences for both isoforms: the upper annotation pertains to the aberrantly spliced transcript, and the lower to the normally spliced transcript. In both panels, the normally spliced transcripts exhibit a consistent sequence of exon 1 followed by exon 2, differentiated only by the introduced variants (highlighted in red). The aberrantly spliced transcripts are identical, characterized by exon 1 directly followed by exon 3. Sequence numbering aligns with NM_001379610.1. Specifically, c.55, c.56, and c.88 mark the terminal position of exon 1, the start of exon 2, and the start of exon 3 in *SPINK1*, respectively
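The two aberrant products described in Figs. 7 and 8 change the length of the mature transcript by amounts that follow directly from the coordinates in the captions: exon 2 runs from c.56 to c.87 (exon 3 starts at c.88), and intron 1 retention inserts c.55 + 1 to c.55 + 140. The sketch below checks whether each length change preserves the reading frame. It only reasons about length modulo 3. It does not assess whether the retained intronic sequence introduces a premature stop codon, since that sequence is not given here. The `frame_effect` helper is hypothetical.

```python
def frame_effect(length_change: int) -> str:
    """Classify a change in coding length (nt) by its effect on the reading frame."""
    return "in-frame" if length_change % 3 == 0 else "frameshift"


# Coordinates from the Fig. 7 and Fig. 8 captions (NM_001379610.1 numbering).
EXON2_START, EXON3_START = 56, 88
EXON2_LENGTH = EXON3_START - EXON2_START   # c.56..c.87 inclusive = 32 nt
INTRON1_RETAINED = 140                     # c.55+1..c.55+140

print("exon 2 skipping:", -EXON2_LENGTH, frame_effect(-EXON2_LENGTH))
print("intron 1 retention:", INTRON1_RETAINED, frame_effect(INTRON1_RETAINED))
```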

**Fig. 9**
Interpretation of the three c.55 SNVs and the c.11C > G variant in exon 1 by reference to SpliceAI predictions and FLGSA results. a Illustration of the (partial) disruption of the physiological 5’ splice donor site of *SPINK1* intron 1 caused by the three possible SNVs at the last nucleotide of exon 1 (c.55). This disruption is shown in the context of the corresponding 9-bp 5’ splice signal sequence, which pairs with the 3’-GUCCAUUCA-5’ sequence at the 5’ end of U1 snRNA. SpliceAI predicted this disruption (DL scores, 0.34 to 0.51) and the activation of an upstream cryptic splice donor site within exon 1 (DG scores, 0.33 and 0.37). However, FLGSA revealed the activation of a downstream cryptic splice donor site. Vertical lines indicate paired bases between the 9-bp 5’ splice signal sequence and the 5’ end sequence of U1 snRNA. The GT dinucleotides involved are highlighted in blue, with their positions (according to NM_001379610.1) indicated. The 9-bp 5' splice site signal sequence position weight matrices (PWM) were sourced from Leman et al. [52], an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License. Note that the 9-bp 5’ splice signal sequences, whether in the context of the consensus sequence or *SPINK1* sequences, are presented as DNA. b Illustration of the c.11C > G variant in the context of the aforementioned upstream cryptic splice donor site. A dotted line represents the new base pairing created by the variant, which strengthens the interaction between the 9-bp 5’ splice signal sequence and the 5’ end sequence of U1 snRNA
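The base-pairing logic in Fig. 9 can be made concrete. A 9-bp 5’ splice signal (three exonic plus six intronic nucleotides) pairs perfectly with 3’-GUCCAUUCA-5’ of U1 snRNA only if it reads 5’-CAGGUAAGU-3’ (CAGGTAAGT in DNA). The sketch below counts Watson–Crick pairs against that sequence. Assumptions: G·U wobble pairs are not counted, and this is a plain pairing count, not the PWM scoring of Leman et al. The example 9-mers are hypothetical and are not SPINK1 sequences. The second example shows how a single substitution, analogous in spirit to c.11C > G in panel b, can add one pair.

```python
# DNA sequence that pairs perfectly with the 3'-GUCCAUUCA-5' end of U1 snRNA.
U1_COMPLEMENT = "CAGGTAAGT"


def u1_pairs(site9: str) -> list[bool]:
    """Per-position Watson-Crick pairing of a 9-bp 5' splice signal with U1 snRNA.

    G.U wobble pairs are deliberately not counted (simplifying assumption).
    """
    site9 = site9.upper().replace("U", "T")
    if len(site9) != 9:
        raise ValueError("expected a 9-bp 5' splice signal (-3 to +6)")
    return [base == target for base, target in zip(site9, U1_COMPLEMENT)]


def u1_pair_count(site9: str) -> int:
    return sum(u1_pairs(site9))


# Hypothetical 9-mers for illustration, not SPINK1 sequences.
print(u1_pair_count("CAGGTAAGT"))   # consensus
print(u1_pair_count("AAGGTGAGG"))   # weaker site
print(u1_pair_count("CAGGTGAGG"))   # same site after one A>C change at -3
```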

**Fig. 10**
RT-PCR results for the validation analysis of five potential *SPINK1* coding variants by FLGSA. Arrows indicate wild-type or normally spliced transcripts, all of which were confirmed by Sanger sequencing. The full-length, unaltered gel image corresponding to this figure is provided in Additional file 3. *Abbreviations:* FLGSA, full-length gene splicing assay; RT-PCR, reverse transcription-PCR

**Fig. 11**
Overall correlation between FLGSA findings and SpliceAI predictions across all exons of the *SPINK1* gene. On the y-axis, "Highest DS" represents the highest Δ score among the four SpliceAI predictions for each of the 67 *SPINK1* variants analyzed through the full-length gene splicing assay (refer to Table 1 for details). The x-axis categorizes the variants based on their transcript outcomes: "No" for variants exclusively producing normally spliced transcripts, "Partial" for variants leading to a mix of normally and aberrantly spliced transcripts, and "Complete" for variants solely resulting in aberrantly spliced transcripts. Δ scores of 0.90 and 0.30 are demarcated with thicker dotted lines to denote thresholds for exclusive generation of aberrantly spliced or normally spliced transcripts, respectively
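The abstract states that moving the SpliceAI cut-off from 0.20 to 0.30 improves specificity without reducing sensitivity. The sketch below shows how such a comparison is computed from pairs of (highest Δ score, FLGSA-confirmed splice effect). A variant is called splice-altering when its score is at or above the cut-off, which is an assumption about tie handling. The `TOY` data are invented to illustrate the calculation and are not the 67 variants in Table 1.

```python
def sensitivity_specificity(results, cutoff):
    """Sensitivity and specificity of 'max delta score >= cutoff' as a splicing call.

    results: iterable of (max_delta_score, splice_altering_by_assay) pairs.
    """
    tp = fn = tn = fp = 0
    for score, altered in results:
        predicted = score >= cutoff
        if altered and predicted:
            tp += 1
        elif altered:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Invented illustration data, not the study's results.
TOY = [(0.95, True), (0.45, True), (0.32, True),
       (0.25, False), (0.22, False), (0.05, False)]

for cutoff in (0.20, 0.30):
    print(cutoff, sensitivity_specificity(TOY, cutoff))
```

In this toy set, the two non-altering variants scoring between 0.20 and 0.30 become true negatives at the higher cut-off, while every altering variant still scores above it. That is the pattern the abstract describes.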


Cited by

Alu insertion-mediated dsRNA structure formation with pre-existing Alu elements as a disease-causing mechanism.
Masson E, Maestri S, Bordeau V, Cooper DN, Férec C, Chen JM. Am J Hum Genet. 2024 Oct 3;111(10):2176-2189. doi: 10.1016/j.ajhg.2024.08.016. Epub 2024 Sep 11. PMID: 39265574.
Genetics and clinical implications of SPINK1 in the pancreatitis continuum and pancreatic cancer.
Wang QW, Zou WB, Masson E, Férec C, Liao Z, Chen JM. Hum Genomics. 2025 Mar 26;19(1):32. doi: 10.1186/s40246-025-00740-x. PMID: 40140953. Review.
U-rich elements drive pervasive cryptic splicing in 3' UTR massively parallel reporter assays.
Dao K, Jungers CF, Djuranovic S, Mustoe AM. Nat Commun. 2025 Jul 25;16(1):6844. doi: 10.1038/s41467-025-62000-9. PMID: 40715118.
U-rich elements drive pervasive cryptic splicing in 3' UTR massively parallel reporter assays.
Dao K, Jungers CF, Djuranovic S, Mustoe AM. bioRxiv [Preprint]. 2024 Aug 5:2024.08.05.606557. doi: 10.1101/2024.08.05.606557. PMID: 39149310. Update in: Nat Commun. 2025 Jul 25;16(1):6844.

References

1. Lim KH, Ferraris L, Filloux ME, Raphael BJ, Fairbrother WG. Using positional distribution to identify splicing elements and predict pre-mRNA processing defects in human genes. Proc Natl Acad Sci U S A. 2011;108(27):11093–11098.
2. Manning KS, Cooper TA. The roles of RNA processing in translating genotype to phenotype. Nat Rev Mol Cell Biol. 2017;18(2):102–114.
3. Cartegni L, Chew SL, Krainer AR. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet. 2002;3(4):285–298.
4. Sarkar A, Panati K, Narala VR. Code inside the codon: The role of synonymous mutations in regulating splicing machinery and its impact on disease. Mutat Res Rev Mutat Res. 2022;790:108444.
5. Aicher JK, Jewell P, Vaquero-Garcia J, Barash Y, Bhoj EJ. Mapping RNA splicing variations in clinically accessible and nonaccessible tissues to facilitate Mendelian disease diagnosis using RNA-seq. Genet Med. 2020;22(7):1181–1190.


Grants and funding

20YF1459400/Shanghai Sailing Program
