IntSplice2: Prediction of the Splicing Effects of Intronic Single-Nucleotide Variants Using LightGBM Modeling
- PMID: 34349788
- PMCID: PMC8326971
- DOI: 10.3389/fgene.2021.701076
Abstract
Predicting the effect of an intronic single-nucleotide variant (SNV) on aberrant pre-mRNA splicing is challenging unless the SNV affects the canonical GU/AG splice sites (ss). To predict the pathogenicity of SNVs at intronic positions -50 (Int-50) to -3 (Int-3) close to the 3' ss, we developed light gradient boosting machine (LightGBM)-based IntSplice2 models using pathogenic SNVs in the Human Gene Mutation Database (HGMD) and ClinVar, and common SNVs in dbSNP with 0.01 ≤ minor allele frequency (MAF) < 0.50. The LightGBM models were generated using features representing splicing cis-elements. The average recall/sensitivity and specificity of IntSplice2 by fivefold cross-validation (CV) of the training dataset were 0.764 and 0.884, respectively. The recall/sensitivity of IntSplice2 was lower than the average recall/sensitivity of 0.800 of IntSplice, which we previously built with support vector machine (SVM) modeling for the same intronic positions. In contrast, the specificity of IntSplice2 was higher than the average specificity of 0.849 of IntSplice. To benchmark (BM) IntSplice2 against IntSplice, we made a test dataset that was not used to train IntSplice. After excluding the test dataset from the training dataset, we generated IntSplice2-BM and compared it with IntSplice using the test dataset. IntSplice2-BM was superior to IntSplice in all seven statistical measures: accuracy, precision, recall/sensitivity, specificity, F1 score, negative predictive value (NPV), and Matthews correlation coefficient (MCC). The IntSplice2 web service is available at https://www.med.nagoya-u.ac.jp/neurogenetics/IntSplice2.
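The seven measures named in the abstract all follow from the four confusion-matrix counts of a binary classifier. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the counts in the usage note are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute seven standard binary-classification measures.

    tp, fp, tn, fn are confusion-matrix counts, with "positive"
    meaning a variant predicted to be pathogenic (assumed labeling).
    """
    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # also called sensitivity
    specificity = ratio(tn, tn + fp)
    npv = ratio(tn, tn + fn)             # negative predictive value
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * tp, 2 * tp + fp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "npv": npv,
        "mcc": mcc,
    }
```

For example, `binary_metrics(tp=40, fp=10, tn=45, fn=5)` uses made-up counts to illustrate the call. Reporting MCC alongside recall and specificity is useful when the pathogenic and common classes differ in size, because MCC uses all four counts.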
Keywords: LightGBM; aberrant splicing; intronic mutations; single nucleotide variations; splice acceptor site.
Copyright © 2021 Takeda, Fukami, Tamura, Shibata and Ohno.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Similar articles
- FexSplice: A LightGBM-Based Model for Predicting the Splicing Effect of a Single Nucleotide Variant Affecting the First Nucleotide G of an Exon. Genes (Basel). 2023 Sep 6;14(9):1765. doi: 10.3390/genes14091765. PMID: 37761905. Free PMC article.
- IntSplice: prediction of the splicing consequences of intronic single-nucleotide variations in the human genome. J Hum Genet. 2016 Jul;61(7):633-40. doi: 10.1038/jhg.2016.23. Epub 2016 Mar 24. PMID: 27009626.
- Rules and tools to predict the splicing effects of exonic and intronic mutations. Wiley Interdiscip Rev RNA. 2018 Jan;9(1). doi: 10.1002/wrna.1451. Epub 2017 Sep 26. PMID: 28949076. Review.
- InMeRF: prediction of pathogenicity of missense variants by individual modeling for each amino acid substitution. NAR Genom Bioinform. 2020 May 26;2(2):lqaa038. doi: 10.1093/nargab/lqaa038. eCollection 2020 Jun. PMID: 33543123. Free PMC article.
- Understanding human DNA variants affecting pre-mRNA splicing in the NGS era. Adv Genet. 2019;103:39-90. doi: 10.1016/bs.adgen.2018.09.002. Epub 2019 Jan 17. PMID: 30904096. Review.
Cited by
- New genetic diagnoses for inherited retinal dystrophies by integrating splicing tools into NGS pipelines. NPJ Genom Med. 2025 Jul 2;10(1):52. doi: 10.1038/s41525-025-00500-9. PMID: 40603303. Free PMC article.
- Computational prediction of human deep intronic variation. Gigascience. 2022 Dec 28;12:giad085. doi: 10.1093/gigascience/giad085. Epub 2023 Oct 25. PMID: 37878682. Free PMC article. Review.
- Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors. Hum Genomics. 2024 Aug 28;18(1):90. doi: 10.1186/s40246-024-00663-z. PMID: 39198917. Free PMC article.
- Bioinformatics characterization of variants of uncertain significance in pediatric sensorineural hearing loss. Front Pediatr. 2024 Feb 21;12:1299341. doi: 10.3389/fped.2024.1299341. eCollection 2024. PMID: 38450295. Free PMC article.
- FexSplice: A LightGBM-Based Model for Predicting the Splicing Effect of a Single Nucleotide Variant Affecting the First Nucleotide G of an Exon. Genes (Basel). 2023 Sep 6;14(9):1765. doi: 10.3390/genes14091765. PMID: 37761905. Free PMC article.