Front Genet. 2021 Jul 19;12:701076.
doi: 10.3389/fgene.2021.701076. eCollection 2021.

IntSplice2: Prediction of the Splicing Effects of Intronic Single-Nucleotide Variants Using LightGBM Modeling

Jun-Ichi Takeda et al. Front Genet.

Abstract

Predicting the effect of an intronic single-nucleotide variant (SNV) on aberrant pre-mRNA splicing is challenging unless the SNV affects the canonical GU/AG splice sites (ss). To predict the pathogenicity of SNVs at intronic positions -50 (Int-50) to -3 (Int-3) close to the 3' ss, we developed light gradient boosting machine (LightGBM)-based IntSplice2 models using pathogenic SNVs in the Human Gene Mutation Database (HGMD) and ClinVar, and common SNVs in dbSNP with 0.01 ≤ minor allele frequency (MAF) < 0.50. The LightGBM models were generated using features representing splicing cis-elements. The average recall/sensitivity and specificity of IntSplice2 by fivefold cross-validation (CV) of the training dataset were 0.764 and 0.884, respectively. The recall/sensitivity of IntSplice2 was lower than the average recall/sensitivity of 0.800 of IntSplice, which we previously built with support vector machine (SVM) modeling for the same intronic positions. In contrast, the specificity of IntSplice2 was higher than the average specificity of 0.849 of IntSplice. To benchmark (BM) IntSplice2 against IntSplice, we made a test dataset that was not used to train IntSplice. After excluding the test dataset from the training dataset, we generated IntSplice2-BM and compared it with IntSplice using the test dataset. IntSplice2-BM was superior to IntSplice in all seven statistical measures: accuracy, precision, recall/sensitivity, specificity, F1 score, negative predictive value (NPV), and Matthews correlation coefficient (MCC). The IntSplice2 web service is available at https://www.med.nagoya-u.ac.jp/neurogenetics/IntSplice2.
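The seven measures named in the abstract all follow from the four cells of a binary confusion matrix. The sketch below is a minimal illustration using the standard textbook definitions, not the authors' evaluation code. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
from math import sqrt


def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute the seven measures listed in the abstract.

    tp, fp, tn, fn are confusion-matrix counts, with "positive"
    meaning a variant predicted (or known) to be pathogenic.
    """
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)          # recall/sensitivity
    specificity = _safe_div(tn, tn + fp)
    npv = _safe_div(tn, tn + fn)             # negative predictive value
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    f1 = _safe_div(2 * tp, 2 * tp + fp + fn)
    mcc = _safe_div(
        tp * tn - fp * fn,
        sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "npv": npv,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not data from the paper).
example = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Reporting MCC alongside recall and specificity is useful when the pathogenic and common classes differ in size, because MCC uses all four cells of the confusion matrix.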

Keywords: LightGBM; aberrant splicing; intronic mutations; single nucleotide variations; splice acceptor site.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1. Evaluation of IntSplice2 by fivefold CV. (A) ROC curves for the five iterations and their mean, with AUROCs. (B) PR curves for the five iterations and their mean, with AUPRs.
FIGURE 2. The 10 most important of the 110 features of IntSplice2.
FIGURE 3. A representative screenshot of the output of the IntSplice2 web service. As previously reported, g.73550880G > A on chromosome 10 (GRCh37/hg19), identified in a patient with Usher syndrome, is at the ninth nucleotide from the 3’ end of intron 45 of CDH23. When a user chooses “GRCh37/hg19” and enters the chromosome number “10” and the genomic coordinate “73550880,” the IntSplice2 web service returns the result in the same browser window.
