PLoS One. 2019 Mar 21;14(3):e0205050.
doi: 10.1371/journal.pone.0205050. eCollection 2019.

Exon prediction based on multiscale products of a genomic-inspired multiscale bilateral filtering

Xiaolei Zhang et al. PLoS One.

Abstract

Multiscale signal processing techniques such as wavelet filtering have proved particularly successful in predicting exon sequences. The traditional wavelet predictor is a domain filter that enforces exon features by weighting nucleotide values with coefficients. Such a filter is linear and is not suitable for preserving short coding exons and exon-intron boundaries. This paper describes a prediction framework that processes DNA sequences non-linearly while achieving high prediction rates. There are two key contributions. The first is the introduction of a genomic-inspired multiscale bilateral filtering (MSBF), which exploits both weighting coefficients in the spatial domain and nucleotide similarity in the range. Like the wavelet transform, the MSBF is defined as a weighted sum of nucleotides. The difference is that the MSBF takes into account the variation of nucleotides at a specific codon position. The second contribution is the exploitation of inter-scale correlation in the MSBF domain to capture how the difference between the exon signal and the background noise depends on scale. This favourable property is used to sharpen the important structures while weakening noise. Three benchmark data sets were used to evaluate the considered methods. Compared with four existing techniques, the proposed method improves prediction by at least 4.1%, 50.5%, 25.6%, 2.5%, 10.8%, 15.5%, 11.1%, 12.3%, 9.2% and 2.4% for exons of length 1-24, 25-49, 50-74, 75-99, 100-124, 125-149, 150-174, 175-199, 200-299 and 300+, respectively. Owing to its nonlinear nature, the MSBF is good at energy compaction, which makes it capable of locating the sharp variations around short exons. Direct multiplication of coefficients at several adjacent scales clearly enhanced exon features while suppressing noise.
We show that the non-linear nature and correlation-based property of the proposed predictor give it an advantage over traditional filtering, which leads to better exon prediction performance. The predictor also has other possible applications: its good localization and preservation of sharp variations should make it suitable, for example, for fault diagnosis of aero-engines.
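
To make the two ideas in the abstract concrete, the following is a minimal sketch of a generic one-dimensional bilateral filter and a product of its outputs across several scales. It is not the authors' MSBF: the Gaussian spatial and range kernels, the function names `bilateral_filter_1d` and `multiscale_product`, the parameters `sigma_s` and `sigma_r`, the window rule `3 * s`, and the assumption that the DNA sequence has already been mapped to a real-valued signal are all illustrative choices, and the codon-position handling described in the abstract is omitted.

```python
import numpy as np


def bilateral_filter_1d(signal, half_width, sigma_s, sigma_r):
    """Generic 1-D bilateral filter (illustrative; not the paper's MSBF).

    Each output sample is a weighted mean of its neighbours. The weight is
    the product of a spatial (domain) term that decays with distance and a
    range term that decays with the difference in value, so samples on the
    far side of a sharp edge contribute very little.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    offsets = np.arange(-half_width, half_width + 1)
    # Assumed Gaussian domain kernel; the paper's choice may differ.
    spatial = np.exp(-0.5 * (offsets / sigma_s) ** 2)
    out = np.empty(n)
    for i in range(n):
        # Replicate the end samples at the borders.
        idx = np.clip(i + offsets, 0, n - 1)
        window = signal[idx]
        # Assumed Gaussian range kernel on the value difference.
        range_w = np.exp(-0.5 * ((window - signal[i]) / sigma_r) ** 2)
        weights = spatial * range_w
        out[i] = np.sum(weights * window) / np.sum(weights)
    return out


def multiscale_product(signal, scales, sigma_r):
    """Pointwise product of bilateral-filter outputs at several scales.

    Structures that persist across scales reinforce each other in the
    product, while fluctuations that are not consistent across scales
    are attenuated.
    """
    signal = np.asarray(signal, dtype=float)
    product = np.ones_like(signal)
    for s in scales:
        # Hypothetical rule tying the window size to the scale.
        product *= bilateral_filter_1d(
            signal, half_width=3 * s, sigma_s=float(s), sigma_r=sigma_r
        )
    return product
```

Under these assumptions, calling `multiscale_product(x, scales=[1, 2, 3], sigma_r=0.1)` on a numeric track `x` returns an array of the same length. A small `sigma_r` keeps abrupt transitions nearly intact, which is the behaviour the abstract relies on for short exons and exon-intron boundaries.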

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Examples of B-spline windows, and domain filter (order 6) with two different scales in the time and frequency domains.
(A) Examples of B-spline windows; (B) Magnitude response of domain filter in the time domain; (C) Frequency response of domain filter.
Fig 2. Prediction plots for sequence HUMDZA2G locus (AZGP1 gene) at different scales.
The abscissa axes of all the plots represent the relative base positions, the actual locations of the exons are marked with rectangles in red dashed lines. Part (A) shows the MSBF result; and (B) shows the result of inter-scale correlation.
Fig 3. Prediction results for the sequence HUMDZA2G locus (AZGP1 gene) using the considered methods.
The abscissa axes of all the plots represent the relative base positions, and the actual locations of exons are marked with rectangles in red dashed lines.
Fig 4. Histogram distributions at different scales for MSBF and inter-scale correlation applied to HMR195.
For all the plots, blue lines represent exons, red lines indicate introns, the abscissa axes represent the magnitude values, and the ordinate axes represent the number of coefficients. Part (A) shows the MSBF result; and (B) shows the result of inter-scale correlation.
Fig 5. Mean values of histogram distributions at different scales for MSBF and inter-scale correlation applied to HMR195.
Fig 6. Plots of approximate correlation (AC) for the considered data sets, with various methods applied to exons grouped by length range.
For all the plots, the ordinate axes denote the ranges of exon lengths.
Fig 7. ROC plots of tested methods using the BG570, HMR195 and ENm001-004 data sets.
