Discriminative motif discovery via simulated evolution and random under-sampling
- PMID: 24551063
- PMCID: PMC3923751
- DOI: 10.1371/journal.pone.0087670
Abstract
Conserved motifs in biological sequences are closely related to sequence structure and function. Discriminative motif discovery methods have recently attracted growing attention, but little of it has been devoted to the data imbalance problem, which is one of the main factors degrading the performance of discriminative models. In this article, a simulated evolution method is applied at the data preprocessing stage to address the multi-class imbalance problem, and a random under-sampling method is introduced at the Hidden Markov Model (HMM) training stage to address the imbalance between the positive and negative datasets. In the task of discovering targeting motifs for nine subcellular compartments, the motifs found by our method are more conserved than those found by methods that do not consider the data imbalance problem, and they recover the most known targeting motifs from Minimotif Miner and InterPro. We also use the discovered motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.
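The random under-sampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the sampling ratio or how often sampling is repeated during HMM training, so this example assumes a simple 1:1 balance between the positive and negative sets; the function name `random_undersample` and the toy sequences are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Optional, Sequence, Tuple


def random_undersample(
    positives: Sequence[str],
    negatives: Sequence[str],
    seed: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    """Balance two sequence sets by randomly discarding majority examples.

    Assumption: a 1:1 target ratio. The minority set is returned unchanged;
    the majority set is sampled without replacement down to the same size.
    """
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    if len(pos) > len(neg):
        pos = rng.sample(pos, len(neg))
    elif len(neg) > len(pos):
        neg = rng.sample(neg, len(pos))
    return pos, neg


if __name__ == "__main__":
    # Toy data: 3 positive sequences versus 10 negative sequences.
    positives = ["MKTAYIAK", "MLSRAVCG", "MAALRQPQ"]
    negatives = ["SEQ%02d" % i for i in range(10)]
    pos, neg = random_undersample(positives, negatives, seed=0)
    print(len(pos), len(neg))
```

Drawing a fresh sample with a different seed for each training round would expose the model to more of the majority class over time; whether the authors do this is not stated in the abstract.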
Similar articles
- Discriminative motif finding for predicting protein subcellular localization. IEEE/ACM Trans Comput Biol Bioinform. 2011 Mar-Apr;8(2):441-51. doi: 10.1109/TCBB.2009.82. PMID: 21233524. Free PMC article.
- A Monte Carlo EM algorithm for de novo motif discovery in biomolecular sequences. IEEE/ACM Trans Comput Biol Bioinform. 2009 Jul-Sep;6(3):370-86. doi: 10.1109/TCBB.2008.103. PMID: 19644166.
- Discriminative motif discovery in DNA and protein sequences using the DEME algorithm. BMC Bioinformatics. 2007 Oct 15;8:385. doi: 10.1186/1471-2105-8-385. PMID: 17937785. Free PMC article.
- Hidden Markov Models for prediction of protein features. Methods Mol Biol. 2008;413:173-98. doi: 10.1007/978-1-59745-574-9_7. PMID: 18075166. Review.
- An evolutionary perspective on eukaryotic membrane trafficking. Adv Exp Med Biol. 2007;607:73-83. doi: 10.1007/978-0-387-74021-8_6. PMID: 17977460. Review.
Cited by
- The minimotif synthesis hypothesis for the origin of life. J Transl Sci. 2016;2(5):289-296. doi: 10.15761/JTS.1000154. Epub 2016 Jul 19. PMID: 28083146. Free PMC article.
- Single base-pair resolution analysis of DNA binding motif with MoMotif reveals an oncogenic function of CTCF zinc-finger 1 mutation. Nucleic Acids Res. 2022 Aug 26;50(15):8441-8458. doi: 10.1093/nar/gkac658. PMID: 35947648. Free PMC article.