Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites
- PMID: 36778978
- PMCID: PMC9878833
- DOI: 10.2174/1389202923666220214122506
Abstract
Background: DNA replication plays an indispensable role in the transmission of genetic information. It is considered the basis of biological inheritance and the most fundamental process in all living organisms. Because DNA replication initiates at a specific location, namely the origin of replication, more accurate prediction of origins of replication sites (ORIs) is essential for gaining insight into the relationship between replication and gene expression.
Objective: In this study, we developed an efficient predictor, called iORI-LAVT, for ORI identification.
Methods: Feature information is extracted from three aspects: mono-nucleotide encoding, k-mer, and ring-function-hydrogen-chemical properties. The least absolute shrinkage and selection operator (LASSO) is then applied as a feature selection method to select the optimal features. After comparing the results of different soft voting classifier combinations, the soft voting classifier based on GaussianNB and Logistic Regression is employed as the final classifier.
Results: In 10-fold cross-validation, the prediction accuracies on the two benchmark datasets are 90.39% and 95.96%, respectively. On the independent dataset, the method achieves an accuracy of 91.3%.
Conclusion: iORI-LAVT outperforms previous predictors and is a promising alternative for further research on identifying ORIs.
Keywords: DNA replication; LASSO; Origin of replication sites; dimensional feature; multi-feature; voting classifier.
© 2022 Bentham Science Publishers.
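The Methods paragraph describes a pipeline of three feature encodings, LASSO feature selection, and a soft voting classifier built from GaussianNB and Logistic Regression, evaluated by 10-fold cross-validation. The following is a minimal sketch of that kind of pipeline using scikit-learn, not the authors' implementation. Several details are assumptions that the abstract does not specify: the ring/function/hydrogen codes in `NCP`, the choice of k = 2, the LASSO penalty `alpha`, the standardization step, the use of `SelectFromModel` to keep features with non-zero LASSO coefficients, and the synthetic sequences produced by the hypothetical helper `toy_dataset`.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

BASES = "ACGT"
# Assumed (ring, functional group, hydrogen bond) codes; not taken from the paper.
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def one_hot(seq):
    """Mono-nucleotide encoding: four indicator values per position."""
    return [float(b == base) for b in seq for base in BASES]


def ncp(seq):
    """Chemical-property encoding: three values per position."""
    return [float(v) for b in seq for v in NCP[b]]


def kmer_freq(seq, k=2):
    """Relative frequency of every possible k-mer in the sequence."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    total = len(seq) - k + 1
    counts = {km: 0 for km in kmers}
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return [counts[km] / total for km in kmers]


def encode(seq, k=2):
    """Concatenate the three feature groups into one vector."""
    return one_hot(seq) + kmer_freq(seq, k) + ncp(seq)


def build_model(alpha=0.01):
    """Scale, keep features with non-zero LASSO weights, then soft-vote."""
    selector = SelectFromModel(Lasso(alpha=alpha, max_iter=10000))
    voter = VotingClassifier(
        estimators=[
            ("gnb", GaussianNB()),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",  # average the predicted class probabilities
    )
    return make_pipeline(StandardScaler(), selector, voter)


def toy_dataset(n_per_class=60, length=30, seed=0):
    """Synthetic stand-in data: AT-rich positives, uniform negatives."""
    rng = np.random.default_rng(seed)
    seqs, labels = [], []
    for label, probs in ((1, [0.4, 0.1, 0.1, 0.4]), (0, [0.25] * 4)):
        for _ in range(n_per_class):
            seqs.append("".join(rng.choice(list(BASES), size=length, p=probs)))
            labels.append(label)
    return np.array([encode(s) for s in seqs]), np.array(labels)


X, y = toy_dataset()
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(build_model(), X, y, cv=cv, scoring="accuracy")
print(f"mean 10-fold accuracy on toy data: {scores.mean():.3f}")
```

Placing the selector inside the pipeline means LASSO is refit on the training portion of each fold, so the selected features never see the held-out samples. The accuracy printed here reflects only the synthetic data and says nothing about the benchmark results reported in the abstract.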