2022 Jun 10;23(2):83-93.
doi: 10.2174/1389202923666220214122506.

Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites

Yingying Yao et al. Curr Genomics.

Abstract

Background: DNA replication plays an indispensable role in the transmission of genetic information. It is considered the basis of biological inheritance and the most fundamental process in all biological life. Because DNA replication initiates at a specific location, namely the origin of replication, more accurate prediction of origin of replication sites (ORIs) is essential for gaining insight into their relationship with gene expression.

Objective: In this study, we have developed an efficient predictor called iORI-LAVT for identifying ORIs.

Methods: This work extracts feature information from three aspects: mono-nucleotide encoding, k-mer, and ring-function-hydrogen-chemical properties. Least absolute shrinkage and selection operator (LASSO) is then applied as a feature selection method to select the optimal features. After comparing the results of different soft voting classifier combinations, the soft voting classifier based on GaussianNB and Logistic Regression is employed as the final classifier.

Results: In 10-fold cross-validation, the prediction accuracies on the two benchmark datasets are 90.39% and 95.96%, respectively. On the independent dataset, our method achieves an accuracy of 91.3%.

Conclusion: iORI-LAVT outperforms previous predictors. It is believed that the iORI-LAVT predictor is a promising alternative for further research on identifying ORIs.

Keywords: DNA replication; LASSO; Origin of replication sites; dimensional feature; multi-feature; voting classifier.

Figures

Fig. (1). The flow-chart diagram of iORI-LAVT.
Fig. (2). The performance comparison of different feature representation methods on S1.
Fig. (3). The performance comparison of different feature representation methods on S2.
Fig. (4). Comparison of the accuracy of different feature selection methods.
Fig. (5). Comparison of the accuracy of single classifiers with the voting classifier.
