Network-based features enable prediction of essential genes across diverse organisms
- PMID: 30543651
- PMCID: PMC6292609
- DOI: 10.1371/journal.pone.0208722
Abstract
Machine learning approaches to predict essential genes have gained considerable traction in recent years. These approaches predominantly use sequence and network-based features. However, the range of network-based features used by existing approaches is very narrow. Further, many of these studies focus on predicting essential genes within a single organism, and their models cannot be readily used to predict essential genes across organisms. There is therefore a need for a method that can predict essential genes across organisms by leveraging network-based features. In this study, we extract several sets of network-based features from protein-protein association networks available from the STRING database. Our network features include some common measures of centrality, as well as some novel recursive measures recently proposed in the social network literature. We extract hundreds of network-based features from the networks of 27 diverse organisms to predict the essentiality of more than 87,000 genes. Our results show that network-based features are statistically significantly better at classifying essential genes across diverse bacterial species than the current state-of-the-art methods, which use mostly sequence features and a few 'conventional' network-based features. Our diverse set of network properties gave an AUROC of 0.847 and a precision of 0.320 across 27 organisms. When we augmented the complete set of network features with sequence-derived features, we achieved an improved AUROC of 0.857 and a precision of 0.335. We also constructed a reduced set of 100 sequence and network features, which gave comparable performance. Further, using leave-one-species-out validation, we show that our features are useful for predicting essential genes in new organisms.
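To make the evaluation described above concrete, the following is a minimal sketch of leave-one-species-out splitting and a rank-based AUROC, written in plain Python. It is not the authors' code (their implementation is in the repository linked in the abstract); the function names `auroc` and `leave_one_species_out`, the species labels and the scores are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def auroc(labels, scores):
    """Rank-based AUROC: the fraction of (essential, non-essential) pairs
    in which the essential gene gets the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one gene of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def leave_one_species_out(species):
    """Yield (held_out, train_idx, test_idx), holding out one species at a time.

    `species` lists the organism of each gene; every gene of the held-out
    organism goes to the test fold, all other genes to the training fold."""
    groups = defaultdict(list)
    for i, sp in enumerate(species):
        groups[sp].append(i)
    for held_out in sorted(groups):
        test_idx = groups[held_out]
        train_idx = [i for sp, idxs in groups.items() if sp != held_out for i in idxs]
        yield held_out, train_idx, test_idx


if __name__ == "__main__":
    # Toy data: six genes from three hypothetical organisms.
    species = ["orgA", "orgA", "orgB", "orgB", "orgC", "orgC"]
    for held_out, train_idx, test_idx in leave_one_species_out(species):
        print(held_out, "train:", train_idx, "test:", test_idx)
```

In this scheme a classifier never sees any gene of the held-out organism during training, which is what lets the resulting score speak to performance on a new organism rather than on unseen genes of a familiar one.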
Our network features capture the local, global and neighbourhood properties of the network and are hence effective for predicting essential genes across diverse organisms, even in the absence of other complex biological knowledge. Our approach can be readily applied to predict essentiality for organisms in interactome databases such as STRING, where both network and sequence data are readily available. All code is available at https://github.com/RamanLab/nbfpeg.
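As a rough illustration of what local, global and neighbourhood features can look like, the sketch below computes one example of each on a toy undirected network: degree centrality (local), closeness centrality (global) and mean neighbour degree (a one-step neighbourhood aggregate). These three measures are assumptions chosen for illustration and are not claimed to match the authors' feature set; the protein names and helper functions are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Local feature: number of neighbours divided by (n - 1)."""
    n = len(adj)
    return {u: (len(nb) / (n - 1) if n > 1 else 0.0) for u, nb in adj.items()}


def closeness_centrality(adj):
    """Global feature: reachable nodes divided by the sum of shortest-path
    distances to them (computed by breadth-first search from each node)."""
    result = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[src] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def mean_neighbour_degree(adj):
    """Neighbourhood feature: average degree of a node's neighbours."""
    return {
        u: (sum(len(adj[v]) for v in nb) / len(nb) if nb else 0.0)
        for u, nb in adj.items()
    }


if __name__ == "__main__":
    # Toy star network: a hypothetical hub protein with three partners.
    adj = build_graph([("hub", "p1"), ("hub", "p2"), ("hub", "p3")])
    for name, feature in [
        ("degree", degree_centrality(adj)),
        ("closeness", closeness_centrality(adj)),
        ("mean neighbour degree", mean_neighbour_degree(adj)),
    ]:
        print(name, {k: round(v, 3) for k, v in sorted(feature.items())})
```

Stacking such per-node values into a vector per gene gives the kind of feature table a classifier can be trained on; applying an aggregate like the neighbour mean repeatedly to existing features is one way recursive measures are built.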
Conflict of interest statement
The authors have declared that no competing interests exist.
Similar articles
- Machine learning approach to gene essentiality prediction: a review. Brief Bioinform. 2021 Sep 2;22(5):bbab128. doi: 10.1093/bib/bbab128. PMID: 33842944 Review.
- Prediction of essential genes in prokaryote based on artificial neural network. Genes Genomics. 2020 Jan;42(1):97-106. doi: 10.1007/s13258-019-00884-w. Epub 2019 Nov 17. PMID: 31736009
- Mycobacterium tuberculosis and Clostridium difficille interactomes: demonstration of rapid development of computational system for bacterial interactome prediction. Microb Inform Exp. 2012 Mar 21;2:4. doi: 10.1186/2042-5783-2-4. PMID: 22587966 Free PMC article.
- DeepHE: Accurately predicting human essential genes based on deep learning. PLoS Comput Biol. 2020 Sep 16;16(9):e1008229. doi: 10.1371/journal.pcbi.1008229. eCollection 2020 Sep. PMID: 32936825 Free PMC article.
- Harnessing model organism genomics to underpin the machine learning-based prediction of essential genes in eukaryotes - Biotechnological implications. Biotechnol Adv. 2022 Jan-Feb;54:107822. doi: 10.1016/j.biotechadv.2021.107822. Epub 2021 Aug 27. PMID: 34461202 Review.
Cited by
- Comprehensive host-pathogen protein-protein interaction network analysis. BMC Bioinformatics. 2020 Sep 10;21(1):400. doi: 10.1186/s12859-020-03706-z. PMID: 32912135 Free PMC article.
- Essential gene prediction in Drosophila melanogaster using machine learning approaches based on sequence and functional features. Comput Struct Biotechnol J. 2020 Mar 10;18:612-621. doi: 10.1016/j.csbj.2020.02.022. eCollection 2020. PMID: 32257045 Free PMC article.
- GENPPI: standalone software for creating protein interaction networks from genomes. BMC Bioinformatics. 2021 Dec 16;22(1):596. doi: 10.1186/s12859-021-04501-0. PMID: 34915867 Free PMC article.
- Subtractive proteomics-based vaccine targets annotation and reverse vaccinology approaches to identify multiepitope vaccine against Plesiomonas shigelloides. Heliyon. 2024 May 22;10(11):e31304. doi: 10.1016/j.heliyon.2024.e31304. eCollection 2024 Jun 15. PMID: 38845922 Free PMC article.
- Evaluation of machine learning classifiers for predicting essential genes in Mycobacterium tuberculosis strains. Bioinformation. 2022 Dec 31;18(12):1126-1130. doi: 10.6026/973206300181126. eCollection 2022. PMID: 37701504 Free PMC article.