BMC Genomics. 2024 Jan 10;25(1):47.
doi: 10.1186/s12864-024-09958-w.

Essential genes identification model based on sequence feature map and graph convolutional neural network

Wenxing Hu et al. BMC Genomics. 2024.

Abstract

Background: Essential genes encode functions that play a vital role in the life activities of organisms, encompassing growth, development, immune system functioning, and cell structure maintenance. Conventional experimental techniques for identifying essential genes are resource-intensive and time-consuming, and the accuracy of current machine learning models needs further enhancement. Therefore, it is crucial to develop a robust computational model to accurately predict essential genes.

Results: In this study, we introduce GCNN-SFM, a computational model based on graph convolutional neural networks (GCNN) for identifying essential genes in organisms. GCNN-SFM integrates a graph convolutional layer, a convolutional layer, and a fully connected layer to extract features from gene sequences. First, the gene sequence is transformed into a feature map using coding techniques. A multi-layer GCN then performs graph convolution on this map, capturing both local and global features of the gene sequence. The extracted features pass through convolutional and fully connected layers, which generate the essential-gene prediction. Gradient descent iteratively updates the model parameters to minimize the cross-entropy loss, thereby improving prediction accuracy. Hyperparameters are also tuned to find the combination that yields the best prediction performance during training.
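The pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the coding scheme, so this sketch assumes a k-mer graph (nodes are k-mers, edges link k-mers that are adjacent in the sequence, node features are k-mer frequencies). It also swaps the convolutional and fully connected layers for a simpler mean-pooling and linear readout. All function names (`kmer_graph`, `normalize`, `gcn_layer`, `predict`, `cross_entropy`) and the weight values are hypothetical and untrained; this is not the authors' implementation.

```python
import math
from itertools import product

def kmer_graph(seq, k=2):
    """Assumed encoding: one node per possible k-mer, edges between
    k-mers that occur next to each other in seq (A/C/G/T only)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    n = len(kmers)
    adj = [[0.0] * n for _ in range(n)]
    counts = [0.0] * n
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    for w in windows:
        counts[index[w]] += 1.0
    for a, b in zip(windows, windows[1:]):
        i, j = index[a], index[b]
        adj[i][j] = adj[j][i] = 1.0
    total = sum(counts) or 1.0
    feats = [[c / total] for c in counts]  # one feature per node: k-mer frequency
    return adj, feats

def normalize(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a standard GCN layer."""
    n = len(adj)
    a_hat = [[1.0 if i == j else adj[i][j] for j in range(n)] for i in range(n)]
    d = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[d[i] * a_hat[i][j] * d[j] for j in range(n)] for i in range(n)]

def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]

def predict(seq, w1, w2, w_out, k=2):
    """Two GCN layers, mean pooling over nodes, then a sigmoid readout."""
    adj, h = kmer_graph(seq, k)
    a_norm = normalize(adj)
    h = gcn_layer(a_norm, h, w1)
    h = gcn_layer(a_norm, h, w2)
    pooled = [sum(col) / len(h) for col in zip(*h)]
    logit = sum(p * w for p, w in zip(pooled, w_out))
    return 1.0 / (1.0 + math.exp(-logit))

def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one example with label y in {0, 1}."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

# Arbitrary, untrained weights chosen only to make the shapes line up.
w1 = [[0.5, -0.3, 0.8]]                      # 1 input feature -> 3 hidden
w2 = [[0.2, 0.1], [0.4, -0.5], [0.3, 0.7]]   # 3 hidden -> 2 hidden
w_out = [1.0, -1.0]                          # 2 hidden -> 1 logit

p = predict("ATGCGTACGTTAGC", w1, w2, w_out)  # toy sequence, not real data
loss = cross_entropy(p, 1)
```

In a trained model, gradient descent would adjust `w1`, `w2`, and `w_out` to reduce `loss` over a labeled set of essential and non-essential genes; in practice a deep learning framework would compute those gradients automatically.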

Conclusions: Experimental evaluation demonstrates that GCNN-SFM surpasses various advanced essential gene prediction models and achieves an average accuracy of 94.53%. This study presents a novel and effective approach for identifying essential genes, which has significant implications for biology and genomics research.

Keywords: Bioinformatics; Essential genes; Gene sequences; Graphical convolutional neural networks; Machine learning.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Graph structure of gene sequences
Fig. 2. Model GCNN-SFM predicts the structural flow of essential genes
Fig. 3. Comparison of performance results of independent datasets testing graph coding methods with different parameters
Fig. 4. The impact of graph convolutional layer depth on model performance metrics
Fig. 5. Performance results of different independent datasets testing the essential gene prediction model
Fig. 6. Cross-training of datasets from different species
Fig. 7. Performance comparison of model validation across species
Fig. 8. Performance comparison of GCNN-SFM with other existing models
