Review
Front Physiol. 2016 Mar 8:7:75.
doi: 10.3389/fphys.2016.00075. eCollection 2016.

Predicting Essential Genes and Proteins Based on Machine Learning and Network Topological Features: A Comprehensive Review

Xue Zhang et al. Front Physiol.

Erratum in

Abstract

Essential proteins/genes are indispensable to the survival or reproduction of an organism, and the deletion of such essential proteins results in lethality or infertility. The identification of essential genes is important not only for understanding the minimal requirements for the survival of an organism, but also for finding human disease genes and new drug targets. Experimental methods for identifying essential genes are costly, time-consuming, and laborious. With the accumulation of sequenced genome data and high-throughput experimental data, many computational methods for identifying essential proteins have been proposed, and these are useful complements to experimental methods. In this review, we survey the state-of-the-art methods for identifying essential genes and proteins based on machine learning and network topological features, point out the progress and limitations of current methods, and discuss the challenges and directions for further research.

Keywords: essential genes/proteins; machine learning; network topological features; prediction models; systems biology.

Figures

Figure 1
A toy network illustrating the calculation of network topological features, using node C (yellow node) as the example. The degree centrality (DC) of node C is 4 because it has 4 edges, connecting it with nodes A, B, D, and E. The betweenness centrality (BC) of node C is the fraction of shortest paths between pairs of other nodes on which node C acts as a bridge. There are six shortest paths between the pairs of other nodes (ACD, ACE, AB, BCD, BCE, DE), and node C acts as a bridge on 4 of them, so the BC of node C is 4/6 ≈ 0.67. The closeness centrality (CC) of node C is the reciprocal of the average distance from node C to the other nodes; every other node is at distance 1, so the CC of node C is 1. The clustering coefficient (CCo) of node C is the ratio of the actual number of connections among its neighbors (A, B, D, and E), in this case 2, to the number of all possible connections among them, in this case 6. Therefore, the CCo of node C is 2/6 ≈ 0.33.
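The four values in the Figure 1 caption can be reproduced with a short script. What follows is a minimal sketch, not code from the review: the edge list is an assumption reconstructed from the caption (C linked to A, B, D, and E, plus the links A-B and D-E implied by the listed shortest paths), and all function names are illustrative. Betweenness is computed in the normalized form the caption uses, as the fraction of shortest paths between pairs of other nodes that pass through the node.

```python
from collections import deque
from itertools import combinations

# Edge list assumed from the Figure 1 caption (not given explicitly in the source).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("C", "E"), ("D", "E")]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs(adj, source):
    """Return (distances, shortest-path counts) from source to reachable nodes."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[u] + 1
                paths[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u lies on a shortest path to w
                paths[w] += paths[u]
    return dist, paths


def degree_centrality(adj, v):
    """Number of edges incident to v."""
    return len(adj[v])


def betweenness_centrality(adj, v):
    """Fraction of shortest paths between pairs of other nodes that pass through v."""
    dist_v, paths_v = bfs(adj, v)
    others = [n for n in adj if n != v]
    pairs = list(combinations(others, 2))
    total = 0.0
    for s, t in pairs:
        dist_s, paths_s = bfs(adj, s)
        if t not in dist_s or v not in dist_s:
            continue  # s and t (or v) are not connected
        if dist_s[v] + dist_v[t] == dist_s[t]:
            total += paths_s[v] * paths_v[t] / paths_s[t]
    return total / len(pairs)


def closeness_centrality(adj, v):
    """Reciprocal of the average distance from v to the other reachable nodes."""
    dist, _ = bfs(adj, v)
    return (len(dist) - 1) / sum(dist.values())


def clustering_coefficient(adj, v):
    """Actual links among v's neighbors divided by all possible such links."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0
    actual = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return actual / (k * (k - 1) / 2)


adj = build_adjacency(EDGES)
print("DC :", degree_centrality(adj, "C"))
print("BC :", round(betweenness_centrality(adj, "C"), 2))
print("CC :", closeness_centrality(adj, "C"))
print("CCo:", round(clustering_coefficient(adj, "C"), 2))
```

On real protein interaction networks a graph library would normally be used instead; the plain-Python version is kept here only so that each definition in the caption maps to one visible function.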
