Sci Rep. 2019 Sep 10;9(1):12949.
doi: 10.1038/s41598-019-49098-w.

ePath: an online database towards comprehensive essential gene annotation for prokaryotes

Xiangzhen Kong et al. Sci Rep. 2019.

Abstract

Experimental techniques for identifying essential genes (EGs) in prokaryotes are usually expensive and time-consuming, and are sometimes impractical. Emerging in silico methods offer alternatives for EG prediction, but they often have limitations, including heavy computational requirements and a lack of biological explanation. Here we propose a new computational algorithm for EG prediction in prokaryotes, together with an online database (ePath) for quick access to the EG prediction results of over 4,000 prokaryotes (https://www.pubapps.vcu.edu/epath/). In ePath, gene essentiality is linked to biological functions annotated by KEGG Ortholog (KO). Two new scoring systems, E_score and P_score, are proposed for each KO as the EG evaluation criteria. E_score represents the appearance and essentiality of a given KO in existing experimental results on gene essentiality, while P_score denotes gene essentiality based on the principle that a gene is essential if it plays a role in genetic information processing, cell envelope maintenance or energy production. The new EG prediction algorithm shows prediction accuracy ranging from 75% to 91% based on validation against five new experimental studies on EG identification. Our overall goal with ePath is to provide a comprehensive and reliable reference for gene essentiality annotation, facilitating the study of prokaryotes that lack experimentally derived gene essentiality information.
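
The abstract does not give the formulas behind the two scores, so the following is only a minimal sketch of the ideas it names, not the authors' method. It assumes an E_score-like value can be approximated as the fraction of experiments containing a KO in which that KO was essential, and a P_score-like value as a 0/1 flag for membership in the three functional roles listed above. The function names, the role labels in `CORE_ROLES`, and the KO identifiers (`KO_A`, `KO_B`, `KO_C`) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical labels standing in for the three roles named in the abstract.
CORE_ROLES = {"genetic_information_processing", "cell_envelope", "energy_production"}


def e_score_like(experiments):
    """Fraction of the experiments containing a KO in which it was essential.

    `experiments` is a list of dicts mapping KO id -> bool (essential or not).
    This ratio is an assumption; the paper's E_score may be defined differently.
    """
    seen = defaultdict(int)
    essential = defaultdict(int)
    for exp in experiments:
        for ko, is_essential in exp.items():
            seen[ko] += 1
            essential[ko] += int(is_essential)
    return {ko: essential[ko] / seen[ko] for ko in seen}


def p_score_like(ko_roles):
    """1.0 if a KO is annotated with any core role, else 0.0 (assumed rule)."""
    return {ko: float(bool(roles & CORE_ROLES)) for ko, roles in ko_roles.items()}


def accuracy(predicted, observed):
    """Share of genes whose predicted label matches the experimental label."""
    shared = predicted.keys() & observed.keys()
    return sum(predicted[g] == observed[g] for g in shared) / len(shared)


# Toy data, invented for illustration only.
experiments = [
    {"KO_A": True, "KO_B": False},
    {"KO_A": True, "KO_B": True, "KO_C": False},
]
e_scores = e_score_like(experiments)
p_scores = p_score_like(
    {"KO_A": {"energy_production"}, "KO_B": {"transport"}, "KO_C": set()}
)
```

The `accuracy` helper shows the kind of comparison against held-out experimental labels that a validation figure such as the reported 75% to 91% implies; how the two scores are combined into a final prediction is not stated in the abstract and is left out here.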

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Conceptual diagram of the ePath online database and search engine.
Figure 2
Metabolic pathway diagram of E. coli (eco01100 in KEGG pathway database) with the gene essentiality information. The edges in red represent the EGs identified by experiment. The edges in blue represent the missing EGs identified by the ‘remapping’ algorithm in this study. The edges in black represent the non-EGs. The original metabolic pathway map from KEGG is used with KEGG copyright permission number 190185.
Figure 3
Frequency score distribution. (A) E_score (only values higher than 0 are shown; N = 2546); (B) P_score (only values higher than 0 are shown; N = 21667); (C) correlation between P_score and E_score. The solid red line represents the best linear fit to the data with R² = 0.67 (p < 0.001).
Figure 4
Interface of the ePath website. The ePath searchable online database of essential genes for 4,000+ prokaryote genomes.
Figure 5
Linking missing essential genes in a pathway. An illustrative example of the ‘remapping’ algorithm applied to a KEGG pathway map with 10 hypothetical compounds. The left panel shows the map and matrix (S) before rescoring. In the old map (left panel), the blue edges are non-essential genes and the red edges are essential genes. The elements of the matrix (S) show the existence and essentiality of the reaction between the two corresponding compounds. The colored elements highlight how the depth-first search (DFS) algorithm finds the edges linked to the first edge. The yellow boxes are the linked edges and the red boxes are the discarded edges. The right panel shows the new map and the updated matrix (S’). Note that in the new map, edges (2–3) and (6–7) are considered missing EGs and are labeled red. S’ provides the final score for each edge, which serves as the basis for EG determination.
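
The caption does not spell out the rule the ‘remapping’ algorithm uses to decide which edges are linked, so the sketch below substitutes a simple assumed rule and should not be read as the paper's criterion: a run of non-essential edges is relabeled as essential when a depth-first walk shows it to be an unbranched chain joining two essential edges. The function name `remap_missing_egs` and the toy 10-compound graph are hypothetical; the graph is not the one drawn in Figure 5, although it is built so that the same two edges come out relabeled.

```python
from collections import defaultdict


def remap_missing_egs(edges, essential):
    """Return non-essential edges relabeled as essential under the assumed rule.

    edges: iterable of (u, v) compound pairs (undirected, no self-loops).
    essential: set of frozenset({u, v}) for edges already known to be essential.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    relabeled = set()

    def dfs(start, prev, node, path):
        # Stop at dead ends and branch points: the chain must be unbranched.
        if len(adj[node]) != 2:
            return
        (nxt,) = adj[node] - {prev}
        edge = frozenset((node, nxt))
        if edge in essential:
            # Chain is closed by a different essential edge: fill the gap.
            if edge != start:
                relabeled.update(path)
            return
        dfs(start, node, nxt, path + [edge])

    for start in essential:
        u, v = tuple(start)
        dfs(start, u, v, [])  # walk away from the edge in one direction
        dfs(start, v, u, [])  # and in the other
    return relabeled


# Toy map with 10 hypothetical compounds (invented for illustration).
edges = [(1, 2), (2, 3), (3, 4), (5, 6), (6, 7), (7, 8), (8, 9), (8, 10)]
essential = {frozenset(e) for e in [(1, 2), (3, 4), (5, 6), (7, 8)]}
missing = remap_missing_egs(edges, essential)
```

In this toy map the walk from edge (1–2) passes through (2–3) and reaches the essential edge (3–4), so (2–3) is relabeled; the same happens for (6–7). The walk stops at compound 8 because it branches there, so (8–9) and (8–10) stay non-essential.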
