BMC Genomics. 2018 Dec 19;19(1):948.
doi: 10.1186/s12864-018-5221-9.

Combining multiple functional annotation tools increases coverage of metabolic annotation

Marc Griesemer et al. BMC Genomics.

Abstract

Background: Genome-scale metabolic modeling is a cornerstone of systems biology analysis of microbial organisms and communities, yet these modeling efforts are invariably based on incomplete functional annotations. In a typical annotated genome, 30-50% of genes lack a functional annotation, severely limiting our knowledge of the "parts lists" that the organisms have at their disposal. These incomplete annotations may be sufficient to derive a model of a core set of well-studied metabolic pathways that support growth in pure culture. However, pathways important for growth on unusual metabolites exchanged in complex microbial communities are often less well understood, resulting in missing functional annotations in newly sequenced genomes.

Results: Here, we present results on a comprehensive reannotation of 27 bacterial reference genomes, focusing on enzymes with EC numbers annotated by KEGG, RAST, EFICAz, and the BRENDA enzyme database, and on membrane transport annotations by TransportDB, KEGG and RAST. Our analysis shows that annotation using multiple tools can result in a drastically larger metabolic network reconstruction, adding on average 40% more EC numbers, 3-8 times more substrate-specific transporters, and 37% more metabolic genes. These results are even more pronounced for bacterial species that are phylogenetically distant from well-studied model organisms such as E. coli.
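
The coverage gain from combining tools, as summarized in the Results, can be illustrated with plain set operations. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the EC sets are invented toy data, the function names `combined_gain` and `support_histogram` are hypothetical, and measuring the gain against a single baseline tool is an assumption (the abstract does not state which baseline the reported averages use).

```python
from collections import Counter

# Hypothetical toy annotations for one genome: tool name -> set of EC numbers.
# These values are invented for illustration and are NOT data from the paper.
annotations = {
    "KEGG":   {"1.1.1.1", "2.7.1.1", "4.2.1.11"},
    "RAST":   {"1.1.1.1", "2.7.1.1", "5.3.1.9"},
    "EFICAz": {"1.1.1.1", "2.7.1.2"},
    "BRENDA": {"1.1.1.1", "4.2.1.11", "3.1.3.11"},
}


def combined_gain(annotations, baseline):
    """Fractional increase in distinct EC numbers when moving from one
    baseline tool to the union of all tools (the baseline choice is an
    assumption made for this sketch)."""
    union = set().union(*annotations.values())
    base = annotations[baseline]
    return (len(union) - len(base)) / len(base)


def support_histogram(annotations):
    """Map k -> number of EC numbers predicted by exactly k tools."""
    support = Counter(ec for ecs in annotations.values() for ec in ecs)
    return dict(Counter(support.values()))


if __name__ == "__main__":
    print(combined_gain(annotations, "KEGG"))
    print(support_histogram(annotations))
```

In this toy example the union holds six EC numbers against three for the baseline tool. The histogram separates EC numbers supported by a single tool from those supported by several, which is the kind of breakdown the Fig. 5 caption describes.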

Conclusions: Metabolic annotations are often incomplete and inconsistent. Combining multiple functional annotation tools can greatly improve genome coverage and metabolic network size, especially for non-model organisms and non-core pathways.

Keywords: Enzyme prediction; Functional annotation; Genome annotation; Metabolic modeling; Transport prediction.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1. Large differences exist between the sets of Gene-EC annotations generated by the four annotation tools across the 27 reference genomes.
Fig. 2. Gene-EC annotations produced by KEGG and RAST for E. coli K-12, compared to the EcoCyc gold standard. The sets and intersections are drawn proportionally to the number of annotations in each.
Fig. 3. Reaction overlap between the annotation tools (average number of EC numbers per genome).
Fig. 4. Precision vs. recall of EC numbers for different combinations of tools on EcoCyc. Individual tools are denoted by B, E, K, or R for BRENDA, EFICAz, KEGG, and RAST, respectively. For each combination of tools, we calculated precision and recall for both the union and the intersection of the sets of EC numbers generated by each tool. The union corresponds to the set of EC numbers generated by at least one of the tools in the combination, while the intersection corresponds to those EC numbers generated by every tool in the combination.
Fig. 5. Genome coverage and overlap in annotations vary across genomes. (a) Horizontal bars represent the fraction of the total number of EC numbers for each genome produced by only a single tool, or by two, three, or all four tools. The 27 reference genomes were sorted with respect to the fraction of EC numbers that were predicted by three or more tools (blue bars). The top of the list is dominated by model organisms such as E. coli, B. subtilis, and closely related organisms. As we move farther away from such well-studied model organisms, the fraction of unique EC numbers predicted only by a single tool (red bars) increases, at the expense of those predicted by multiple tools. (b) The fraction of genes annotated as enzymes by each tool likewise decreases as we move farther away from model organisms such as E. coli. Note that two of the organisms with a drastically reduced genome content, Candidatus Portiera aleyrodidarum BT-QVLC and Candidatus Evansia muelleri, also have a relatively higher fraction of core metabolic enzymes.
Fig. 6. (a) Total number of genes annotated as transporters, regardless of substrate. (b) Transporter annotations with substrate predictions specific enough to be included in metabolic models (rank 1 or 2).
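
The union and intersection evaluation described in the Fig. 4 caption can be sketched as follows. This is an illustrative sketch only: the gold standard and tool outputs are invented toy sets (not EcoCyc data), the function names `precision_recall` and `evaluate_combinations` are hypothetical, and returning 0.0 for an empty prediction set is a convention chosen here rather than something stated in the source.

```python
from itertools import combinations


def precision_recall(predicted, gold):
    """Precision and recall of a predicted EC set against a gold standard.
    Empty sets yield 0.0 by convention (an assumption of this sketch)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def evaluate_combinations(tools, gold):
    """For every non-empty combination of tools, score both the union
    (EC numbers from at least one tool) and the intersection (EC numbers
    from every tool in the combination)."""
    results = {}
    names = sorted(tools)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            sets = [tools[name] for name in combo]
            union = set().union(*sets)
            intersection = set.intersection(*sets)
            results["".join(combo)] = {
                "union": precision_recall(union, gold),
                "intersection": precision_recall(intersection, gold),
            }
    return results


# Hypothetical toy data, keyed by the single-letter tool codes from Fig. 4.
gold = {"1.1.1.1", "2.7.1.1", "4.2.1.11", "5.3.1.9"}
tools = {
    "K": {"1.1.1.1", "2.7.1.1", "3.1.3.11"},
    "R": {"1.1.1.1", "5.3.1.9"},
}

if __name__ == "__main__":
    for label, scores in evaluate_combinations(tools, gold).items():
        print(label, scores)
```

On the toy data, the union of K and R raises recall relative to either tool alone, while their intersection raises precision at the cost of recall. This is the trade-off that the union and intersection points in a precision-recall plot are meant to expose.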
