Nucleic Acids Res. 2016 Jan 4;44(D1):D330-5.
doi: 10.1093/nar/gkv1324. Epub 2015 Dec 3.

COMBREX-DB: an experiment centered database of protein function: knowledge, predictions and knowledge gaps

Yi-Chien Chang et al. Nucleic Acids Res.

Abstract

The COMBREX database (COMBREX-DB; combrex.bu.edu) is an online repository of information related to (i) experimentally determined protein function, (ii) predicted protein function, and (iii) relationships among proteins of unknown function and various types of experimental data, including molecular function, protein structure, and associated phenotypes. The database was created as part of the novel COMBREX (COMputational BRidges to EXperiments) effort aimed at accelerating the rate of gene function validation. It currently holds information on ∼3.3 million known and predicted proteins from over 1000 completely sequenced bacterial and archaeal genomes. The database also contains a prototype recommendation system that helps users identify the proteins whose experimental determination of function would be most informative for predicting the function of other proteins within the same protein family. The emphasis on documenting experimental evidence for function predictions and the prioritization of uncharacterized proteins for experimental testing distinguish COMBREX from other publicly available microbial genomics resources. This article describes updates to COMBREX-DB since its initial description in the 2011 NAR Database Issue.

Figures

Figure 1.
Progress in DNA sequencing (blue diamonds, left axis, logarithmic scale) and in gene function assignment (red squares, right axis). Red squares represent individual genomes (selected randomly after 2007). Chronologically: H. influenzae, E. coli, P. aeruginosa, Magnetospirillum sp. strain AMB-1, Halogeometricum borinquense type strain (PR3T), Odoribacter splanchnicus type strain (1651/6T), Desulfotomaculum ruminis type strain (DLT).
Figure 2.
The search interface for COMBREX-DB, which allows users to search by gene, organism and functional status, with a variety of filters.
Figure 3.
A sample search result for the query ‘methionine aminopeptidase.’ Results are organized by Cluster (red arrows). The functional status of member proteins is summarized graphically (blue arrow). Clusters are ranked by phylogenetic spread score (green box) and number of members (see text for details). All results can be easily downloaded (purple arrow).
Figure 4.
Histograms of average distance for each protein in a cluster to all other proteins in the cluster. Proteins with the shortest average distance to all others are considered ‘most typical’ for the cluster and are recommended for experimental testing in clusters that have no experimentally validated protein. Panels B and C show clusters with potential substructure, suggesting that multiple proteins within a cluster will likely need to be tested experimentally for an adequate characterization.
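
The selection rule in the Figure 4 caption (recommend the protein with the shortest average distance to all other cluster members) can be sketched as below. This is a minimal illustration and not the COMBREX-DB implementation: the function names, protein identifiers and distance values are hypothetical, and the caption does not state which distance measure is used, so the sketch simply assumes a precomputed symmetric distance matrix.

```python
from typing import List, Sequence, Tuple


def average_distances(dist: Sequence[Sequence[float]]) -> List[float]:
    """Average distance from each protein to all other proteins in the cluster.

    `dist` is assumed to be a square, symmetric matrix with zeros on the
    diagonal; dist[i][j] is the distance between proteins i and j.
    """
    n = len(dist)
    if n < 2:
        # A single-member cluster has no "other" proteins to average over.
        return [0.0] * n
    return [
        sum(row[j] for j in range(n) if j != i) / (n - 1)
        for i, row in enumerate(dist)
    ]


def most_typical(ids: Sequence[str], dist: Sequence[Sequence[float]]) -> Tuple[str, float]:
    """Return the protein with the smallest average distance and that average."""
    if not ids or len(ids) != len(dist):
        raise ValueError("ids and dist must be non-empty and the same length")
    avgs = average_distances(dist)
    best = min(range(len(ids)), key=lambda i: avgs[i])
    return ids[best], avgs[best]


if __name__ == "__main__":
    # Hypothetical three-protein cluster with made-up pairwise distances.
    cluster_ids = ["protA", "protB", "protC"]
    cluster_dist = [
        [0.0, 0.2, 0.6],
        [0.2, 0.0, 0.3],
        [0.6, 0.3, 0.0],
    ]
    print(most_typical(cluster_ids, cluster_dist))
```

For clusters like those in panels B and C, a single minimum may not be representative; plotting the list returned by `average_distances` as a histogram is one way to spot the substructure the caption describes before choosing candidates.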
