Review
Curr Opin Microbiol. 2015 Feb;23:189-96.
doi: 10.1016/j.mib.2014.11.017. Epub 2015 Jan 21.

Using comparative genomics to drive new discoveries in microbiology

Daniel H Haft. Curr Opin Microbiol. 2015 Feb.

Abstract

Bioinformatics looks to many microbiologists like a service industry. In this view, annotation starts with what is known from experiments in the lab, makes reasonable inferences of which genes match other genes in function, builds databases to make all that we know accessible, but creates nothing truly new. Experiments lead, then biocuration and computational biology follow. But the astounding success of genome sequencing is changing the annotation paradigm. Every genome sequenced is an intercepted coded message from the microbial world, and as all cryptographers know, it is easier to decode a thousand messages than a single message. Some biology is best discovered not by phenomenology, but by decoding genome content, forming hypotheses, and doing the first few rounds of validation computationally. Through such reasoning, a role and function may be assigned to a protein with no sequence similarity to any protein yet studied. Experimentation can follow after the discovery to cement and to extend the findings. Unfortunately, this approach remains so unfamiliar to most bench scientists that lab work and comparative genomics typically segregate to different teams working on unconnected projects. This review will discuss several themes in comparative genomics as a discovery method, including highly derived data, use of patterns of design to reason by analogy, and in silico testing of computationally generated hypotheses.

Figures

Figure 1. Sequence logos for five C-terminal sorting signals
A pattern of design is seen in the domain architecture: a signature motif, then a transmembrane alpha helix, then a cluster of basic (positively charged) residues. As a rule, these sorting signals occur many times per genome if they occur at all, toward the extreme carboxyl-terminal ends of proteins with predicted N-terminal signal peptides. Sequence logos [31] show information content, in bits, for multiple alignments derived from Pfam [25] or TIGRFAMs [16] database seed alignments after removal of gappy columns and of uninformative sequence N-terminal to the defining motif. A) PF00746 shows sorting signals led off by the LPxTG motif, spaced a short distance from the start of the transmembrane helix. Sortase cleaves these proteins between the 4th and 5th positions of the motif and, in most cases, transfers them to a peptidoglycan precursor. B) The MYXO-CTERM predicted sorting signal, as modeled by TIGR03901, restricted to the Myxococcales. Its processing enzyme is still unknown [32]. C) The PEP-CTERM sorting signal, modeled by TIGR02595, predicted target of exosortases (TIGR02602) in Gram-negative bacteria [9]. D) PGF-CTERM, modeled by TIGR04126, target of archaeosortase A (TIGR04125) [7, 33]. E) GlyGly-CTERM, modeled by TIGR03501, target of rhombosortase (TIGR03902) [34].
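The shared architecture described in the caption (signature motif, then a hydrophobic transmembrane helix, then basic residues, all at the extreme C-terminus) can be illustrated with a crude heuristic. The curated models named above (PF00746, TIGR03901 and the others) are profile HMMs; the sketch below swaps in a much simpler regular-expression-plus-hydropathy scan, for the LPxTG case only. The function name `find_lpxtg_signal`, the use of Kyte-Doolittle window averages, and every threshold (60-residue tail, 15-residue window, mean hydropathy of at least 1.6, spacer of at most 12 residues, at least two Lys/Arg) are illustrative assumptions, not values from the review, and the sketch does not check for an N-terminal signal peptide.

```python
import re

# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

# Zero-width lookahead so that overlapping LPxTG matches are all reported.
MOTIF = re.compile(r"(?=LP.TG)")


def find_lpxtg_signal(protein, tail_length=60, window=15,
                      min_hydropathy=1.6, max_spacer=12, min_basic=2):
    """Hypothetical heuristic scan: motif -> hydrophobic window -> basic tail.

    Only motifs inside the last `tail_length` residues are considered.
    Returns a dict describing the first hit, or None. All thresholds are
    illustrative assumptions.
    """
    seq = protein.upper()
    tail_start = max(0, len(seq) - tail_length)
    for m in MOTIF.finditer(seq, tail_start):
        motif_start = m.start()
        after = motif_start + 5  # first residue after the 5-residue motif
        # The helix should begin within a short spacer downstream of the motif,
        # and the whole window must fit inside the sequence.
        last_start = min(after + max_spacer, len(seq) - window)
        for tm_start in range(after, last_start + 1):
            segment = seq[tm_start:tm_start + window]
            score = sum(KD.get(aa, 0.0) for aa in segment) / window
            if score < min_hydropathy:
                continue
            # Require a cluster of basic residues C-terminal to the helix.
            basic_tail = seq[tm_start + window:]
            if sum(aa in "KR" for aa in basic_tail) >= min_basic:
                return {
                    "motif": seq[motif_start:motif_start + 5],
                    "motif_start": motif_start,
                    # Cleavage falls between motif positions 4 (T) and 5 (G),
                    # i.e. after 0-based index motif_start + 3.
                    "cleavage_after": motif_start + 3,
                    "tm_start": tm_start,
                    "hydropathy": round(score, 2),
                }
    return None


if __name__ == "__main__":
    # Made-up toy protein: filler, LPxTG motif, short spacer, helix, basic tail.
    toy = "M" + "ASDT" * 10 + "LPETG" + "EEN" + "LLAVIALLLGAVLLA" + "KRKRK"
    hit = find_lpxtg_signal(toy)
    print(hit["motif"], hit["cleavage_after"])  # prints "LPETG 44"
```

A profile HMM captures position-specific residue preferences that a fixed regex and a single hydropathy threshold cannot, which is one reason the curated models above are preferred for real annotation; the heuristic is only meant to make the three-part design pattern concrete.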
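The logos in the figure report information content in bits per alignment column. A minimal sketch of the standard per-column quantity for protein alignments, R = log2(20) - H, where H is the Shannon entropy of the residue frequencies in the column, is shown below together with a gappy-column filter like the one the caption mentions. The helper names, the 0.5 gap-fraction cutoff, and the toy alignment are hypothetical; the logo software cited as [31] may also apply a small-sample correction and handle gaps differently, neither of which this sketch attempts.

```python
import math
from collections import Counter

GAP_CHARS = "-."


def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns whose fraction of gap characters exceeds the cutoff."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    keep = [
        i for i in range(length)
        if sum(seq[i] in GAP_CHARS for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def column_information(alignment, alphabet_size=20):
    """Information content in bits per column: log2(alphabet_size) - H.

    H is the Shannon entropy of the non-gap residue frequencies in the
    column. No small-sample correction is applied.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must all have the same length")
    max_bits = math.log2(alphabet_size)
    bits = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] not in GAP_CHARS)
        n = sum(counts.values())
        if n == 0:
            bits.append(0.0)  # all-gap column carries no residue information
            continue
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(max_bits - entropy)
    return bits


if __name__ == "__main__":
    # Made-up LPxTG-like motif alignment.
    toy = ["LPETG", "LPKTG", "LPQTG", "LPETG"]
    print([round(b, 2) for b in column_information(toy)])
    # prints [4.32, 4.32, 2.82, 4.32, 4.32]
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits, while the variable "x" position of the toy motif drops to about 2.82 bits; the tall and short letter stacks in a sequence logo visualize exactly this contrast.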
