Nucleic Acids Res. 2010 Jan;38(Database issue):D401-7.
doi: 10.1093/nar/gkp940. Epub 2009 Nov 9.

The MiST2 database: a comprehensive genomics resource on microbial signal transduction

Luke E Ulrich et al. Nucleic Acids Res. 2010 Jan.

Abstract

The MiST2 database (http://mistdb.com) identifies and catalogs the repertoire of signal transduction proteins in microbial genomes. Signal transduction systems regulate the majority of cellular activities, including the metabolism, development, host recognition, biofilm production, virulence and antibiotic resistance of human pathogens. Thus, knowledge of the proteins and interactions that comprise these communication networks is essential to furthering biomedical discovery. Signal transduction proteins are identified by searching protein sequences for specific domain profiles that implicate a protein in signal transduction. Compared with the previous version of the database, MiST2 contains a host of new features and improvements, including: draft genomes; extracytoplasmic function (ECF) sigma factor protein identification; enhanced classification of signaling proteins; novel, high-quality domain models for identifying histidine kinases and response regulators; neighboring two-component genes; gene cart; better search capabilities; enhanced taxonomy browser; advanced genome browser; and a modern, biologist-friendly web interface. MiST2 currently contains 966 complete and 157 draft bacterial and archaeal genomes, which collectively contain more than 245 000 signal transduction proteins. The majority (66%) of these are one-component systems, followed by two-component proteins (26%), chemotaxis proteins (6%) and ECF sigma factors (2%).
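
To give a sense of scale, the percentages quoted in the abstract can be turned into rough per-category counts. This is only an illustrative calculation: it assumes the total is exactly 245 000 (the abstract says "more than") and that the rounded percentages are exact, so the results are approximate lower bounds rather than figures reported by the authors. The variable names are hypothetical.

```python
# Approximate protein counts per signaling category.
# Assumption: total is exactly 245,000 (the abstract states "more than 245 000").
TOTAL_PROTEINS = 245_000

# Rounded percentage shares as quoted in the abstract.
SHARES_PERCENT = {
    "one-component": 66,
    "two-component": 26,
    "chemotaxis": 6,
    "ECF sigma factors": 2,
}

# Integer arithmetic keeps the estimates exact for these rounded inputs.
approx_counts = {
    category: TOTAL_PROTEINS * percent // 100
    for category, percent in SHARES_PERCENT.items()
}

for category, count in approx_counts.items():
    print(f"{category}: about {count} proteins")
```

Under these assumptions, one-component systems alone account for well over 150 000 of the catalogued proteins.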

Figures

Figure 1.
Semi-automatic algorithm for defining high-quality domain models. (A) Bona fide domain members whose structures have been solved are subjected to iterative PSI-BLAST searches (30) against the UniRef90 (31) database with a stringent E-value threshold. The resulting sequences are then clustered, aligned and edited (CAT, part B) to form the set of core homologs. Remote homologs are identified by applying the same procedure with a much more relaxed threshold and then removing hits that do not match a secondary structure type associated with at least one core homolog. The resulting remote homologs are combined with the core homologs and then subjected to the CAT process to produce the final domain model(s). (B) The CAT sub-algorithm is a divide-and-conquer method for addressing the extreme sequence divergence present in signal transduction families. Markov Clustering Linkage (32) simulates a random walk through all-versus-all BLAST results and produces clusters of related members. After each subgroup is aligned and edited, the subgroups are combined into one or more final curated alignments.
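
The clustering step in part B can be illustrated with a toy version of Markov clustering. The sketch below is not the MiST2 pipeline: it assumes a small hand-made similarity matrix in place of all-versus-all BLAST scores, and the function names, the inflation value of 2.0 and the convergence settings are illustrative choices. It alternates expansion (a two-step random walk) with inflation (which strengthens strong flows and weakens weak ones) and then reads clusters off the rows that retain flow.

```python
def normalize_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def markov_cluster(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Toy Markov clustering on a symmetric, non-negative similarity matrix."""
    n = len(similarity)
    # Add self-loops so every node keeps some flow on itself.
    flow = normalize_columns(
        [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iter):
        # Expansion: square the matrix, i.e. take two random-walk steps.
        expanded = [
            [sum(flow[i][k] * flow[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        # Inflation: raise entries to a power, then renormalize each column.
        updated = normalize_columns(
            [[value ** inflation for value in row] for row in expanded]
        )
        change = max(
            abs(updated[i][j] - flow[i][j]) for i in range(n) for j in range(n)
        )
        flow = updated
        if change < tol:
            break
    # Rows that still carry flow act as attractors; the columns they reach
    # form one cluster. Identical member sets are merged by the set.
    found = {
        frozenset(j for j in range(n) if flow[i][j] > 1e-6) for i in range(n)
    }
    return sorted(sorted(members) for members in found if members)


# Hypothetical scores: sequences 0-2 and 3-4 are strongly similar within
# each group, with one weak cross-group hit between sequences 2 and 3.
similarity = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.05, 0.0],
    [0.0, 0.0, 0.05, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
clusters = markov_cluster(similarity)
print(clusters)
```

Higher inflation values generally yield more, smaller clusters, which is one way a divide-and-conquer scheme could split a divergent family into subgroups that are easier to align.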
Figure 2.
Screenshots of the MiST2 website. (A) E. coli genome summary page. Below the header and navigational links are three sections: genome and organism metadata together with a hyperlinked graphical image of the genome’s signal transduction profile; fully linked tables displaying the genomic distribution of one-component, two-component, chemotaxis and ECF signaling proteins by replicon; and a table containing the counts of neighboring two-component proteins. (B) E. coli CheA protein page. The RefSeq annotation and database cross-references for the currently viewed protein and its corresponding gene are displayed at the top. These are followed by an interactive visualization of the protein’s domain architecture. The genome neighborhood section contains an AJAX-driven, dynamic representation of the genomic context surrounding the currently viewed protein. In the neighboring DNA section, upstream or downstream DNA sequence data can be retrieved. Hyperlinked cross-references to external databases appear at the bottom of the page.
