Genome Biol. 2004;5(8):R52.
doi: 10.1186/gb-2004-5-8-r52. Epub 2004 Jul 12.

Comprehensive de novo structure prediction in a systems-biology context for the archaea Halobacterium sp. NRC-1

Richard Bonneau et al. Genome Biol. 2004.

Abstract

Background: Large fractions of all fully sequenced genomes code for proteins of unknown function. Annotating these proteins of unknown function remains a critical bottleneck for systems biology and is crucial to understanding the biological relevance of genome-wide changes in mRNA and protein expression, protein-protein and protein-DNA interactions. The work reported here demonstrates that de novo structure prediction is now a viable option for providing general function information for many proteins of unknown function.

Results: We have used Rosetta de novo structure prediction to predict three-dimensional structures for 1,185 proteins and protein domains (<150 residues in length) found in Halobacterium NRC-1, a widely studied halophilic archaeon. Predicted structures were searched against the Protein Data Bank to identify fold similarities and extrapolate putative functions. They were analyzed in the context of a predicted association network composed of several sources of functional associations such as: predicted protein interactions, predicted operons, phylogenetic profile similarity and domain fusion. To illustrate this approach, we highlight three cases where our combined procedure has provided novel insights into our understanding of chemotaxis, possible prophage remnants in Halobacterium NRC-1 and archaeal transcriptional regulators.

Conclusions: Simultaneous analysis of the association network, coordinated mRNA level changes in microarray experiments and genome-wide structure prediction has allowed us to glean significant biological insights into the roles of several Halobacterium NRC-1 proteins of previously unknown function, and significantly reduce the number of proteins encoded in the genome of this haloarchaeon for which no annotation is available.

Figures

Figure 1
Flow chart depicting the annotation pipeline implemented in this study. Sequence-based methods are employed first (top); domains that elude primary-sequence-based methods are then handled by structure-prediction methods (bottom). For any given genome, data from all levels in this method hierarchy are integrated using SBEAMS (Systems Biology Experiment Analysis and Management System). Implicit in this annotation hierarchy is the idea that protein annotation should be domain-centric (that is, multi-domain proteins should be divided into domains as early as possible in the annotation process). SBEAMS produces a table of annotations in which, for a given domain, only results from the topmost level in the method hierarchy (PDB-BLAST → Pfam → Rosetta) that produces a significant hit are displayed.
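The selection rule in the Figure 1 caption (show only the topmost method that produces a significant hit) can be illustrated with a minimal Python sketch. This is not the SBEAMS implementation: the function name topmost_hit, the dictionary layout, the "significant" flag and the example domains are all hypothetical, and how significance is decided for each method is left outside the sketch.

```python
from typing import Dict, Optional, Tuple

# Method hierarchy as ordered in the Figure 1 caption, topmost first.
METHOD_HIERARCHY = ("PDB-BLAST", "Pfam", "Rosetta")


def topmost_hit(hits: Dict[str, dict]) -> Optional[Tuple[str, str]]:
    """Return (method, annotation) for the highest-ranked method that
    produced a significant hit for one domain, or None if none did."""
    for method in METHOD_HIERARCHY:
        hit = hits.get(method)
        if hit is not None and hit.get("significant", False):
            return method, hit["annotation"]
    return None


# Hypothetical per-domain results (placeholder names and annotations).
example_domains = {
    "domain_1": {
        "Pfam": {"annotation": "hypothetical Pfam family", "significant": True},
        "Rosetta": {"annotation": "hypothetical fold match", "significant": True},
    },
    "domain_2": {
        "PDB-BLAST": {"annotation": "hypothetical PDB hit", "significant": False},
        "Rosetta": {"annotation": "hypothetical fold match", "significant": True},
    },
    "domain_3": {},  # no method produced any hit
}

# One row per domain, keeping only the topmost significant result.
annotation_table = {
    name: topmost_hit(hits) for name, hits in example_domains.items()
}

for name, result in annotation_table.items():
    print(name, result)
```

In this sketch, domain_1 is reported under Pfam even though a Rosetta match also exists, because Pfam sits higher in the hierarchy; domain_2 falls through to Rosetta because its PDB-BLAST hit is not significant.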
Figure 2
Chemotaxis methyl-accepting domains. (a) Htr10 (VNG1505g) domain 1 hit to 1ljwA, hemoglobin. The recently deposited structure for the HemAT sensor domain (1OR4-A) is also shown (red box). The position of the heme (black spheres) is similar in both our predicted fold match (1LJW-A) and the match detected by PSI-BLAST (1OR4-A). (b) Htr13 (VNG1013g) hit to Gga1 (1jwfA, involved in protein transport, binding of dipeptide signal sequence). (c) The association network surrounding CheA and its interactions with the Htr methyl-accepting domains found in the Halobacterium genome, as predicted by the phylogenetic profile method (red lines). Also shown are predicted operon edges (black lines). The expression levels (where red corresponds to a high level of expression and green to a low level of expression relative to a reference; white indicates no change/no measurement) are from a previously described microarray experiment. Nodes marked with asterisks indicate proteins where a domain was folded with Rosetta (resulting in a significant fold match) or annotated using fold recognition. Nodes marked with a 'P' are proteins that were annotated using Pfam. The '!' by yufN indicates that the prior annotation does not agree with our current analysis.
Figure 3
Regions of the minichromosome rich in IS-elements (insertion sequences). (a) Segment of Halobacterium genome corresponding to genes VNG5101H - sojD (and duplicate region VNG6098H - sojD). IS-elements are shown above as colored boxes. Open reading frames are indicated as red/pink (on 3' strand) or blue/sky-blue boxes (on 5' strand). (b) Top ranked Rosetta prediction for VNG6109H shown next to its closest match in the PDB, 1dt9A1 (translation initiation factor sub-domain). (c) Segment of Halobacterium genome corresponding to genes VNG5244H - VNG5256H (duplicated on the opposite strand elsewhere on the minichromosome, VNG5053H - VNG5041H). (d) Top ranked Rosetta prediction for VNG5049H shown next to its closest hit in the PDB, 2ezh. (e) Top ranked Rosetta prediction for VNG5047H shown next to its closest hit in the PDB, 1am3.
Figure 4
Predicted transcriptional regulators. Rosetta predictions for three Halobacterium NRC-1 proteins that are consistent with transcription regulation and/or DNA binding. (a) The top ranked Rosetta structure prediction for VNG0462C (according to the Rosetta confidence function) is shown next to the diphtheria toxin repressor (1bi2-B). The predicted operon for VNG0462 is shown below; red/pink boxes above the line in this diagram are genes on the 3' strand, while rectangles below the line are genes on the 5' strand. (b) The top ranked model for VNG5156H (left) is shown next to 1bi2-B; the predicted operon containing VNG5156, VNG5154, VNG5153 and VNG5152 is shown below. (c) The top ranked Rosetta prediction for VNG0039H is shown next to its closest match in the PDB, 1bi2-B.
