Genome Biol. 2004;5(8):R52.

doi: 10.1186/gb-2004-5-8-r52. Epub 2004 Jul 12.

Comprehensive de novo structure prediction in a systems-biology context for the archaea Halobacterium sp. NRC-1

Richard Bonneau¹, Nitin S Baliga, Eric W Deutsch, Paul Shannon, Leroy Hood

PMID: 15287974
PMCID: PMC507877
DOI: 10.1186/gb-2004-5-8-r52

Richard Bonneau et al. Genome Biol. 2004.

Affiliation

¹ Institute for Systems Biology, Seattle, WA 98103-8904, USA. lhood@systemsbiology.org

Abstract

Background: Large fractions of all fully sequenced genomes code for proteins of unknown function. Annotating these proteins remains a critical bottleneck for systems biology and is crucial to understanding the biological relevance of genome-wide changes in mRNA and protein expression and in protein-protein and protein-DNA interactions. The work reported here demonstrates that de novo structure prediction is now a viable option for providing general function information for many proteins of unknown function.

Results: We have used Rosetta de novo structure prediction to predict three-dimensional structures for 1,185 proteins and protein domains (<150 residues in length) found in Halobacterium NRC-1, a widely studied halophilic archaeon. Predicted structures were searched against the Protein Data Bank to identify fold similarities and extrapolate putative functions. They were analyzed in the context of a predicted association network built from several sources of functional associations, such as predicted protein interactions, predicted operons, phylogenetic profile similarity, and domain fusion. To illustrate this approach, we highlight three cases where our combined procedure has provided novel insights into chemotaxis, possible prophage remnants in Halobacterium NRC-1, and archaeal transcriptional regulators.

Conclusions: Simultaneous analysis of the association network, coordinated mRNA level changes in microarray experiments and genome-wide structure prediction has allowed us to glean significant biological insights into the roles of several Halobacterium NRC-1 proteins of previously unknown function, and significantly reduce the number of proteins encoded in the genome of this haloarchaeon for which no annotation is available.

Figures

**Figure 1**
Flow chart depicting the annotation pipeline implemented in this study. Sequence-based methods are employed first (top); domains that elude primary-sequence-based methods are predicted by structure-prediction methods (bottom). For any given genome, data from all levels in this method hierarchy are integrated using SBEAMS (Systems Biology Experiment Analysis and Management System). Implicit in this annotation hierarchy is the idea that protein annotation should be domain-centric (that is, multi-domain proteins should be divided into domains as early as possible in the annotation process). SBEAMS produces a table of annotations in which, for a given domain, only results from the topmost level in the method hierarchy (PDB-BLAST → Pfam → Rosetta) that produces a significant hit are displayed.
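The display rule in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the SBEAMS implementation: the `Hit` record, the `significant` flag (assumed to be decided upstream by each method's own threshold), and the domain identifiers `domA`, `domB` and `domC` are all hypothetical. Only the method order PDB-BLAST → Pfam → Rosetta comes from the caption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Method hierarchy as given in the Figure 1 caption, topmost level first.
METHOD_HIERARCHY = ("PDB-BLAST", "Pfam", "Rosetta")


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one annotation result on one domain."""
    domain_id: str     # illustrative domain identifier
    method: str        # one of METHOD_HIERARCHY
    annotation: str    # free-text annotation
    significant: bool  # assumption: significance is decided upstream


def topmost_significant_hit(hits: List[Hit]) -> Optional[Hit]:
    """Return the significant hit from the highest-ranked method, or None."""
    for method in METHOD_HIERARCHY:
        for hit in hits:
            if hit.method == method and hit.significant:
                return hit
    return None


def annotation_table(hits: List[Hit]) -> Dict[str, Optional[Hit]]:
    """Group hits by domain and keep one displayed result per domain."""
    by_domain: Dict[str, List[Hit]] = {}
    for hit in hits:
        by_domain.setdefault(hit.domain_id, []).append(hit)
    return {dom: topmost_significant_hit(h) for dom, h in by_domain.items()}


# Made-up example data, for illustration only.
example_hits = [
    Hit("domA", "Rosetta", "fold match", True),
    Hit("domA", "Pfam", "family match", True),
    Hit("domB", "PDB-BLAST", "weak hit", False),
    Hit("domB", "Rosetta", "fold match", True),
    Hit("domC", "Pfam", "weak hit", False),
]
table = annotation_table(example_hits)
```

With this example data, `domA` is reported through Pfam because Pfam outranks Rosetta, `domB` falls through to Rosetta because its PDB-BLAST hit is not significant, and `domC` has no displayed annotation.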

**Figure 2**
Chemotaxis methyl-accepting domains. **(a)** Htr10 (VNG1505g) domain 1 hit to 1ljwA, hemoglobin. The recently deposited structure for the HemAT sensor domain (1OR4-A) is also shown (red box). The position of the heme (black spheres) is similar in both our predicted fold match (1LJW-A) and the match detected by PSI-BLAST (1OR4-A). **(b)** Htr13 (VNG1013g) hit to Gga1 (1jwfA, involved in protein transport, binding of dipeptide signal sequence). **(c)** The association network surrounding CheA and its interactions with the Htr methyl-accepting domains found in the *Halobacterium* genome, as predicted by the phylogenetic profile method (red lines). Also shown are predicted operon edges (black lines). The expression levels (red corresponds to high expression and green to low expression relative to a reference; white indicates no change or no measurement) are from a previously described microarray experiment. Nodes marked with asterisks indicate proteins where a domain was folded with Rosetta (resulting in a significant fold match) or annotated using fold recognition. Nodes marked with a 'P' are proteins that were annotated using Pfam. The '!' by yufN indicates that the prior annotation does not agree with our current analysis.

**Figure 3**
Insertion sequence (IS) element-rich regions on the minichromosome. **(a)** Segment of the *Halobacterium* genome corresponding to genes VNG5101H - sojD (and duplicate region VNG6098H - sojD). IS elements are shown above as colored boxes. Open reading frames are shown as red/pink boxes (on the 3' strand) or blue/sky-blue boxes (on the 5' strand). **(b)** Top-ranked Rosetta prediction for VNG6109H shown next to its closest match in the PDB, 1dt9A1 (translation initiation factor sub-domain). **(c)** Segment of the *Halobacterium* genome corresponding to genes VNG5244H - VNG5256H (duplicated on the opposite strand elsewhere on the minichromosome, VNG5053H - VNG5041H). **(d)** Top-ranked Rosetta prediction for VNG5049H shown next to its closest hit in the PDB, 2ezh. **(e)** Top-ranked Rosetta prediction for VNG5047H shown next to its closest hit in the PDB, 1am3.

**Figure 4**
Predicted transcriptional regulators. Rosetta predictions for three *Halobacterium* NRC-1 proteins that are consistent with transcription regulation and/or DNA binding. **(a)** The top-ranked Rosetta structure prediction for VNG0462C (according to the Rosetta confidence function) is shown next to the diphtheria toxin repressor (1bi2-B). The predicted operon for VNG0462 is shown below; red/pink boxes above the line in this diagram are genes on the 3' strand, while rectangles below the line are genes on the 5' strand. **(b)** The top-ranked model for VNG5156H (left) is shown next to 1bi2-B; the predicted operon containing VNG5156, VNG5154, VNG5153 and VNG5152 is shown below. **(c)** The top-ranked Rosetta prediction for VNG0039H is shown next to its closest match in the PDB, 1bi2-B.

Cited by

A systems view of haloarchaeal strategies to withstand stress from transition metals.
Kaur A, Pan M, Meislin M, Facciotti MT, El-Gewely R, Baliga NS. Genome Res. 2006 Jul;16(7):841-54. doi: 10.1101/gr.5189606. Epub 2006 Jun 2. PMID: 16751342.
Coordination of frontline defense mechanisms under severe oxidative stress.
Kaur A, Van PT, Busch CR, Robinson CK, Pan M, Pang WL, Reiss DJ, DiRuggiero J, Baliga NS. Mol Syst Biol. 2010 Jul;6:393. doi: 10.1038/msb.2010.50. PMID: 20664639.
Comparative Analysis of rRNA Removal Methods for RNA-Seq Differential Expression in Halophilic Archaea.
Pastor MM, Sakrikar S, Rodriguez DN, Schmid AK. Biomolecules. 2022 May 10;12(5):682. doi: 10.3390/biom12050682. PMID: 35625610.
Bacterial 'Grounded' Prophages: Hotspots for Genetic Renovation and Innovation.
Ramisetty BCM, Sudhakari PA. Front Genet. 2019 Feb 12;10:65. doi: 10.3389/fgene.2019.00065. eCollection 2019. PMID: 30809245.
A role for programmed cell death in the microbial loop.
Orellana MV, Pang WL, Durand PM, Whitehead K, Baliga NS. PLoS One. 2013 May 8;8(5):e62595. doi: 10.1371/journal.pone.0062595. Print 2013. PMID: 23667496.
