The Protein Model Portal

Konstantin Arnold¹, Florian Kiefer, Jürgen Kopp, James N D Battey, Michael Podvinec, John D Westbrook, Helen M Berman, Lorenza Bordoli, Torsten Schwede

PMID: 19037750
PMCID: PMC2704613
DOI: 10.1007/s10969-008-9048-5

Konstantin Arnold et al. J Struct Funct Genomics. 2009 Mar;10(1):1-8.

doi: 10.1007/s10969-008-9048-5. Epub 2008 Nov 27.

Affiliation

¹ Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056, Basel, Switzerland.

Abstract

Structural Genomics has been successful in determining the structures of many unique proteins in a high-throughput manner. Still, the number of known protein sequences is much larger than the number of experimentally solved protein structures. Homology (or comparative) modeling methods make use of experimental protein structures to build models for evolutionarily related proteins. Thereby, experimental structure determination efforts and homology modeling complement each other in the exploration of the protein structure space. One of the challenges in using model information effectively has been that the models available for a specific protein are stored in heterogeneous formats at different sites, which use various incompatible accession code systems. Often, structure models for hundreds of proteins can be derived from a given experimentally determined structure, using a variety of established methods. This has been done by all of the PSI centers, and by various independent modeling groups. The goal of the Protein Model Portal (PMP) is to provide a single portal which gives access to the various models that can be derived from PSI targets and other experimental protein structures. A single interface allows all existing pre-computed models across these various sites to be queried simultaneously, and provides links to interactive services for template selection, target-template alignment, model building, and quality assessment. The current release of the portal consists of 7.6 million model structures provided by different partner resources (CSMP, JCSG, MCSG, NESG, NYSGXRC, JCMM, ModBase, SWISS-MODEL Repository). The PMP is available at http://www.proteinmodelportal.org and from the PSI Structural Genomics Knowledgebase.

Figures

**Fig. 1**
Reference system based on md5 cryptographic hash sums for UniProt full-length target sequences. In this system, identical target protein sequences are grouped together independent of their individual database accession codes (e.g., Hemoglobin beta chain from Human, Chimpanzee, and Bonobo), while entries which differ in at least one amino acid position are kept separate (e.g., 7E → V variant of Human sickle cell anemia hemoglobin)
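
The grouping described in this caption can be illustrated with a minimal sketch. It assumes the hash is taken over the plain one-letter sequence after stripping whitespace and upper-casing; that normalization, the function names, the accession labels, and the toy sequences are all hypothetical and not taken from the portal.

```python
import hashlib
from collections import defaultdict


def sequence_key(sequence: str) -> str:
    """Return the md5 hex digest of a normalized amino acid sequence.

    Normalization (strip whitespace, upper-case) is an assumption of this sketch.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def group_by_sequence(entries):
    """Group (accession, sequence) pairs by the md5 key of their sequence."""
    groups = defaultdict(list)
    for accession, sequence in entries:
        groups[sequence_key(sequence)].append(accession)
    return dict(groups)


# Toy data: hypothetical accession labels and short made-up sequences.
entries = [
    ("acc_species_1", "MVHLTPEEKS"),
    ("acc_species_2", "MVHLTPEEKS"),  # identical sequence, different accession
    ("acc_variant", "MVHLTPVEKS"),    # differs at one position
]

for key, accessions in group_by_sequence(entries).items():
    print(key, accessions)
```

Identical sequences collapse onto one key regardless of accession, while a single substitution yields a different key, which is the behavior the figure describes.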

**Fig. 2**
Schematic flow of data in the Protein Model Portal. Meta information about the available models, i.e., the target protein, template structure, and sequence identity, is retrieved from each partner resource. The UniProt database is used to generate a reference system based on md5 cryptographic hash sums of the full-length primary sequences. Searchable indices are generated for all proteins with model information, allowing for accession code-based queries, matching of amino acid sequence fragments, and sequence similarity searches. The portal communicates with all partner resources and the PSI Structural Genomics Knowledgebase via Web services. The three-dimensional coordinates of a model, as well as functional annotation information from UniProt and InterPro, are retrieved dynamically in real time when required to generate the web page
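
As a rough illustration of how such indices could fit together, the sketch below keeps model meta information keyed by the md5 of the target sequence and supports accession-based and fragment-based lookups. It is an in-memory toy under stated assumptions, not the portal's implementation: the class, field, provider, and template names are hypothetical, and sequence similarity search and the Web service calls are left out.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRecord:
    """Meta information about one model (field names are hypothetical)."""
    provider: str
    template: str
    seq_identity: float  # percent identity between target and template


class ModelIndex:
    """Toy index: accession -> md5 key -> model records."""

    def __init__(self):
        self._hash_by_accession = {}
        self._sequence_by_hash = {}
        self._models_by_hash = {}

    @staticmethod
    def _key(sequence: str) -> str:
        return hashlib.md5(sequence.encode("ascii")).hexdigest()

    def add_sequence(self, accession: str, sequence: str) -> str:
        """Register an accession code for a full-length sequence."""
        key = self._key(sequence)
        self._hash_by_accession[accession] = key
        self._sequence_by_hash[key] = sequence
        return key

    def add_model(self, sequence: str, record: ModelRecord) -> None:
        """Attach model meta information to a target sequence."""
        self._models_by_hash.setdefault(self._key(sequence), []).append(record)

    def by_accession(self, accession: str):
        """Accession code-based query."""
        key = self._hash_by_accession.get(accession)
        return list(self._models_by_hash.get(key, []))

    def by_fragment(self, fragment: str):
        """Exact matching of an amino acid sequence fragment."""
        hits = []
        for key, sequence in self._sequence_by_hash.items():
            if fragment in sequence:
                hits.extend(self._models_by_hash.get(key, []))
        return hits


index = ModelIndex()
index.add_sequence("acc_a", "MKTAYIAKQR")  # hypothetical accessions and sequence
index.add_sequence("acc_b", "MKTAYIAKQR")
index.add_model("MKTAYIAKQR", ModelRecord("provider_x", "tmpl_1", 42.0))
print(index.by_accession("acc_b"))
print(index.by_fragment("AYIAK"))
```

Because records hang off the sequence hash rather than the accession code, a model registered once is found through every accession that maps to the same sequence.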

**Fig. 3**
Graphical overview of model and experimental structure information available for a specific protein entry. Information about available models is queried from the model portal database; information on experimental structures is retrieved from the PSI SGKB using web services

**Fig. 4**
Typical view of a model detail page. Information about the model provider, the segment of the target protein (e.g., MLP-like protein 34; Arabidopsis thaliana) covered by the model, and the template structure used for model building, are stored in the portal database. All other information required for building the webpage, such as the coordinates of the model, the PFAM domain structure, and UniProt annotation of the protein sequence, is retrieved dynamically

**Fig. 5**
Distribution of chain length. The histogram shows the length distribution of models provided by the model portal. The maximum around 150 residues indicates that the majority of models consist of single domains. However, more than one quarter of the models have significantly longer chains of more than 300 residues

**Fig. 6**
Model quality on residue level. For each residue, the model with the highest sequence identity between target and template is considered. The pie chart shows the percentage of residues which can be modeled at a certain identity level. For the majority of modeled residues (41%) the target shares between 20% and 40% sequence identity with the template
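
The per-residue statistic in this caption can be sketched as follows. Each model is assumed to be given as a 1-based inclusive residue range plus a percent identity; the bin edges other than the 20% to 40% range named in the caption, the function names, and the example numbers are hypothetical.

```python
import bisect
from collections import Counter


def best_identity_per_residue(models):
    """Map each covered residue to the highest identity among covering models.

    `models` is an iterable of (start, end, identity) tuples with 1-based
    inclusive residue ranges (an assumption of this sketch).
    """
    best = {}
    for start, end, identity in models:
        for position in range(start, end + 1):
            if identity > best.get(position, -1.0):
                best[position] = identity
    return best


def bin_label(identity, edges):
    """Return a label such as '20-40%' for the bin containing `identity`."""
    i = bisect.bisect_right(edges, identity)
    lower = 0 if i == 0 else edges[i - 1]
    upper = 100 if i == len(edges) else edges[i]
    return f"{lower}-{upper}%"


def identity_bin_fractions(best, edges=(20, 40, 60, 80)):
    """Fraction of modeled residues falling into each identity bin."""
    if not best:
        return {}
    counts = Counter(bin_label(identity, edges) for identity in best.values())
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Made-up example: two overlapping models for one target.
models = [(1, 10, 35.0), (5, 15, 62.0)]
best = best_identity_per_residue(models)
print(identity_bin_fractions(best))
```

Where models overlap, only the one with the highest identity contributes to a residue, so each modeled residue is counted exactly once.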


