J Struct Funct Genomics. 2009 Mar;10(1):1-8.
doi: 10.1007/s10969-008-9048-5. Epub 2008 Nov 27.

The Protein Model Portal

Konstantin Arnold et al. J Struct Funct Genomics. 2009 Mar.

Abstract

Structural genomics has been successful in determining the structures of many unique proteins in a high-throughput manner. Still, the number of known protein sequences is much larger than the number of experimentally solved protein structures. Homology (or comparative) modeling methods make use of experimental protein structures to build models for evolutionarily related proteins. Experimental structure determination efforts and homology modeling thereby complement each other in the exploration of protein structure space. One of the challenges in using model information effectively has been accessing all models available for a specific protein, which are stored in heterogeneous formats at different sites using various incompatible accession code systems. Often, structure models for hundreds of proteins can be derived from a given experimentally determined structure, using a variety of established methods. This has been done by all of the PSI centers and by various independent modeling groups. The goal of the Protein Model Portal (PMP) is to provide a single portal that gives access to the various models that can be derived from PSI targets and other experimental protein structures. A single interface allows all existing pre-computed models across these various sites to be queried simultaneously, and provides links to interactive services for template selection, target-template alignment, model building, and quality assessment. The current release of the portal consists of 7.6 million model structures provided by different partner resources (CSMP, JCSG, MCSG, NESG, NYSGXRC, JCMM, ModBase, SWISS-MODEL Repository). The PMP is available at http://www.proteinmodelportal.org and from the PSI Structural Genomics Knowledgebase.

Figures

Fig. 1
Reference system based on MD5 cryptographic hash sums for UniProt full-length target sequences. In this system, identical target protein sequences are grouped together independent of their individual database accession codes (e.g., Hemoglobin beta chain from Human, Chimpanzee, and Bonobo), while entries that differ in at least one amino acid position are kept separate (e.g., 7E → V variant of Human sickle cell anemia hemoglobin).
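The grouping described in Fig. 1 can be illustrated with a minimal Python sketch. It is not the portal's implementation: the function names, the accession labels, and the normalization step (stripping whitespace and uppercasing before hashing) are assumptions made for illustration, and the sequences are short toy fragments, whereas the portal hashes full-length UniProt sequences.

```python
import hashlib
from collections import defaultdict


def sequence_key(sequence: str) -> str:
    """Return the MD5 hex digest of an amino acid sequence.

    Whitespace removal and uppercasing are assumed normalization steps,
    so that formatting differences do not change the key.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def group_by_sequence(entries):
    """Group accession codes by the MD5 key of their sequence."""
    groups = defaultdict(list)
    for accession, sequence in entries:
        groups[sequence_key(sequence)].append(accession)
    return dict(groups)


# Hypothetical accession labels with toy N-terminal fragments.
entries = [
    ("ACC_HUMAN", "MVHLTPEEKS"),
    ("ACC_CHIMP", "mvhltpeeks"),   # same residues, different formatting
    ("ACC_SICKLE", "MVHLTPVEKS"),  # one residue differs (E -> V at position 7)
]

groups = group_by_sequence(entries)
for key, accessions in groups.items():
    print(key, accessions)
```

Identical sequences map to one key regardless of accession code, while a single substitution yields a different key, which is the behavior the caption describes.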
Fig. 2
Schematic flow of data in Protein Model Portal. Meta information about the available models, i.e., the target protein, template structure, and sequence identity, is retrieved from each partner resource. The UniProt database is used to generate a reference system based on MD5 cryptographic hash sums of the full-length primary sequences. Searchable indices are generated for all proteins with model information, allowing for accession code-based queries, matching of amino acid sequence fragments, and sequence similarity searches. The portal communicates with all partner resources and the PSI structural genomics knowledge base via Web services. The three-dimensional coordinates of a model, as well as functional annotation information from UniProt and InterPro, are retrieved dynamically in real time when required to generate the web page.
Fig. 3
Graphical overview of model and experimental structure information available for a specific protein entry. Information about available models is queried from the model portal database; information on experimental structures is retrieved from the PSI SGKB using web services
Fig. 4
Typical view of a model detail page. Information about the model provider, the segment of the target protein (e.g., MLP-like protein 34; Arabidopsis thaliana) covered by the model, and the template structure used for model building is stored in the portal database. All other information required for building the webpage, such as the coordinates of the model, the Pfam domain structure, and UniProt annotation of the protein sequence, is retrieved dynamically.
Fig. 5
Distribution of chain length. The histogram shows the length distribution of models provided by the model portal. The maximum around 150 residues indicates that the majority of models consist of single domains. However, more than one quarter of the models have significantly longer chains of more than 300 residues
Fig. 6
Model quality on residue level. For each residue, the model with the highest sequence identity between target and template is considered. The pie chart shows the percentage of residues which can be modeled at a certain identity level. For the largest share of modeled residues (41%), the targets share between 20% and 40% sequence identity with the templates.
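
The per-residue selection described in Fig. 6 can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function names, the `(start, end, identity)` tuple layout with 1-based inclusive ranges, and all bin edges other than the 20% to 40% bin mentioned in the caption are assumptions.

```python
from bisect import bisect_right
from collections import Counter

# Assumed bin edges and labels; only the 20-40% bin is named in the caption.
EDGES = (20, 40, 60, 80)
LABELS = ("<20%", "20-40%", "40-60%", "60-80%", ">=80%")


def best_identity_per_residue(length, models):
    """For each residue, keep the highest target-template identity among
    the models covering it; residues without a model stay None."""
    best = [None] * length
    for start, end, identity in models:
        for pos in range(start - 1, end):  # convert to 0-based indices
            if best[pos] is None or identity > best[pos]:
                best[pos] = identity
    return best


def identity_distribution(best):
    """Fraction of modeled residues falling into each identity bin."""
    modeled = [b for b in best if b is not None]
    counts = Counter(LABELS[bisect_right(EDGES, b)] for b in modeled)
    return {label: counts[label] / len(modeled) for label in LABELS if counts[label]}


# Toy example: a 10-residue target covered by two overlapping models.
models = [(1, 6, 35.0), (4, 8, 62.0)]
best = best_identity_per_residue(10, models)
distribution = identity_distribution(best)
print(best)
print(distribution)
```

In the toy example, residues 4 to 6 are covered by both models and are assigned the higher identity, while residues 9 and 10 have no model and are excluded from the distribution.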

