J Mol Biol. 2023 Jul 15;435(14):168021.
doi: 10.1016/j.jmb.2023.168021. Epub 2023 Feb 23.

ModelCIF: An Extension of PDBx/mmCIF Data Representation for Computed Structure Models

Brinda Vallat et al. J Mol Biol.

Abstract

ModelCIF (github.com/ihmwg/ModelCIF) is a data information framework developed for and by computational structural biologists to enable delivery of Findable, Accessible, Interoperable, and Reusable (FAIR) data to users worldwide. ModelCIF describes the specific set of attributes and metadata associated with macromolecular structures modeled solely by computational methods and provides an extensible data representation for deposition, archiving, and public dissemination of predicted three-dimensional (3D) models of macromolecules. It is an extension of the Protein Data Bank Exchange / macromolecular Crystallographic Information Framework (PDBx/mmCIF), which is the global data standard for representing experimentally determined 3D structures of macromolecules and associated metadata. The PDBx/mmCIF framework and its extensions (e.g., ModelCIF) are managed by the Worldwide Protein Data Bank partnership (wwPDB, wwpdb.org) in collaboration with relevant community stakeholders such as the wwPDB ModelCIF Working Group (wwpdb.org/task/modelcif). This semantically rich and extensible data framework for representing computed structure models (CSMs) accelerates the pace of scientific discovery. Herein, we describe the architecture, contents, and governance of ModelCIF, along with the tools and processes for maintaining and extending the data standard. Community tools and software libraries that support ModelCIF are also described.

Keywords: Computed Structure Models; Data Standard; ModelCIF; PDBx/mmCIF; Protein Structure Prediction.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Figure 1.
Schematic representation of modeling methods using target sequence(s), structure databases (e.g., PDB), and sequence databases (e.g., Uniclust30 [51]) as input to produce CSMs and estimates of prediction confidence. Homology modeling uses specific templates as its main input, while ab initio methods work without templates. Commonly used ab initio methods rely on multiple sequence alignments, which are either used directly as input for end-to-end structure prediction or processed to extract spatial restraints used to generate CSMs.
Figure 2.
Schematic representation of the data specifications in ModelCIF. Definitions reused from PDBx/mmCIF are shown in white boxes (e.g., Atomic Coordinates) and the newly added definitions are shown in gray boxes (e.g., Model Quality Metrics). (A) Descriptions are provided for input data used in template-based and template-free modeling. (B) Representations of molecular components are retained from PDBx/mmCIF. (C) Definitions for atomic coordinates, secondary structure features, and ensembles are taken from PDBx/mmCIF; descriptions of local and global CSM quality metrics are defined in ModelCIF. (D) Several metadata definitions from PDBx/mmCIF are reused. New metadata definitions regarding modeling protocol, CSM classification (ab initio, homology, etc.) and descriptions of associated files are included in ModelCIF. Examples of CSM-specific data and metadata represented in ModelCIF are provided in the Supplementary Material.
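The split between reused PDBx/mmCIF definitions and new ModelCIF definitions can be illustrated with a minimal Python sketch. It assumes that ModelCIF-specific categories carry an `ma_` prefix and that the file uses simple one-line key-value items (no `loop_` blocks). The sample fragment, its values, and the helper names `group_by_category` and `split_by_origin` are hypothetical and written for illustration; they are not taken from the paper or from any ModelCIF library.

```python
from collections import defaultdict

# Illustrative fragment only: the item values are made up, and the
# category names are assumed to follow the "ma_" naming convention.
SAMPLE = """\
data_example
_entry.id example
_atom_site.group_PDB ATOM
_atom_site.label_atom_id CA
_ma_qa_metric_global.metric_id 1
_ma_qa_metric_global.metric_value 92.4
_ma_model_list.model_type 'Ab initio model'
"""


def group_by_category(text):
    """Group one-line '_category.item value' entries by category name."""
    categories = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip the data_ header and blank lines
        name, _, value = line.partition(" ")
        category, _, item = name[1:].partition(".")
        # Values are kept as strings; surrounding quotes are removed.
        categories[category][item] = value.strip().strip("'\"")
    return dict(categories)


def split_by_origin(categories, prefix="ma_"):
    """Separate reused PDBx/mmCIF categories from assumed ModelCIF ones."""
    extension = sorted(c for c in categories if c.startswith(prefix))
    reused = sorted(c for c in categories if not c.startswith(prefix))
    return reused, extension


if __name__ == "__main__":
    cats = group_by_category(SAMPLE)
    reused, extension = split_by_origin(cats)
    print("Reused from PDBx/mmCIF:", reused)
    print("Added by ModelCIF:", extension)
```

A real file would normally be read with a full mmCIF parser, since atomic coordinates and most other categories are stored in `loop_` tables that this sketch does not handle.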

References

    1. Protein Data Bank. (1971). Crystallography: Protein Data Bank. Nature (London), New Biol. 233, 223–223.
    2. wwPDB consortium. (2019). Protein Data Bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Res. 47, D520–D528.
    3. Anfinsen CB. (1973). Principles that govern the folding of protein chains. Science. 181, 223–230.
    4. Baker D, Sali A. (2001). Protein structure prediction and structural genomics. Science. 294, 93–96.
    5. Gobel U, Sander C, Schneider R, Valencia A. (1994). Correlated mutations and residue contacts in proteins. Proteins. 18, 309–317.
