Protein Sci. 2023 Mar;32(3):e4591.
doi: 10.1002/pro.4591.

Classifying metal-binding sites with neural networks

Marjolein Oostrom et al. Protein Sci. 2023 Mar.

Abstract

To advance our ability to predict how the protein scaffold affects catalysis, we need robust classification schemes that define the protein features that influence reactivity. One such feature is a protein's metal-binding ability, because metals are critical to catalytic conversion by metalloenzymes. As a step toward this goal, we used convolutional neural networks (CNNs) to classify metal cofactor binding pockets within protein scaffolds. CNNs classify images using features at multiple levels of detail, from edges and corners to entire objects, and can provide rapid classification. First, six CNN models were fine-tuned to classify the 20 standard amino acids in order to select the best-performing model for amino acid classification. This model was then trained in two parallel efforts to classify a 2D image of the environment within a given radius of the central metal-binding site, either an Fe ion or a [2Fe-2S] cofactor, with the metal visible (effort 1) or hidden (effort 2). We further divided the [2Fe-2S] cofactor into two subclasses: (1) a standard [2Fe-2S] cofactor and (2) a Rieske [2Fe-2S] cofactor. The model identified all three defined features with >95% accuracy, despite the greater difficulty we anticipated for metalloenzyme identification. This demonstrates that machine learning can classify and distinguish similar metal-binding sites, even in the absence of a visible cofactor, and offers an additional tool for identifying metal-binding sites in proteins.
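The abstract describes classifying images of the environment within a given radius of the metal-binding site (6.0 Å in the figure captions). The paper does not give selection code, so the following is a minimal Python sketch under stated assumptions: atoms are held in a hypothetical `Atom` record, and a residue counts as part of the environment if any of its atoms lies within the cutoff of any metal-site atom. Rendering the selected atoms into a 2D image is not shown.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    """Minimal atom record (hypothetical; not the authors' data format)."""
    chain: str
    res_seq: int
    res_name: str
    name: str
    xyz: tuple  # (x, y, z) in Å


def residues_near_site(atoms, site_atoms, radius=6.0):
    """Return sorted (chain, res_seq, res_name) keys of residues that have at
    least one atom within `radius` Å of any atom of the metal site.
    Residues belonging to the site itself are excluded."""
    site_keys = {(a.chain, a.res_seq, a.res_name) for a in site_atoms}
    near = set()
    for atom in atoms:
        key = (atom.chain, atom.res_seq, atom.res_name)
        if key in site_keys or key in near:
            continue
        # math.dist is the Euclidean distance between two points
        if any(math.dist(atom.xyz, s.xyz) <= radius for s in site_atoms):
            near.add(key)
    return sorted(near)


# Toy example (made-up coordinates): one Fe ion and three residues
fe = Atom("A", 101, "FE", "FE", (0.0, 0.0, 0.0))
atoms = [
    fe,
    Atom("A", 6, "CYS", "SG", (2.3, 0.0, 0.0)),     # 2.3 Å from Fe
    Atom("A", 40, "GLY", "CA", (0.0, 5.5, 0.0)),    # 5.5 Å, inside cutoff
    Atom("A", 80, "LEU", "CD1", (10.0, 0.0, 0.0)),  # 10 Å, outside cutoff
]
environment = residues_near_site(atoms, [fe])
# environment == [('A', 6, 'CYS'), ('A', 40, 'GLY')]
```

Using "any atom within the cutoff" is one common convention; selecting by side-chain atoms or Cα only would give a different environment.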

Keywords: Rieske; amino acids; convolutional neural network; image classification; iron-sulfur; metal-binding sites; metalloenzyme.


Figures

FIGURE 1
Confusion matrix for the ResNet model on the test dataset. Images of each amino acid were generated at 250 rotations.
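The caption states that 250 rotations were used for each amino acid but does not say how the rotations were chosen. One plausible approach, sketched here as an assumption rather than the authors' method, is to draw uniformly random 3D rotations (here via Shoemake's random-quaternion method) and rotate the coordinates about a fixed center before each rendering. The function names are hypothetical, and the rendering step is not shown.

```python
import math
import random


def random_rotation(rng):
    """Uniformly distributed 3x3 rotation matrix built from a random unit
    quaternion (Shoemake's method)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    w = a * math.sin(2 * math.pi * u2)
    x = a * math.cos(2 * math.pi * u2)
    y = b * math.sin(2 * math.pi * u3)
    z = b * math.cos(2 * math.pi * u3)
    # Standard rotation matrix of the unit quaternion (w, x, y, z)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotate_about(points, center, rot):
    """Rotate each (x, y, z) point about `center` by the matrix `rot`."""
    out = []
    for p in points:
        d = [p[i] - center[i] for i in range(3)]
        out.append(tuple(center[i] + sum(rot[i][j] * d[j] for j in range(3))
                         for i in range(3)))
    return out


rng = random.Random(0)  # fixed seed so the rotation set is reproducible
rotations = [random_rotation(rng) for _ in range(250)]
```

Each rotation preserves interatomic distances, so only the viewing orientation changes between images of the same residue.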
FIGURE 2
Misclassified images of three structurally similar amino acids (valine, isoleucine, and leucine). At some rotations, the side-chain branching patterns are obscured, leading to misclassification.
FIGURE 3
Metal cofactors used in this study. (Panels a–c) Close-up views of the metal cofactors and their four coordinating protein residues, shown with the correct bonds added manually (top) and as they appear in PyMOL from the unmodified Crystallographic Information Files (CIFs) (bottom). Panel (a) shows an Fe cofactor from a rubredoxin metalloprotein (PDB: 4D4O). Panels (b) and (c) show two [2Fe-2S] metal cofactors: a standard [2Fe-2S] cofactor (PDB: 6TGA) and a Rieske [2Fe-2S] cofactor (PDB: 1BGY), respectively. Atoms are colored by element (C = gray, N = blue, O = red, H = white, S = yellow). PDB, Protein Data Bank.
FIGURE 4
Confusion matrix for the ResNet model classifying the environment around each metal cofactor (the amino acids within 6.0 Å of it) according to the metal cofactor it surrounds. Each metal center was represented by 500 images.
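A confusion matrix such as the one in this figure counts, for each true class, how many images were assigned to each predicted class; overall accuracy is the diagonal total divided by the total number of images. The sketch below computes both in plain Python. The labels are toy values chosen for illustration and are not data from the study.

```python
from collections import Counter


def confusion_matrix(true_labels, pred_labels, classes):
    """Rows are true classes and columns are predicted classes,
    both in the order given by `classes`."""
    counts = Counter(zip(true_labels, pred_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


classes = ["Fe", "2Fe-2S", "Rieske 2Fe-2S"]
y_true = ["Fe", "Fe", "2Fe-2S", "2Fe-2S", "Rieske 2Fe-2S", "Rieske 2Fe-2S"]
y_pred = ["Fe", "2Fe-2S", "2Fe-2S", "2Fe-2S", "Rieske 2Fe-2S", "Rieske 2Fe-2S"]
cm = confusion_matrix(y_true, y_pred, classes)
# cm == [[1, 1, 0], [0, 2, 0], [0, 0, 2]]; accuracy(cm) == 5/6
```

Off-diagonal cells show which classes are confused; in the toy data, one Fe image is mislabeled as [2Fe-2S].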
FIGURE 5
Incorrect classification of images of the environment around the metal cofactor (the metal cofactor and the amino acids within 6.0 Å of it). At the rotation shown in the top panels, the model incorrectly classified the Fe cofactor as a [2Fe-2S] cofactor. The bottom panels show the same cofactors rotated so that the cysteine ligands are clearly visible (circled and colored green). The green coloring is for illustrative purposes only; the images given to the model used the same color scheme for all atoms, without this highlighting. The metal residue identifiers for the images are: (a) PDB ID: FD4O, Chain: C, Residue: 501; (b) PDB ID: 4X33, Chain: A, Residue: 101; (c) PDB ID: 6J27, Chain: C, Residue: 401. PDB, Protein Data Bank.
FIGURE 6
Confusion matrix for the ResNet model classifying the environment around each metal cofactor (the amino acids within 6.0 Å of it) according to the metal cofactor it surrounds, with the native metal cofactors absent. The images used the same set of PDB IDs and the same rotations as in Figure 4, except that the atoms of the Fe-containing metal cofactors were removed.
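For the metal-hidden experiment, the cofactor atoms are removed while the surrounding protein atoms are kept. Because Figure 5 identifies each cofactor by PDB ID, chain, and residue number, a minimal sketch can drop every atom whose (chain, residue number) matches the cofactor. The `Atom` record, the `hide_cofactor` helper, and the toy atoms below are hypothetical; the authors' actual preprocessing is not described here.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """Minimal atom record (hypothetical; not the authors' data format)."""
    chain: str
    res_seq: int
    res_name: str
    name: str
    element: str


def hide_cofactor(atoms, chain, res_seq):
    """Return a new list without the cofactor residue identified by
    (chain, res_seq); all other atoms are kept unchanged."""
    return [a for a in atoms if not (a.chain == chain and a.res_seq == res_seq)]


# Toy example: a cofactor at chain A, residue 101, plus two ligand residues
atoms = [
    Atom("A", 101, "FES", "FE1", "FE"),
    Atom("A", 101, "FES", "S1", "S"),
    Atom("A", 42, "CYS", "SG", "S"),
    Atom("A", 45, "HIS", "ND1", "N"),
]
protein_only = hide_cofactor(atoms, "A", 101)
# protein_only keeps only the CYS and HIS atoms
```

Filtering by residue identifier rather than by element keeps protein sulfur atoms (such as cysteine SG) while removing the cluster's own sulfides.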

