Comput Struct Biotechnol J. 2024 Dec 21:27:137-148.
doi: 10.1016/j.csbj.2024.12.016. eCollection 2025.

Predicting the location of coordinated metal ion-ligand binding sites using geometry-aware graph neural networks

Clement Essien et al. Comput Struct Biotechnol J.

Abstract

More than 50 % of proteins bind to metal ions. Interactions between metal ions and proteins, especially coordinated interactions, are essential for biological functions such as maintaining protein structure and signal transport. Predicting physiological metal-ion binding is pivotal both for elucidating the biological functions of proteins and for designing new drugs. However, accurately predicting these interactions remains challenging. In this study, we propose GPred, a novel structure-based method that transforms the 3-dimensional structure of a protein into a point cloud representation and then applies a geometry-aware graph neural network to learn the local structural properties of each amino acid residue under specific ligand-binding supervision. We trained our model to predict the location of coordinated binding sites for 5 essential metal ions: Zn²⁺, Ca²⁺, Mg²⁺, Mn²⁺, and Fe²⁺. We further demonstrated the versatility of GPred by applying transfer learning to predict the binding sites of 2 heavy metal ions, cadmium (Cd²⁺) and mercury (Hg²⁺). We achieved greater than 19.62 %, 14.32 %, 36.62 %, and 40.69 % improvement in the area under the precision-recall curve (AUPR) of Zn²⁺, Ca²⁺, Mg²⁺, Mn²⁺, and Fe²⁺, respectively, when compared with 6 currently accessible state-of-the-art sequence-based or structure-based tools. We also validated the proposed approach on protein structures predicted by AlphaFold2, where its performance was similar to that on experimental protein structures. In both cases, GPred achieved a low false discovery rate for proteins without annotated ion-binding sites. © 2017 Elsevier Inc. All rights reserved.

Keywords: Binding sites; Graph neural network; Metal ions; Point cloud; Protein structures.
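
The abstract reports results as AUPR and false discovery rate. As a minimal sketch of how such metrics can be computed from per-residue labels and scores, the following uses the step-wise average-precision estimate of AUPR. The function names, the 0.5 threshold, and the toy data are illustrative assumptions, not the authors' evaluation code, and tied scores are not handled specially.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 1 for a binding residue, 0 otherwise.
    scores: predicted binding probabilities, one per residue.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank residues from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_positives += 1
            # Precision at the rank where this positive is recovered.
            total += true_positives / rank
    return total / n_pos


def false_discovery_rate(labels, scores, threshold=0.5):
    """Fraction of residues predicted as binding that are not binding."""
    predicted = [label for label, s in zip(labels, scores) if s >= threshold]
    if not predicted:
        return 0.0  # no positive predictions, so no false discoveries
    return sum(1 for label in predicted if label == 0) / len(predicted)


# Toy data (hypothetical): four residues, two of them binding.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
toy_fdr = false_discovery_rate(toy_labels, toy_scores)
```

Because binding residues are a small minority in most proteins, a precision-based summary such as AUPR tends to be more informative than accuracy for this kind of task.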

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Graphical abstract
Fig. 1
Data preparation and feature encoding. (a) We first collected datasets of critical and non-critical metal ions for protein sequences longer than 50 amino acids from the BioLip and MetalPDB databases. We then used the CD-HIT tool to remove redundant entries and divided the data into training and test sets under fivefold cross-validation. Additionally, we extracted the related ion-binding annotations from the LINK field in PDB files. (b) We constructed 3-D point clouds to represent the proteins at the atomic level, using a 59-dimensional feature per atom that concatenates a 20-dimensional PSSM descriptor derived from PSI-BLAST with a 39-dimensional atomic descriptor derived from RDKit. “A” represents the total number of atoms.
Fig. 2
Model architecture. (a) To represent the micro-environment of each point (atom) in the constructed point cloud, a ball query with a radius of 5 Å gathers neighboring points and passes them to a Point Transformer layer, which learns both the physical and geometric features of the structure. (b) The learned atomic embeddings are pooled into residue-level representations for binding-site prediction. A masking strategy excludes nonbinding-site candidates, and a sigmoid layer then classifies each candidate amino acid. (c) The illustration shows how the Point Transformer learns structural representations from the coordinates of each indexed point and its neighboring points.
Fig. 3
Examples of GPred predictions of coordinated zinc-binding sites at different locations in proteins. GPred accurately predicts coordinated zinc-binding sites located at the interior residue C of protein 1A73-A (a), the edge residue C of protein 1JJD-A (b), and the exterior residue H of protein 1K9Z-A (c). Predicted binding sites are marked in green, their neighboring surface residues are colored yellow, and other surface residues are shown in blue. The Zn²⁺ ions, depicted as grey spheres, are positioned based on the annotations from the PDB file 1A73 for visualization purposes; their locations were not predicted by GPred.
Fig. 4
Different hyperparameter settings in grouping radii. To explore the optimal definition of neighboring points, we designed a series of comparative experiments using different grouping radii and combining 2 options of grouping radii in 2 Point Transformers.
Fig. 5
Receiver operating characteristic curves for GPred and competitive tools.
Fig. 6
Importance of each feature to the predictive performance.
Fig. 7
Difference in F1-Scores between the crystal structure and the AlphaFold2-predicted structures. (a): In most cases, the F1 scores between crystal structure and predicted structure are close, but there are also inconsistent samples, such as 8AMY_A. (b): In the case of 8AMY_A, the crystal structure is presented in green, whereas the predicted structure is presented in yellow. The geometric configuration of the binding sites is shown through a detailed local graph. There are significant differences in geometric features between the crystal structure and the predicted structure at the binding site. hθ is the geometric function described in Section 2.3.
Fig. 8
Receiver operating characteristic and precision-recall curves for GPred and competitive tools on the dataset mixed with non-binding-site data.
Fig. 9
Receiver operating characteristic and precision-recall curves for GPred and competitive tools on the dataset that defines Fe, Mn, Ca, and Mg binding sites as negatives for Zn.
Fig. 10
Receiver operating characteristic and precision-recall curves for GPred and competitive tools on the dataset whose structural similarity to the training data is below a TM-score of 0.5.
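
The Fig. 2 caption describes a 5 Å ball query around each atom, pooling of atomic embeddings into residue-level representations, and a masked sigmoid classifier. The following is a rough sketch of those three steps only, under several assumptions: the Point Transformer layer itself is not implemented, mean pooling stands in for the unspecified pooling operation, and the mask is assumed to simply zero out non-candidate residues. All function names and the toy inputs are hypothetical.

```python
import math


def ball_query(coords, radius=5.0):
    """For each atom, list the indices of atoms within `radius` Å (itself included)."""
    return [
        [j for j, other in enumerate(coords) if math.dist(center, other) <= radius]
        for center in coords
    ]


def pool_atoms_to_residues(atom_embeddings, atom_to_residue, n_residues):
    """Mean-pool atom embeddings into one vector per residue (assumed pooling)."""
    dim = len(atom_embeddings[0])
    sums = [[0.0] * dim for _ in range(n_residues)]
    counts = [0] * n_residues
    for embedding, residue in zip(atom_embeddings, atom_to_residue):
        counts[residue] += 1
        for k, value in enumerate(embedding):
            sums[residue][k] += value
    # Residues with no atoms keep a zero vector instead of dividing by zero.
    return [[v / max(c, 1) for v in row] for row, c in zip(sums, counts)]


def masked_sigmoid(logits, candidate_mask):
    """Sigmoid probability for candidate residues; 0.0 for masked-out residues."""
    return [
        1.0 / (1.0 + math.exp(-z)) if keep else 0.0
        for z, keep in zip(logits, candidate_mask)
    ]


# Toy inputs (hypothetical): three atoms on a line, belonging to two residues.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
toy_neighbors = ball_query(toy_coords)
toy_residues = pool_atoms_to_residues(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], atom_to_residue=[0, 0, 1], n_residues=2
)
toy_probs = masked_sigmoid([0.0, 2.0], candidate_mask=[True, False])
```

The brute-force neighbor search here is quadratic in the number of atoms; for full-size proteins a spatial index such as a k-d tree would be the more usual choice.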
