J R Soc Interface. 2023 Apr;20(201):20220727.
doi: 10.1098/rsif.2022.0727. Epub 2023 Apr 26.

Homology of homologous knotted proteins

Katherine Benjamin et al. J R Soc Interface. 2023 Apr.

Abstract

Quantification and classification of protein structures, such as knotted proteins, often requires noise-free and complete data. Here, we develop a mathematical pipeline that systematically analyses protein structures. We showcase this geometric framework on proteins forming open-ended trefoil knots, and we demonstrate that the mathematical tool, persistent homology, faithfully represents their structural homology. This topological pipeline identifies important geometric features of protein entanglement and clusters the space of trefoil proteins according to their depth. Persistence landscapes quantify the topological difference between a family of knotted and unknotted proteins in the same structural homology class. This difference is localized and interpreted geometrically with recent advancements in systematic computation of homology generators. The topological and geometric quantification we find is robust to noisy input data, which demonstrates the potential of this approach in contexts where standard knot theoretic tools fail.

Keywords: generators; knotted proteins; persistent homology; topological statistical analysis.

Conflict of interest statement

We declare we have no competing interests.

Figures

Figure 1.
Dataset and PH pipeline. (Dataset) (a) Schematics of a deeply knotted (top) and shallowly knotted (bottom) open curve. Knot cores and tails are shown in purple and blue. (b) Example of a deeply knotted protein (PDB entry 3KZK) and a shallowly knotted one (PDB entry 4QEF). (c) The space of trefoil-knotted proteins plotted by chain length and knot depth. Each protein is coloured according to its sequence homology class. Note that there are deeply and shallowly knotted proteins of the same length, as well as distinct sequence homology classes exhibiting similar length and depth. (Pipeline) (d) The protein dataset is given by lists of three-dimensional coordinates of Cα atoms. For each protein, we generate the point cloud consisting of these points and linearly interpolated points between each pair of successive Cα atoms. (e) Persistence diagram derived from the 3KZK point cloud. The points represent one-dimensional features corresponding to loops, and their positions represent the lifetimes of these features: their coordinates are their birth and death scales. (f) Persistence landscape derived from 3KZK. (g) PH generators in homology degree one can be represented by PL cycles whose vertices are points in the point cloud. In red, an example of a local generator for a one-dimensional feature of the 3KZK point cloud.
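The point-cloud construction in panel (d) can be illustrated with a minimal sketch. The function name `interpolate_backbone` and the parameter `points_per_segment` are hypothetical, and the number of interpolated points per segment is an assumption, since the caption does not state how many the authors used.

```python
def interpolate_backbone(ca_coords, points_per_segment=3):
    """Return the C-alpha points plus evenly spaced points on each backbone segment.

    ca_coords: list of (x, y, z) tuples, one per C-alpha atom, in chain order.
    points_per_segment: interior points added per segment (assumed value,
    not taken from the paper).
    """
    cloud = []
    for (x0, y0, z0), (x1, y1, z1) in zip(ca_coords, ca_coords[1:]):
        # Keep the segment's starting C-alpha atom.
        cloud.append((float(x0), float(y0), float(z0)))
        # Add interior points at fractions 1/(n+1), ..., n/(n+1) along the segment.
        for j in range(1, points_per_segment + 1):
            t = j / (points_per_segment + 1)
            cloud.append((x0 + t * (x1 - x0),
                          y0 + t * (y1 - y0),
                          z0 + t * (z1 - z0)))
    if ca_coords:
        # The final C-alpha atom is not the start of any segment, so add it here.
        x, y, z = ca_coords[-1]
        cloud.append((float(x), float(y), float(z)))
    return cloud


# Toy backbone with three atoms (coordinates are illustrative, not real data).
toy_chain = [(0, 0, 0), (4, 0, 0), (4, 4, 0)]
toy_cloud = interpolate_backbone(toy_chain, points_per_segment=3)
```

A chain of n atoms yields n + (n - 1) × `points_per_segment` points. Densifying the curve this way means that loops detected by persistent homology reflect the shape of the backbone rather than the gaps between consecutive atoms.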
Figure 2.
Global analysis: the space of knotted protein structures. (a) Isomap embedding of the space of trefoil-knotted proteins equipped with the Wasserstein distance on persistence diagrams (see electronic supplementary material, section 2). Given a distance matrix, isomap produces a low-dimensional configuration, two-dimensional in our case, in which the distance between any two objects is preserved as closely as possible. The embedding forms clusters corresponding to sequence homology classes. The embedding successfully clusters by depth category. (b) Isomap embedding of the space of trefoil-knotted proteins equipped with the distance on persistence landscapes. For a definition of this distance, see electronic supplementary material, section 2. The embedding forms clusters corresponding to sequence homology classes. The embedding successfully clusters by depth category. (c) Average persistence landscapes generated from the sequence homology classes with representative (i) 6RQQ and (ii) 3ZNC. Although these classes are not separated in the isomap embeddings in (a,b), a randomization test confirms that the difference in their average landscapes is statistically significant (p ≈ 0.003).
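The randomization test mentioned in panel (c) can be sketched as a label-permutation test on discretized landscapes. This is a generic illustration, not the authors' exact procedure: the test statistic (Euclidean distance between group means), the number of permutations, and the function names are all assumptions.

```python
import random


def mean_vector(vectors):
    """Pointwise mean of equal-length vectors (e.g. landscapes sampled on a grid)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def permutation_test(group_a, group_b, n_permutations=1000, seed=0):
    """Estimate a p-value for the distance between two group-mean vectors.

    Labels are shuffled n_permutations times (an assumed default); the p-value
    is the fraction of shuffles whose statistic is at least the observed one,
    with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = l2_distance(mean_vector(group_a), mean_vector(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        stat = l2_distance(mean_vector(pooled[:n_a]), mean_vector(pooled[n_a:]))
        if stat >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)


# Synthetic, clearly separated groups (illustrative values, not protein data).
class_one = [[0.0, 0.1 * i, 0.0] for i in range(5)]
class_two = [[1.0, 1.0 + 0.1 * i, 1.0] for i in range(5)]
observed_distance, p_value = permutation_test(class_one, class_two)
```

A small p-value indicates that a difference between the average landscapes as large as the observed one rarely arises when class labels are assigned at random. This is how two classes can differ significantly even when a two-dimensional embedding does not separate them visually.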
Figure 3.
Local analysis: geometry of homologous protein substructures. (a) Two homologous proteins, 3KZK (blue, knotted) and 4JQO (orange, unknotted), overlaid. These proteins have almost superimposable structures, but differ as knots by a crossing change localized within the red ellipse. The knot core in 3KZK and its corresponding structure in 4JQO are highlighted by showing the remaining parts in lighter shades of orange and blue. A close-up of the local configurations causing the topological change is shown in (a)(ii). A strand movement transforms the deeply knotted 3KZK into the unknotted 4JQO. (b) Average persistence landscapes generated from knotted (top) and unknotted (bottom) protein chains. The peak in λ2 (orange) centred at t ≈ 9 in the knotted case corresponds to a generator c for the PH of the knotted chains which does not arise in the PH of the unknotted chains. (c) The backbone of 3KZK. Violet segments indicate the knot core. The cycle representing the PH generator c is plotted in red and pink, where the pink segments show simplices in c that are not part of the 3KZK curve. Note that c is positioned close to the knot core and, more specifically, close to the crossings responsible for the non-trivial entanglement. Further, c intersects the arc that needs to be pushed to untangle the curve. (d) Heat map showing the distances between the λ2 landscapes for the proteins in the AOTCase and OTCase families. The two distinct purple squares demonstrate sufficient similarity in each class for the average λ2 landscapes to be faithful representatives for each class.
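The landscape functions λk referred to in panels (b) and (d) follow the standard definition: at scale t, λk(t) is the k-th largest value of max(0, min(t − b, d − t)) over all birth-death pairs (b, d) in the persistence diagram. The sketch below evaluates this directly. The function names are hypothetical, and the sup-norm comparison on a sample grid is an assumption, since the distance actually used is defined in the electronic supplementary material.

```python
def landscape_value(diagram, k, t):
    """Evaluate the k-th persistence landscape function (k = 1, 2, ...) at scale t.

    diagram: list of (birth, death) pairs from a persistence diagram.
    """
    # Each pair contributes a "tent" that rises from birth, peaks at the
    # midpoint, and falls back to zero at death.
    tents = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram), reverse=True)
    return tents[k - 1] if k <= len(tents) else 0.0


def sample_landscape(diagram, k, grid):
    """Sample the k-th landscape function on a list of scale values."""
    return [landscape_value(diagram, k, t) for t in grid]


def sup_distance(values_a, values_b):
    """Largest pointwise gap between two landscapes sampled on the same grid."""
    return max(abs(a - b) for a, b in zip(values_a, values_b))


# Toy diagrams (illustrative values only): the first has a second loop
# nested inside the first, so its second landscape is non-zero.
two_loops = [(0.0, 4.0), (1.0, 3.0)]
one_loop = [(0.0, 4.0)]
grid = [1.0, 2.0, 3.0]
gap = sup_distance(sample_landscape(two_loops, 2, grid),
                   sample_landscape(one_loop, 2, grid))
```

In the toy example, λ2 is non-zero only for the diagram with two overlapping features. This mirrors how a peak in λ2 can single out a generator present in the knotted chains but absent from the unknotted ones.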

