Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 Jan 13;65(1):427-434.
doi: 10.1021/acs.jcim.4c00427. Epub 2024 Nov 13.

A Probabilistic Approach in the Search Space of the Molecular Distance Geometry Problem

Affiliations

A Probabilistic Approach in the Search Space of the Molecular Distance Geometry Problem

Rômulo S Marques et al. J Chem Inf Model. .

Abstract

The discovery of the three-dimensional shape of protein molecules using interatomic distance information from nuclear magnetic resonance (NMR) can be modeled as a discretizable molecular distance geometry problem (DMDGP). Due to its combinatorial characteristics, the problem is conventionally solved in the literature as a depth-first search in a binary tree. In this work, we introduce a new search strategy, which we call frequency-based search (FBS), that for the first time utilizes geometric information contained in the protein data bank (PDB). We encode the geometric configurations of 14,382 molecules derived from NMR experiments present in the PDB into binary strings. The obtained results show that the sample space of the binary strings extracted from the PDB does not follow a uniform distribution. Furthermore, we compare the runtime of the symmetry-based build-Up (SBBU) algorithm (the most efficient method in the literature to solve the DMDGP) combined with FBS and the depth-first search (DFS) in finding a solution, ascertaining that FBS performs better in about 70% of the cases.

PubMed Disclaimer

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Figure 1
Two points (green) are in the intersection of three spheres.
Figure 2
Figure 2
Binary tree associated with a DMDGP instance with six atoms.
Figure 3
Figure 3
Plane πi and the positions xi and formula image associated with orientations 0 and 1. The possible immersion points (in formula image) for the vertex vi are xi and formula image: the point “above” the plane πi is xi and has orientation bi = 1; the point “below” πi is formula image and has orientation bi = 0.
Figure 4
Figure 4
Order ρ of a 3-residue peptide backbone. The number inside each circle represents the position of the respective atom in the order of ρ. The repetition of formula image is the 10th element of ρ.
Figure 5
Figure 5
Binary tree nodes visited by DFS and FBS. The number in each node represents its position in the DFS visiting order. The number within a square at the bottom of each tree leaf indicates the position of the respective root–leaf path in the FBS. Similarly, the number within a triangle denotes the position of the path in DFS. The blue arcs highlight the root–leaf path corresponding to the solution. Blue squares and gray triangles illustrate which root–leaf paths are explored, respectively, by FBS and DFS until a solution is found (marked with *).
Figure 6
Figure 6
Accumulated probability of the occurrence of the protein binary sequences. For each type of independent pruning distance, each binary sequence configuration processed from PDB is mapped, in descending order of probabilities, to an index in the interval (0, 1]. The dashed line represents the accumulated probabilities of a uniform distribution.
Figure 7
Figure 7
Probability that the time speedup (the ratio of SBBU-DFS time to SBBU-FBS time) for binary sequences of different types of independent pruning distances is less than α, where α can go up to 100. If α is greater than 1, it means that FBS performs better than DFS.
Figure 8
Figure 8
Histogram of the total time speedup (the ratio of the time that SBBU-DFS spent to solve an instance completely to the respective SBBU-FBS time). The hatched area indicates the number of instances where FBS is better than DFS, which corresponds to 68.8% of all of the tested proteins.
Figure 9
Figure 9
Median relative time of each type of independent pruning distance when the problems are solved using DFS (blue) and FBS (in red). The relative time of a pruning distance type is the fraction of the total time spent solving all instances that corresponds to that type of distance. The “others” column corresponds to the relative time of all dependent pruning distances combined.

References

    1. Donald B. R.Algorithms in Structural Molecular Biology; MIT Press, 2011.
    1. Schlick T.Molecular modeling and simulation: an interdisciplinary guide; Springer, 2010.
    1. Lavor C.; Liberti L.; Maculan N.; et al. The discretizable molecular distance geometry problem. Comput. Optim. Appl. 2012, 52, 115–146. 10.1007/s10589-011-9402-6. - DOI - PubMed
    1. Lavor C.; Liberti L.; Maculan N. Recent advances on the discretizable molecular distance geometry problem. Eur. J. Oper. Res. 2012, 219, 698–706. 10.1016/j.ejor.2011.11.007. - DOI
    1. Mucherino A.; Lavor C.; Liberti L. The discretizable distance geometry problem. Opt. Lett. 2012, 6, 1671–1686. 10.1007/s11590-011-0358-3. - DOI - PubMed

LinkOut - more resources