Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 Jun 17;17(1):97.
doi: 10.1186/s13321-025-00958-w.

Crossover operators for molecular graphs with an application to virtual drug screening

Affiliations

Crossover operators for molecular graphs with an application to virtual drug screening

Nico Domschke et al. J Cheminform. .

Abstract

Genetic algorithms are a powerful method to solve optimization problems with complex cost functions over vast search spaces that rely in particular on recombining parts of previous solutions. Crossover operators play a crucial role in this context. Here, we describe a large class of these operators designed for searching over spaces of graphs. These operators are based on introducing small cuts into graphs and rejoining the resulting induced subgraphs of two parents. This form of cut-and-join crossover can be restricted in a consistent way to preserve local properties such as vertex-degrees (valency), or bond-orders, as well as global properties such as graph-theoretic planarity. In contrast to crossover on strings, cut-and-join crossover on graphs is powerful enough to ergodically explore chemical space even in the absence of mutation operators. Extensive benchmarking shows that the offspring of molecular graphs are again plausible molecules with high probability, while at the same time crossover drastically increases the diversity compared to initial molecule libraries. Moreover, desirable properties such as favorable indices of synthesizability are preserved with sufficient frequency that candidate offsprings can be filtered efficiently for such properties. As an application we utilized the cut-and-join crossover in REvoLd, a GA-based system for computer-aided drug design. In optimization runs searching for ligands binding to four different target proteins we consistently found candidate molecules with binding constants exceeding the best known binders as well as candidates found in make-on-demand libraries.Scientific contributionWe define cut-and-join crossover operators on a variety of graph classes including molecular graphs. This constitutes a mathematically simple and well-characterized approach to recombination of molecules that performed very well in real-life CADD tasks.

Keywords: Crossover; Genetic algorithm; Graph theory; Virtual screening.

PubMed Disclaimer

Conflict of interest statement

Declarations. Competing interests: The authors declare no Competing interests.

Figures

Fig. 1
Fig. 1
An example of a 2-cut on naphthalene. The application of the cut set C(V,V) leads to the formation of the shown fragments, i.e the induced subgraphs G[V] and G[V]
Fig. 2
Fig. 2
Lewis structure (left) and molecule multigraph (right) of allyl alcohol
Fig. 3
Fig. 3
Increasingly restrictive join operators. While a join allows any nodes of two subgraphs to be linked, for a cut node preserving join only connections between nodes involved in the cut sets of their respective parental graphs are permitted. A degree preserving join further requires all nodes of the offspring graph to adhere to the degree of their respective parental vertices. The most restrictive join presented is the natural join, which introduces edge weights, e.g. bond orders, that must be conserved
Fig. 4
Fig. 4
Choosing V and W as the top vertices of the two triangles yields two cuts consisting of two edges, each with weight 1. The only possible join in this case is a multigraph. This join is degree-preserving, as each of the two vertices still has two incident edges, but is not natural since it converted (parts of) a simple graph into a multigraph
Fig. 5
Fig. 5
Cut-and-join crossover of two planar graphs can be restricted to yield again a planar graph. Since every minimal cut of a planar graph is cycle in its complement, there is a circular order to the severed edges and thus of the vertices incident with the cutset. It sufficies to preserve the circular order of these vertices (in arbitrary orientation). The example here is in addition degree-preserving
Fig. 6
Fig. 6
Left: Search-tree of the cut enumeration algorithm. Right: The different steps of the cut enumeration algorithm shown on camphor. Enumeration of a subtree can be halted, if a cut results in > 2 connected components or the cut node set has already been visited (colored red)
Fig. 7
Fig. 7
Left: Functional groups can be interconverted by a simple cut-and-join crossover. The general position (primary, secondary, tertary, quaternary) of a functional group can be changed by means of crossover operations that replace a hydrogen atom by a larger fragment (dotted arrows). Right: Cut-and-join crossover involving hydrogens enable access to simple reactions
Fig. 8
Fig. 8
Within a minimal polycyclic set, bridged polycycles sharing multiple edges are accessible as well as cubanes
Fig. 9
Fig. 9
A minimal set of base graphs B needed to access all graphs of degree at most 4 using a sequence of natural join operators
Fig. 10
Fig. 10
Diversity of crossover products taken from unrestricted cut-and-join operation on a sample of 10 molecules from the USPTO-10k. The homogeneous distribution of the recombinants in the t-SNE embedding shows that offspring molecules cover a high molecular diversity
Fig. 11
Fig. 11
Synthesizability scores calculated by SAScore and SCScore on the 10-sample molecule recombinant set in comparison to the USPTO-10k dataset. Both tools show an increase in the difficulty of synthesizablility. The average ΔSA Score is listed in Additional file 4
Fig. 12
Fig. 12
Frequencies of molecules with a better SAscore than the nth percentile of USPTO-10k molecules are depicted. A significant fraction of the recombinant offsprings has good SAscores, and thus it is feasible in each step of the GA to reject candidate offsprings that are difficult to produce according to their SA score
Fig. 13
Fig. 13
Molecular weight (MW) distribution of the 10 molecule sample set. The cut-and-join crossover operator shows an increase of molecular weight. The ΔMW average of the different sets compared to the USPTO-10k can be found in Additional file 4. For better readability 56 molecules with a MW >1100 in the USPTO-10k were removed
Fig. 14
Fig. 14
Distribution of ring topologies of parent and recombinant molecules. Molecules are assigned a category based on the priority: heteroaromatic > aromatic > heterocyclic > carbocyclic > acyclic. The cut-and-join crossover operator tends to generate more polycycles, especially with heteroaromatic and aromatic rings. This is also due to the sample, which is also largely composed of heteroaromatics
Fig. 15
Fig. 15
Distribution of normalized docking scores for four protein targets using either only known actives, random samples from the Enamine REAL space, REvoLd exploring the Enamine REAL space, or our custom chemical space. Dotted lines are docking scores for the co-crystallized ligands from the used protein structure. Next to each violin plot are first the highest scoring known active and second the highest scoring molecule from running REvoLd on our custom chemical space. Lid_root2 is the an interface score normalized by the squared number of non-hydrogen atoms. All scoring results except EFR+REvoLd were taken from the original REvoLd publication
Fig. 16
Fig. 16
Illustration of aef in matrix representation for different joins to demonstrate the necessity of the embedded constraints. The first two constraints only allow one connection per row and column, which is infringed in the first example by e3 and f3, which both give rise to two newly formed edges. The third constraint aligns the join to impede multi-graph formation, as it is depicted in the second example. The fourth constraint ensures that the edge labels of introduced edges adhere to the definition of a weight preserving join, violated in the forth example

Similar articles

References

    1. Cayley A (1874) On the mathematical theory of isomers. Phil Mag 47:444–446
    1. Estrada E (2011) The Structure of Complex Networks: Theory and Applications. Oxford University Press, Oxford, UK
    1. Katoch S, Chauhan SS, Kumar V (2021) A review on genetic algorithm: past, present, and future. Multimedia Tools Appl 80:8091–8126. 10.1007/s11042-020-10139-6 - PMC - PubMed
    1. Khan IH (2015) Assessing different crossover operators for Travelling Salesman Problem. Intl J Intelligent Systems and Applications 11:19–25. 10.5815/ijisa.2015.11.03
    1. McInnes C (2007) Virtual screening strategies in drug discovery. Current Opinion in Chemical Biology 11(5):494–502. 10.1016/j.cbpa.2007.08.033 - PubMed

LinkOut - more resources