ACS Omega. 2024 Jun 18;9(26):28691-28706.
doi: 10.1021/acsomega.4c02886. eCollection 2024 Jul 2.

Predictions of Colloidal Molecular Aggregation Using AI/ML Models

David C Kombo et al. ACS Omega.

Abstract

To facilitate the triage of hits from small molecule screens, we have used various AI/ML techniques and experimentally observed data sets to build models aimed at predicting colloidal aggregation of small organic molecules in aqueous solution. We have found that Naïve Bayesian and deep neural networks outperform logistic regression, recursive partitioning tree, support vector machine, and random forest techniques by having the lowest balanced error rate (BER) for the test set. Derived predictive classification models consistently and successfully discriminated aggregator molecules from nonaggregator hits. An analysis of molecular descriptors in favor of colloidal aggregation confirms previous observations (hydrophobicity, molecular weight, and solubility) in addition to undescribed molecular descriptors such as the fraction of sp3 carbon atoms (Fsp3), and electrotopological state of hydroxyl groups (ES_Sum_sOH). Naïve Bayesian modeling and scaffold tree analysis have revealed chemical features/scaffolds contributing the most to colloidal aggregation and nonaggregation, respectively. These results highlight the importance of scaffolds with high Fsp3 values in promoting nonaggregation. Matched molecular pair analysis (MMPA) has also deciphered context-dependent substitutions, which can be used to design nonaggregator molecules. We found that most matched molecular pairs have a neutral effect on aggregation propensity. We have prospectively applied our predictive models to assist in chemical library triage for optimal plate selection diversity and purchase for high throughput screening (HTS) in drug discovery projects.
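The abstract ranks models by balanced error rate (BER) without restating its definition. The sketch below assumes the common convention in which BER is the mean of the false-negative and false-positive rates; the paper's exact formula is not given in this record. The function name and the toy labels are hypothetical, and the 0/1 encoding (1 = aggregator, 0 = nonaggregator) follows the activity labels used in the Figure 9 caption.

```python
def balanced_error_rate(y_true, y_pred):
    """Mean of the per-class error rates for binary labels.

    Assumed encoding: 1 = aggregator, 0 = nonaggregator.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    false_negative_rate = fn / (tp + fn)  # aggregators missed
    false_positive_rate = fp / (tn + fp)  # nonaggregators flagged
    return 0.5 * (false_negative_rate + false_positive_rate)


# Toy example (made-up labels, not data from the paper):
# 4 aggregators with 1 missed, 8 nonaggregators with 2 flagged.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(balanced_error_rate(y_true, y_pred))  # 0.25
```

Because each class contributes equally regardless of its size, BER is less sensitive to class imbalance than plain accuracy, which is one plausible reason to prefer it when aggregators and nonaggregators are unevenly represented.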

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Molecular descriptors important for colloidal aggregation, as derived from the random forest classification model. (A) Only the top-30 descriptors are shown. (B) After removal of highly correlated descriptors. (C) Pearson correlation heatmap of the top important descriptors shown in panel (B).
Figure 2
3D plots of PCA components for the whole data set. Aggregator and nonaggregator molecules are colored red and blue, respectively. Data points are shown as filled cubes; the larger the cube, the higher the Fsp3 value of the compound.
Figure 3
Examples of compounds that were part of the validation set. These compounds were purchased from Mcule, Emolecules, and Chembridge. Only compound 7 was an experimentally observed aggregator. All the remaining compounds were nonaggregators.
Figure 4
Chemical features are shown with their scores, as derived from the Naïve Bayesian model. (A) Top features favorable for nonaggregation. (B) Top features promoting colloidal aggregation.
Figure 5
Scaffold Tree and Bemis–Murcko scaffolds derived from the current data set. Panels (A) and (B): Distribution of the number of levels in the scaffold trees for (A) nonaggregators and (B) colloidal aggregators. Panels (C) through (E): Scaffold Tree (root level and level 1) and Bemis–Murcko scaffolds most frequently found in (C) colloidal aggregators only, (D) nonaggregators only, and (E) both nonaggregator and aggregator molecules. (F) Box plot of Fsp3 for scaffolds frequently observed in aggregators only, nonaggregators only, and in both classes of molecules. (D) shows that the scaffolds most frequently present in nonaggregators are rich in sp3 carbon atoms. This observation is corroborated by (F), which shows the trend in Fsp3 values for each group of scaffolds. Results are consistent with the increased number of sp3 carbon atoms in the nonaggregator molecules as compared to aggregators, as can also be seen from the larger size of their filled-cube data points in Figure 2, from Figure 4A as compared to Figure 4B, and from (D) in comparison to (C). Examination of (E) suggests that the most commonly used aromatic rings, such as phenyl, pyridine, pyrimidine, and indole, are frequently found in both classes of molecules. Interestingly, cyclohexyl and piperidine scaffolds, which are commonly used aliphatic bioisosteres for phenyl and pyridine, are also listed among the most frequent scaffolds present in both aggregators and nonaggregators. In our data set, we found 249 scaffolds specific to aggregators, 327 scaffolds specific to nonaggregators, and 135 nonspecific scaffolds (i.e., common to both classes of molecules), with frequency >10.
Figure 6
Examples of MMPs that increase the propensity for molecular aggregation. For each of the 12 triplets of fragments, the common core is the leftmost, followed by R groups promoting nonaggregation and those promoting aggregation in the context of the given scaffold, respectively. MMPs involving the phenyl thiazole amide scaffold and the benzylamide thiadiazole core are shown embedded in a green rounded rectangle and a blue rounded rectangle, respectively.
Figure 7
Drug-likeness property distributions and receiver operating characteristic (ROC) curves. (A) Library A data set property distribution. (B) Library B data set property distribution. (C) ROC curve for the drug-likeness model training set. (D) ROC curve for the drug-likeness model test set.
Figure 8
Box plot of Fsp3 for nonaggregators vs aggregators, using the training set.
Figure 9
Box plots of molecular properties of MMPs derived between nonaggregators (activity = 0) and colloidal aggregators (activity = 1). (A) Sum of the numbers of hydrogen-bond donors and hydrogen-bond acceptors. (B) Number of aromatic rings.
