Proc Natl Acad Sci U S A. 2019 Feb 26;116(9):3373-3378.
doi: 10.1073/pnas.1810847116. Epub 2019 Feb 11.

Ligand biological activity predicted by cleaning positive and negative chemical correlations

Alpha A Lee et al. Proc Natl Acad Sci U S A. 2019.

Abstract

Predicting ligand biological activity is a key challenge in drug discovery. Ligand-based statistical approaches are often hampered by noise due to undersampling: The number of molecules known to be active or inactive is vastly less than the number of possible chemical features that might determine binding. We derive a statistical framework inspired by random matrix theory and combine the framework with high-quality negative data to discover important chemical differences between active and inactive molecules by disentangling undersampling noise. Our model outperforms standard benchmarks when tested against a set of challenging retrospective tests. We prospectively apply our model to the human muscarinic acetylcholine receptor M1, finding four experimentally confirmed agonists that are chemically dissimilar to all known ligands. The hit rate of our model is significantly higher than the state of the art. Our model can be interpreted and visualized to offer chemical insights about the molecular motifs that are synergistic or antagonistic to M1 agonism, which we have prospectively experimentally verified.

Keywords: bioactivity prediction; chemoinformatics; ligand-based drug discovery; machine learning; random matrix theory.

Conflict of interest statement

Q.Y., A.B., C.R.B., X.H., S.J., and D.A.P. are current employees of Pfizer. The structure highlighted in red in Fig. 3A is exemplified in Patent WO/2013/072705.

Figures

Fig. 1.
The eigenvalue distribution of random molecules drawn from ChEMBL follows the random matrix distribution. The histogram shows the eigenvalue distribution of 200 random molecules drawn from ChEMBL, and the red curve is the random matrix distribution (Eq. 2) for p=1024 and N=200.
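The comparison in Fig. 1 can be reproduced in spirit with synthetic data. The sketch below assumes that the "random matrix distribution" of Eq. 2 (not reproduced on this page) is the Marchenko-Pastur law with ratio gamma = p/N, and it uses Gaussian random features as a stand-in for ChEMBL fingerprints; the function names `mp_edges` and `mp_density` are illustrative, not taken from the paper.

```python
import numpy as np

def mp_edges(gamma):
    """Lower and upper edges of the Marchenko-Pastur bulk for gamma = p / N."""
    return (1.0 - np.sqrt(gamma)) ** 2, (1.0 + np.sqrt(gamma)) ** 2

def mp_density(lam, gamma):
    """Continuous part of the Marchenko-Pastur density (unit-variance entries)."""
    lam = np.asarray(lam, dtype=float)
    lo, hi = mp_edges(gamma)
    rho = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    rho[inside] = np.sqrt((hi - lam[inside]) * (lam[inside] - lo)) / (
        2.0 * np.pi * gamma * lam[inside]
    )
    return rho

# Assumption: uncorrelated Gaussian features replace real molecular fingerprints.
rng = np.random.default_rng(0)
N, p = 200, 1024                              # sizes quoted in the caption
X = rng.standard_normal((N, p))
Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each feature

# The nonzero eigenvalues of the p x p correlation matrix Z.T @ Z / N are the
# squared singular values of Z / sqrt(N); the remaining eigenvalues are zero.
eigvals = np.linalg.svd(Z / np.sqrt(N), compute_uv=False) ** 2

gamma = p / N
lo, hi = mp_edges(gamma)
print(f"largest eigenvalue {eigvals.max():.2f}, predicted upper edge {hi:.2f}")
```

Because p > N here, only a fraction 1/gamma of the eigenvalues is nonzero, so the continuous density integrates to 1/gamma rather than 1. A histogram of `eigvals` should be compared against `gamma * mp_density(...)` if it is normalized over the nonzero eigenvalues only.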
Fig. 2.
Our random matrix model captures the statistics of M1 agonists and confirmed inactives from a historic campaign. The random matrix distribution (red curve) agrees with the histograms of eigenvalues of the (A) active agonists and (B) confirmed inactives. (C) A classification model built using the statistically significant eigenvectors achieves an accuracy of 98%.
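One way to build a classifier from "statistically significant eigenvectors" is sketched below. It is a simplified illustration under stated assumptions, not the paper's implementation: an eigenvector is treated as significant when its eigenvalue exceeds the Marchenko-Pastur upper edge (1 + sqrt(p/N))^2, and a molecule is assigned to whichever class subspace leaves the smaller residual after projection. The synthetic data and all function names (`fit_significant_subspace`, `residual_distance`, `classify`, `toy_class`) are hypothetical.

```python
import numpy as np

def fit_significant_subspace(X):
    """Standardize X (N samples x p features) and keep the eigenvectors of its
    correlation matrix whose eigenvalues exceed the Marchenko-Pastur upper edge."""
    n, p = X.shape
    mean, std = X.mean(axis=0), X.std(axis=0)
    std[std == 0] = 1.0                       # guard against constant features
    Z = (X - mean) / std
    eigvals, eigvecs = np.linalg.eigh(Z.T @ Z / n)
    edge = (1.0 + np.sqrt(p / n)) ** 2        # largest eigenvalue expected from noise
    return mean, std, eigvecs[:, eigvals > edge]

def residual_distance(x, model):
    """Distance between standardized x and its projection onto the subspace."""
    mean, std, V = model
    z = (x - mean) / std
    return np.linalg.norm(z - V @ (V.T @ z))

def classify(x, active_model, inactive_model):
    """True if x lies closer to the active subspace than to the inactive one."""
    return residual_distance(x, active_model) < residual_distance(x, inactive_model)

def toy_class(rng, n, p, block):
    """Synthetic class: one strong correlated motif on `block`, plus unit noise."""
    u = np.zeros(p)
    u[block] = 1.0
    u /= np.linalg.norm(u)
    t = rng.standard_normal((n, 1))
    return 6.0 * t * u + rng.standard_normal((n, p))

rng = np.random.default_rng(0)
p = 64
active_model = fit_significant_subspace(toy_class(rng, 200, p, slice(0, 8)))
inactive_model = fit_significant_subspace(toy_class(rng, 200, p, slice(8, 16)))

test_active = toy_class(rng, 200, p, slice(0, 8))
test_inactive = toy_class(rng, 200, p, slice(8, 16))
hits = sum(classify(x, active_model, inactive_model) for x in test_active)
hits += sum(not classify(x, active_model, inactive_model) for x in test_inactive)
accuracy = hits / 400
print(f"toy accuracy: {accuracy:.2f}")
```

The toy accuracy says nothing about the 98% reported in the caption; it only shows that eigenvectors surviving the noise threshold carry class-specific structure.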
Fig. 3.
The RMD model discovered four human M1 agonists, compounds A–D. (Upper) The measured dose–response curves for the agonists. (Lower) The molecular structures of the agonists (A–D); insets show the closest molecule in the training set by Tanimoto coefficient.
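For reference, the Tanimoto coefficient between two binary fingerprints is the number of shared on-bits divided by the number of bits set in either. The sketch below represents each fingerprint as a set of on-bit indices; the helper names and the toy fingerprints are made up for illustration, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| for two collections of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def closest_training_molecule(query, training):
    """Return (index, similarity) of the training fingerprint most similar to query."""
    sims = [tanimoto(query, fp) for fp in training]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]

# Hypothetical fingerprints, given as sets of on-bit positions.
training = [{1, 2, 3, 4}, {3, 4, 5, 6}, {10, 11}]
query = {3, 4, 5, 7}
index, similarity = closest_training_molecule(query, training)
print(index, similarity)   # prints: 1 0.6
```

A low best-match similarity is what "chemically dissimilar to all known ligands" means operationally: even the nearest training molecule shares few fingerprint bits with the hit.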
Fig. 4.
Our model can be interpreted as a network of features, where each feature is an entry of the molecular fingerprint. The opacity of the nodes is proportional to the difference in the number of times that the feature is present in the active set relative to the inactive set; only the top 10 features are shown. Red (blue) edges correspond to a positive (negative) correlation, and the width of the edges is proportional to the strength of the correlation.
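A network of this kind can be assembled from two binary fingerprint matrices. The sketch below is only a rough illustration: it ranks features by the absolute difference in occurrence counts and takes edge weights from the Pearson correlation within the active set, both of which are assumptions, since the caption does not say how the paper derives the correlations. The function name `feature_network` and the toy matrices are hypothetical.

```python
import numpy as np

def feature_network(active, inactive, top_k=10):
    """Pick the top_k features by |count(active) - count(inactive)| and return
    (node ids, count differences, edges), where each edge is (i, j, correlation)."""
    active = np.asarray(active, dtype=float)
    inactive = np.asarray(inactive, dtype=float)
    count_diff = active.sum(axis=0) - inactive.sum(axis=0)
    top = np.argsort(-np.abs(count_diff))[:top_k]
    corr = np.corrcoef(active[:, top], rowvar=False)   # assumption: active set only
    edges = []
    for i in range(len(top)):
        for j in range(i + 1, len(top)):
            if np.isfinite(corr[i, j]):                # skip constant features
                edges.append((int(top[i]), int(top[j]), float(corr[i, j])))
    return top.tolist(), count_diff[top].tolist(), edges

# Toy fingerprints: rows are molecules, columns are fingerprint bits.
active = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 1, 0],
          [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
inactive = [[0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 1, 1]]

nodes, diffs, edges = feature_network(active, inactive, top_k=3)
for i, j, w in edges:
    print(f"feature {i} - feature {j}: {'red' if w > 0 else 'blue'} edge, weight {abs(w):.2f}")
```

In the toy data, features 1 and 2 never occur together in an active molecule, so their edge is strongly negative, the same kind of signal the caption describes with blue edges.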
Fig. 5.
Prospective matched molecular pair analysis corroborates the significant negative correlation between the piperazine and the aromatic nitrogen motif that the model predicts.
