PLoS One. 2012;7(11):e48476.

doi: 10.1371/journal.pone.0048476. Epub 2012 Nov 21.

Inside the mind of a medicinal chemist: the role of human bias in compound prioritization during drug discovery

Peter S Kutchukian¹, Nadya Y Vasilyeva, Jordan Xu, Mika K Lindvall, Michael P Dillon, Meir Glick, John D Coley, Natasja Brooijmans

PMID: 23185259
PMCID: PMC3504051
DOI: 10.1371/journal.pone.0048476

Affiliation

¹ Center for Proteomic Chemistry, Novartis Institutes for BioMedical Research, Cambridge, MA, USA.

Abstract

Medicinal chemists' "intuition" is critical for success in modern drug discovery. Early in the discovery process, chemists select a subset of compounds for further research, often from many viable candidates. These decisions determine the success of a discovery campaign, and ultimately what kind of drugs are developed and marketed to the public. Surprisingly little is known about the cognitive aspects of chemists' decision-making when they prioritize compounds. We investigate 1) how and to what extent chemists simplify the problem of identifying promising compounds, 2) whether chemists agree with each other about the criteria used for such decisions, and 3) how accurately chemists report the criteria they use for these decisions. Chemists were surveyed and asked to select chemical fragments that they would be willing to develop into a lead compound from a set of ~4,000 available fragments. Based on each chemist's selections, computational classifiers were built to model each chemist's selection strategy. Results suggest that chemists greatly simplified the problem, typically using only 1-2 of many possible parameters when making their selections. Although chemists tended to use the same parameters to select compounds, differing value preferences for these parameters led to an overall lack of consensus in compound selections. Moreover, what little agreement there was among the chemists was largely in what fragments were undesirable. Furthermore, chemists were often unaware of the parameters (such as compound size) which were statistically significant in their selections, and overestimated the number of parameters they employed. A critical evaluation of the problem space faced by medicinal chemists and cognitive models of categorization were especially useful in understanding the low consensus between chemists.
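The finding that most chemists relied on only one or two parameters can be made concrete with a simple screen. Score each candidate descriptor on its own by how well it separates one chemist's selected fragments from the rejected ones. This is a minimal sketch, not the authors' method (the paper fits SNB and RF classifiers). The descriptor names `heavy_atoms`, `tpsa` and `rotatable_bonds` and all values are hypothetical. The sketch uses the rank-based ROC AUC, where 0.5 means no separation and 1.0 means perfect separation.

```python
from typing import Dict, List, Tuple


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one selected and one rejected fragment")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_single_parameters(
    descriptors: Dict[str, List[float]], selected: List[int]
) -> List[Tuple[str, float]]:
    """Rank descriptors by how well each one alone separates selections.

    A descriptor can predict selection in either direction (a chemist may
    prefer *smaller* fragments), so each is scored as max(AUC, 1 - AUC).
    """
    ranking = []
    for name, values in descriptors.items():
        auc = roc_auc(values, selected)
        ranking.append((name, max(auc, 1.0 - auc)))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical survey data for one chemist: 1 = selected, 0 = rejected.
    selected = [1, 1, 1, 0, 0, 0, 1, 0]
    descriptors = {
        "heavy_atoms": [12, 14, 13, 22, 25, 19, 11, 24],
        "tpsa": [40.0, 75.0, 55.0, 60.0, 45.0, 70.0, 65.0, 50.0],
        "rotatable_bonds": [2, 1, 3, 2, 4, 1, 2, 3],
    }
    for name, auc in rank_single_parameters(descriptors, selected):
        print(f"{name}: {auc:.3f}")
```

Descriptors scoring near 1.0 (here the hypothetical fragment-size column) are candidates for a chemist's primary parameter. A univariate screen like this ignores interactions that multivariate classifiers can capture.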

Conflict of interest statement

Competing Interests: P.S.K., J.X., M.K.L., M.G., and M.P.D. are employed by Novartis Institutes for BioMedical Research. N.B. is employed by Blueprint Medicines. There are no patents, products in development or marketed products to declare. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials.

Figures

**Figure 1. Predictive accuracy of Semi-Naïve Bayesian (SNB) and Random Forest (RF) classifiers trained on medicinal chemists’ selections.**
The average ROC score over a 4-fold cross-validation of each classifier is reported. A: SNB classifier built with medicinal-chemistry-relevant descriptors (red) compared with a benchmark naïve Bayesian classifier that uses extended-connectivity fingerprints and physicochemical properties as descriptors (black). B: RF classifier built with medicinal-chemistry-relevant descriptors (blue) compared with a benchmark RF classifier that uses extended-connectivity fingerprints and physicochemical properties as descriptors (black).
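Figure 1 reports each classifier's average ROC score over 4-fold cross-validation. The sketch below shows that evaluation loop under stated assumptions. Fragments are represented as sets of binary feature keys, a stand-in for fingerprint bits or substructure keys; the key names are hypothetical. The model is a Laplace-smoothed Bernoulli naive Bayes, and folds are assigned round-robin by index. These are illustrative choices, not the authors' SNB or benchmark pipeline.

```python
import math
from typing import List, Set, Tuple


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC as the fraction of (selected, rejected) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each test fold needs both selected and rejected fragments")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


class BernoulliNaiveBayes:
    """Naive Bayes over binary feature keys with Laplace (add-one) smoothing."""

    def fit(self, X: List[Set[str]], y: List[int]) -> "BernoulliNaiveBayes":
        self.vocab = set().union(*X)
        self.log_prior, self.log_present, self.log_absent = {}, {}, {}
        for c in (0, 1):
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log((len(rows) + 1) / (len(X) + 2))
            self.log_present[c], self.log_absent[c] = {}, {}
            for f in self.vocab:
                p = (sum(f in x for x in rows) + 1) / (len(rows) + 2)
                self.log_present[c][f] = math.log(p)
                self.log_absent[c][f] = math.log(1.0 - p)
        return self

    def score(self, x: Set[str]) -> float:
        """Log-odds that fragment x is selected; keys unseen in training are ignored."""
        total = self.log_prior[1] - self.log_prior[0]
        for f in self.vocab:
            table = self.log_present if f in x else self.log_absent
            total += table[1][f] - table[0][f]
        return total


def cross_validated_auc(X: List[Set[str]], y: List[int], k: int = 4) -> float:
    """Mean test-fold ROC AUC; fold i holds out every fragment whose index % k == i."""
    aucs = []
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        model = BernoulliNaiveBayes().fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(roc_auc([model.score(X[i]) for i in test], [y[i] for i in test]))
    return sum(aucs) / len(aucs)


def toy_data() -> Tuple[List[Set[str]], List[int]]:
    """Hypothetical survey: 16 fragments described by substructure-like keys."""
    y = [1 if (i // 4) % 2 == 0 else 0 for i in range(16)]
    X = []
    for i, label in enumerate(y):
        if label == 1:
            X.append({"aromatic_N", "hydroxyl"} if i % 2 == 0 else {"aromatic_N"})
        else:
            X.append({"carboxylic_acid", "tertiary_amine"} if i % 2 == 0 else {"carboxylic_acid"})
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    print(f"4-fold CV ROC AUC: {cross_validated_auc(X, y):.3f}")
```

Real fragment sets are rarely this separable. Shuffled or stratified folds are a common alternative to the round-robin split used here.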

**Figure 2. The parameters extracted from the SNB (red) and RF (blue) classifiers are compared with parameters designated as important in chemists’ self-reports (grey).**
The primary parameters for the classifiers are depicted as stars, and the secondary parameters as circles. The p-value of a one-tailed Fisher exact test (p) is reported for each parameter (except chains and charge). These values indicate that the SNB and RF parameters agree with each other, while the self-reported parameters are independent of either classifier's parameters.
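The one-tailed Fisher exact test in Figure 2 asks whether two ways of labeling chemists overlap more than chance would predict. One example is "parameter used by the SNB model" versus "parameter used by the RF model". A minimal sketch using the hypergeometric distribution from the standard library follows. The 2×2 table layout and the example counts are assumptions for illustration.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-tailed (greater) Fisher exact p-value for the 2x2 table

                      method 2: yes   method 2: no
        method 1: yes       a               b
        method 1: no        c               d

    i.e. the probability, with all margins fixed, of observing at least `a`
    chemists flagged by both methods (upper tail of the hypergeometric).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    max_a = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, max_a + 1))
    return tail / comb(n, col1)


if __name__ == "__main__":
    # Hypothetical counts: 12 chemists, 5 flagged by both classifiers for one parameter.
    print(f"p = {fisher_exact_greater(5, 1, 1, 5):.4f}")
```

If SciPy is available, `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` should return the same p-value.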

**Figure 3. Examples of selection preferences based on simple physicochemical properties, and the corresponding SNB classifiers.**
A: Histogram of the number of atoms in fragments selected by chemist 3 as good (green) or bad (red) starting points for drug discovery campaigns. Frequencies are normalized by the total number of selected or unselected compounds, respectively. B: Bayesian score versus number of atoms for the minimal Bayesian model built for chemist 3. A positive score indicates a favorable number of atoms, while a negative score indicates an unfavorable number of atoms. C: Histogram of the molecular polar surface area of fragments selected by chemist 12 as good (green) or bad (red) starting points for drug discovery campaigns. Frequencies are normalized by the total number of selected or unselected compounds, respectively. D: Bayesian score versus molecular polar surface area bins for the SNB classifier built for chemist 12.
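Figure 3 pairs class-normalized histograms with a per-bin Bayesian score whose sign marks a favorable or unfavorable property range. One common way to compute such a score is the Laplacian-corrected estimator used in fingerprint-based naïve Bayes models. For a bin containing T fragments, A of them selected, with overall selection rate P, the score is log((A + 1)/(T·P + 1)). The sketch below assumes that estimator and fixed-width bins. Neither is confirmed as the paper's exact binning or scoring, and the data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def normalized_histograms(
    values: List[float], selected: List[int], bin_width: float
) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Fraction of selected / unselected fragments per bin (bin index = value // width).

    Each class is normalized by its own size; both classes must be non-empty.
    """
    bins = [int(v // bin_width) for v in values]
    sel = Counter(b for b, s in zip(bins, selected) if s)
    unsel = Counter(b for b, s in zip(bins, selected) if not s)
    n_sel, n_unsel = sum(sel.values()), sum(unsel.values())
    return ({b: c / n_sel for b, c in sel.items()},
            {b: c / n_unsel for b, c in unsel.items()})


def laplacian_bin_scores(
    values: List[float], selected: List[int], bin_width: float
) -> Dict[int, float]:
    """Laplacian-corrected log score per bin: > 0 favorable, < 0 unfavorable."""
    baseline = sum(selected) / len(selected)  # overall selection rate P
    totals, hits = Counter(), Counter()
    for v, s in zip(values, selected):
        b = int(v // bin_width)
        totals[b] += 1
        hits[b] += int(s)
    return {b: math.log((hits[b] + 1) / (totals[b] * baseline + 1)) for b in sorted(totals)}


if __name__ == "__main__":
    # Hypothetical heavy-atom counts for ten fragments; 1 = selected by the chemist.
    atoms = [8, 9, 11, 12, 13, 18, 22, 24, 26, 27]
    picked = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    for b, score in laplacian_bin_scores(atoms, picked, 5).items():
        print(f"{b * 5}-{b * 5 + 4} atoms: {score:+.2f}")
```

The "+1" terms shrink scores toward zero for sparsely populated bins. This keeps a single selected fragment in an otherwise empty bin from looking strongly favorable.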

**Figure 4. The SNB classifier built using a descriptor subsumed by the functional group parameter is illustrated for chemist 1.**
Keys that represent the presence (black) or absence (white) of chemical substructures are ordered from negative (bad) values on the left to positive (good) values on the right (A). The worst and best substructure keys are shown enlarged (B). Specific chemical substructures (tertiary amine – blue, aromatic heteroatom – violet, hydroxyl – aqua, and carboxylic acid – orange) are highlighted for one of the worst keys and two of the best keys, and illustrative fragments described by these keys are depicted (C).

**Figure 5. Ring topology SNB classifier comparison between chemists.**
The most favorable and unfavorable keys were examined for the RingBonds_AromaticBonds_RingAssemblies (RB_AB_RA) descriptor model, which measures the number of ring bonds (RB), aromatic bonds (AB), and ring assemblies (RA) present in a compound. Representative scaffolds corresponding to these keys are depicted and clustered according to how chemists viewed them. The heat map reports the Bayes score of each key in the model built for each individual chemist. Favorable keys receive a positive score, while unfavorable keys receive a negative score.
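The RB_AB_RA descriptor combines three counts: ring bonds, aromatic bonds, and ring assemblies. The sketch below shows one way to derive them from a molecular graph in plain Python. A bond is a ring bond exactly when it is not a bridge, that is, when removing it does not disconnect the graph; bridges are found with Tarjan's bridge-finding algorithm. Ring assemblies are the connected components formed by ring bonds, so fused and spiro rings merge while rings joined by a linker stay separate. Several details are assumptions for illustration: the input format (an atom-index bond list plus a set of aromatic bonds), the underscore-joined key string, and the helper names. In practice a cheminformatics toolkit would perform ring and aromaticity perception.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Bond = Tuple[int, int]


def find_bridges(n_atoms: int, bonds: List[Bond]) -> Set[FrozenSet[int]]:
    """Tarjan's bridge-finding: bonds whose removal disconnects the graph."""
    adj: Dict[int, List[int]] = {i: [] for i in range(n_atoms)}
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)
    disc = [-1] * n_atoms  # DFS discovery time, -1 = unvisited
    low = [0] * n_atoms    # earliest discovery time reachable from the subtree
    bridges: Set[FrozenSet[int]] = set()
    timer = 0

    def dfs(u: int, parent: int) -> None:
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for w in adj[u]:
            if w == parent:
                continue
            if disc[w] == -1:
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] > disc[u]:  # subtree of w cannot reach u or above
                    bridges.add(frozenset((u, w)))
            else:
                low[u] = min(low[u], disc[w])

    for start in range(n_atoms):
        if disc[start] == -1:
            dfs(start, -1)
    return bridges


def ring_assemblies(n_atoms: int, ring_bonds: List[Bond]) -> int:
    """Connected components of the ring-bond subgraph, via union-find."""
    parent = list(range(n_atoms))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ring_atoms: Set[int] = set()
    for u, v in ring_bonds:
        ring_atoms.update((u, v))
        parent[find(u)] = find(v)
    return len({find(a) for a in ring_atoms})


def rb_ab_ra_key(n_atoms: int, bonds: List[Bond], aromatic: Set[FrozenSet[int]]) -> str:
    """Hypothetical key format "RB_AB_RA" built from the three counts."""
    bridges = find_bridges(n_atoms, bonds)
    ring_bonds = [b for b in bonds if frozenset(b) not in bridges]
    n_aromatic = sum(1 for b in bonds if frozenset(b) in aromatic)
    return f"{len(ring_bonds)}_{n_aromatic}_{ring_assemblies(n_atoms, ring_bonds)}"


if __name__ == "__main__":
    # Biphenyl skeleton: two six-membered aromatic rings joined by one bond.
    ring_a = [(i, (i + 1) % 6) for i in range(6)]
    ring_b = [(6 + i, 6 + (i + 1) % 6) for i in range(6)]
    aromatic = {frozenset(b) for b in ring_a + ring_b}
    print(rb_ab_ra_key(12, ring_a + ring_b + [(0, 6)], aromatic))
```

Here the linker bond in biphenyl is a bridge, so it counts toward neither the ring bonds nor the assemblies. Naphthalene's shared bond, by contrast, keeps both rings in a single assembly.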

**Figure 6. The selection characteristics of chemists with high estimated consensus.**
The cultural consensus model was applied to a subset of 311 fragments with >75% agreement among chemists. The estimated consensus obtained by this method is plotted against the fraction of fragments each chemist passed over the entire survey. Each shape denotes the primary SNB parameter used to reproduce a chemist's selections, and the color denotes the ROC score of the naïve Bayesian classifier built for each chemist using ECFP4 as the descriptor. A subset of high-consensus chemists lies above the dashed grey line.
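The cultural consensus model estimates how strongly each respondent shares a common answer key, using only the pattern of pairwise agreement. The sketch below is a simplified version of the classic formal model for yes/no answers.

1. Correct the observed match rate between two chemists for chance agreement (2·match − 1 for two response options).
2. Take each chemist's "competence" as their loading on the first factor of that corrected agreement matrix.
3. Estimate the loadings by iterated principal-axis factoring with power iteration.

The iteration counts and toy responses are hypothetical. The sketch omits the model's goodness-of-fit checks and answer-key estimation, and it is not necessarily the implementation the authors used.

```python
import math
from typing import List, Tuple


def pass_fraction(row: List[int]) -> float:
    """Fraction of fragments a chemist passed (1 = pass, 0 = fail)."""
    return sum(row) / len(row)


def corrected_agreement(responses: List[List[int]]) -> List[List[float]]:
    """Chance-corrected match rates 2*M - 1 for binary (pass/fail) answers."""
    n, q = len(responses), len(responses[0])
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            matches = sum(a == b for a, b in zip(responses[i], responses[j]))
            m[i][j] = 2.0 * matches / q - 1.0
    return m


def leading_eigen(mat: List[List[float]], iters: int = 500) -> Tuple[float, List[float]]:
    """Power iteration for the dominant eigenpair of a symmetric matrix.

    Assumes the dominant eigenvalue is positive, as expected when
    respondents broadly agree.
    """
    n = len(mat)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def competences(responses: List[List[int]], rounds: int = 50) -> List[float]:
    """One-factor loadings of the corrected agreement matrix (iterated principal axis)."""
    m = corrected_agreement(responses)
    n = len(m)
    # Initial communality guess: largest off-diagonal agreement in each row.
    diag = [max(m[i][j] for j in range(n) if j != i) for i in range(n)]
    loadings = [0.0] * n
    for _ in range(rounds):
        work = [[diag[i] if i == j else m[i][j] for j in range(n)] for i in range(n)]
        lam, v = leading_eigen(work)
        if sum(v) < 0:  # eigenvectors are sign-ambiguous; orient toward agreement
            v = [-x for x in v]
        loadings = [math.sqrt(max(lam, 0.0)) * x for x in v]
        diag = [min(x * x, 1.0) for x in loadings]
    return loadings


if __name__ == "__main__":
    # Hypothetical pass/fail answers from four chemists on eight fragments.
    key = [1, 1, 0, 0, 1, 0, 1, 0]
    outlier = [1, 0, 0, 1, 1, 0, 0, 0]
    answers = [key, key[:], key[:], outlier]
    for row, comp in zip(answers, competences(answers)):
        print(f"passed {pass_fraction(row):.2f}, competence {comp:.2f}")
```

Plotting competence against `pass_fraction` for each chemist gives the two axes described for Figure 6.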

Cited by

Molecule auto-correction to facilitate molecular design.
Kerstjens A, De Winter H. J Comput Aided Mol Des. 2024 Feb 16;38(1):10. doi: 10.1007/s10822-024-00549-1. PMID: 38363377.
Molecular Similarity Perception Based on Machine-Learning Models.
Gandini E, Marcou G, Bonachera F, Varnek A, Pieraccini S, Sironi M. Int J Mol Sci. 2022 May 30;23(11):6114. doi: 10.3390/ijms23116114. PMID: 35682792.
Extracting medicinal chemistry intuition via preference machine learning.
Choung OH, Vianello R, Segler M, Stiefl N, Jiménez-Luna J. Nat Commun. 2023 Oct 31;14(1):6651. doi: 10.1038/s41467-023-42242-1. PMID: 37907461.
Impact of Applicability Domains to Generative Artificial Intelligence.
Langevin M, Grebner C, Güssregen S, Sauer S, Li Y, Matter H, Bianciotto M. ACS Omega. 2023 Jun 12;8(25):23148-23167. doi: 10.1021/acsomega.3c00883. eCollection 2023 Jun 27. PMID: 37396211.
The use of 2D fingerprint methods to support the assessment of structural similarity in orphan drug legislation.
Franco P, Porta N, Holliday JD, Willett P. J Cheminform. 2014 Feb 1;6(1):5. doi: 10.1186/1758-2946-6-5. PMID: 24485002.

