Review
Brief Bioinform. 2018 Mar 1;19(2):277-285.
doi: 10.1093/bib/bbw105.

Data-driven approaches used for compound library design, hit triage and bioactivity modeling in high-throughput screening

Shardul Paricharak et al. Brief Bioinform. 2018.

Abstract

High-throughput screening (HTS) campaigns are routinely performed in pharmaceutical companies to explore activity profiles of chemical libraries for the identification of promising candidates for further investigation. With the aim of improving hit rates in these campaigns, data-driven approaches have been used to design relevant compound screening collections, enable effective hit triage and perform activity modeling for compound prioritization. Remarkable progress has been made in the activity modeling area since the recent introduction of large-scale bioactivity-based compound similarity metrics. This is evidenced by increased hit rates in iterative screening strategies and novel insights into compound mode of action obtained through activity modeling. Here, we provide an overview of the developments in data-driven approaches, elaborate on novel activity modeling techniques and screening paradigms explored and outline their significance in HTS.

Figures

Figure 1
Diverse libraries compared with focused libraries. Structurally diverse libraries are used to efficiently explore relevant chemical space for targets with few known active chemotypes or for phenotypic assays [34] (A). This is performed to provide multiple starting points for further development. Example structures were taken from the ZINC lead compounds library [35], and PAINS [36] were omitted. Owing to the diversity of the compounds tested, a wide range of activities can be observed: from inactive (blue) through somewhat active (yellow) and moderately active (orange) to highly active (red). By contrast, focused libraries are often designed for targets with many known active chemotypes, such as GPCRs, kinases and, in some cases, ion channels (B). Here, example structures were taken from Harris et al. [37] and Fernández-de Gortari and Medina-Franco [38], and PAINS [36] were omitted. These libraries are built around active chemotypes found previously, for instance, through diversity-based screening [2, 37, 39, 40]. Here, analogs often exhibit smaller differences in activity, and because many similar compounds are present, multiple actives are more likely to be found than in diverse libraries.
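The contrast between the two library types can be illustrated with a small selection sketch. It is not taken from the review: it assumes compounds are represented as sets of "on" fingerprint bits, uses Tanimoto similarity, picks a diverse subset with a greedy MaxMin heuristic, and picks a focused subset as the nearest neighbors of a known active. All compound names and fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pick_diverse(fingerprints, k):
    """Greedy MaxMin selection: add the compound least similar to those already picked."""
    names = list(fingerprints)
    picked = [names[0]]  # arbitrary seed compound
    while len(picked) < min(k, len(names)):
        candidates = [n for n in names if n not in picked]
        # Score each candidate by its highest similarity to any picked compound,
        # then take the candidate with the lowest such score.
        best = min(
            candidates,
            key=lambda n: max(tanimoto(fingerprints[n], fingerprints[p]) for p in picked),
        )
        picked.append(best)
    return picked


def pick_focused(fingerprints, reference, k):
    """Select the k compounds most similar to a known active chemotype."""
    ranked = sorted(
        (n for n in fingerprints if n != reference),
        key=lambda n: tanimoto(fingerprints[n], fingerprints[reference]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical fingerprints as sets of "on" bit indices.
fps = {
    "a1": {1, 2, 3, 4},
    "a2": {1, 2, 3, 5},
    "a3": {1, 2, 4, 5},
    "b1": {10, 11, 12},
    "c1": {20, 21, 22, 23},
}
diverse = pick_diverse(fps, 3)        # spreads across the a, b and c series
focused = pick_focused(fps, "a1", 2)  # stays within the a series
```

In practice, fingerprints would come from a cheminformatics toolkit and the seed compound and subset size would be chosen deliberately; the toy data only shows that the diverse pick spans dissimilar series while the focused pick stays near the reference chemotype.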
Figure 2
Graphical representation of the differences between systematic and random errors. Systematic errors are associated with consistently over- or underestimated activity across the screening collection. By contrast, random errors are usually caused by noise and have a low impact on the overall results, but they do not present any pattern, which makes their identification more difficult (A). We show an example of systematic error in the McMaster University experimental HTS assay [57] (B). Here, the number of hits in each well across 1250 plates is shown. In general, wells located in rows A and B presented a higher hit rate than those at the center of the plates, exemplifying how the well position can be associated with a systematic error. Systematic errors can be detected using, for example, Student’s t-test [56] (C). Here, measurements from one row or column (Sample 1) are compared with those of the remainder of the plate (Sample 2). When the mean hit value of Sample 1 differs significantly from the mean value of Sample 2, a systematic error is detected.
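The row-versus-rest comparison described for panel C can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of reference [56]: it computes the equal-variance (pooled) two-sample Student's t statistic by hand, uses a hypothetical 4 x 6 plate of per-well hit counts, and flags a row when |t| exceeds a fixed cutoff rather than computing a p-value. The function names and the cutoff are assumptions made for the example.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample1, sample2):
    """Two-sample Student's t statistic (equal-variance form) and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    t = (mean(sample1) - mean(sample2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def flag_rows(plate, t_critical=2.074):
    """Return indices of rows whose mean differs from the rest of the plate.

    `plate` is a list of rows, each a list of per-well values.
    The default cutoff is approximately the two-sided 5% critical value
    for 22 degrees of freedom, which matches the toy plate below only.
    """
    flagged = []
    for i, row in enumerate(plate):
        # Sample 2: every well that is not in the row under test.
        rest = [v for j, other in enumerate(plate) if j != i for v in other]
        t, _ = pooled_t_statistic(row, rest)
        if abs(t) > t_critical:
            flagged.append(i)
    return flagged


# Hypothetical plate: row 0 has inflated values, mimicking an edge effect.
plate = [
    [9, 10, 11, 10, 9, 11],
    [2, 3, 2, 3, 2, 3],
    [3, 2, 3, 2, 3, 2],
    [2, 2, 3, 3, 2, 3],
]
flagged_rows = flag_rows(plate)
```

The same function can be applied to columns by transposing the plate. With many rows and columns tested per plate, a correction for multiple comparisons would likely be needed; that is left out of this sketch.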
Figure 3
Overview of recent studies improving (scaffold) hit rates and providing insights into compound mode of action. Describing compound bioactivity across ∼200 assays at Novartis, Petrone et al. [25] took the concept of bioactivity-based similarity to an unparalleled level. Here, biological analogs of hits were prioritized for testing (A). Later studies leveraged bioactivity profiles of structural analogs of poorly characterized compounds to select subsets of compounds for virtual screening [24] (B), or applied biological and chemical similarity metrics in parallel to iteratively expand around hits from multiple rounds of screening [3] (C). Further improvements resulted from changes in experimental design strategy [93], machine learning methods for predicting actives [23] and informer sets for routine exploratory screening [94] (D). Other studies used bioactivity-based similarity searching for mode-of-action analyses at Novartis [91], Roche [92] and in the public domain [26] (E).
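The idea of prioritizing biological analogs of a hit (panel A) can be sketched as a similarity search over activity profiles. The caption does not state which similarity metric the cited studies used; this example assumes Pearson correlation between profiles of normalized activities across historical assays. Compound names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length activity profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: correlation is undefined, treated here as no similarity
    return cov / (sx * sy)


def biological_analogs(hit_profile, library, top_n=2):
    """Rank library compounds by profile similarity to the hit and keep the top_n."""
    scored = [(name, pearson(hit_profile, profile)) for name, profile in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical normalized activities across five historical assays.
hit = [0.9, 0.1, 0.8, 0.2, 0.7]
library = {
    "cmpd_A": [0.8, 0.2, 0.9, 0.1, 0.6],  # similar pattern to the hit
    "cmpd_B": [0.1, 0.9, 0.2, 0.8, 0.3],  # opposite pattern
    "cmpd_C": [0.5, 0.5, 0.5, 0.5, 0.5],  # flat profile
}
ranked = biological_analogs(hit, library)
```

Because the ranking depends only on assay readouts, a compound can rank highly without sharing a scaffold with the hit, which is how this kind of search can surface structurally novel candidates.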

References

    1. Drews J. Drug discovery: a historical perspective. Science 2000;287:1960–4.
    2. Macarron R. Critical review of the role of HTS in drug discovery. Drug Discov Today 2006;11:277–9.
    3. Paricharak S, IJzerman AP, Bender A, et al. Analysis of iterative screening with stepwise compound selection based on Novartis in-house HTS data. ACS Chem Biol 2016;11:1255–64.
    4. Mayr LM, Fuerst P. The future of high-throughput screening. J Biomol Screen 2008;13:443–8.
    5. Mayr LM, Bojanic D. Novel trends in high-throughput screening. Curr Opin Pharmacol 2009;9:580–8.
