Front Genet. 2019 May 9:10:387.
doi: 10.3389/fgene.2019.00387. eCollection 2019.

Quality Control of Quantitative High Throughput Screening Data

Keith R Shockley et al. Front Genet. 2019.

Abstract

Quantitative high throughput screening (qHTS) experiments can generate thousands of concentration-response profiles to screen compounds for potentially adverse effects. However, potency estimates for a single compound can vary considerably in study designs incorporating multiple concentration-response profiles for each compound. We introduce an automated quality control procedure based on analysis of variance (ANOVA) to identify and filter out compounds with multiple cluster response patterns and improve potency estimation in qHTS assays. Our approach, called Cluster Analysis by Subgroups using ANOVA (CASANOVA), clusters compound-specific response patterns into statistically supported subgroups. Applying CASANOVA to 43 publicly available qHTS data sets, we found that only about 20% of compounds with response values outside of the noise band have single cluster responses. The error rates for incorrectly separating true clusters and incorrectly clumping disparate clusters were both less than 5% in extensive simulation studies. Simulation studies also showed that the bias and variance of concentration at half-maximal response (AC50) estimates were usually within 10-fold when using a weighted average approach for potency estimation. In short, CASANOVA effectively sorts out compounds with "inconsistent" response patterns and produces trustworthy AC50 values.

Keywords: ANOVA; clustering; concentration-response; potency; quantitative high throughput screening; toxicological response.
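The abstract says that CASANOVA relies on ANOVA to decide whether repeats of a compound fall into statistically supported subgroups, but it does not spell out the procedure. As a rough illustration only, the sketch below computes a plain one-way ANOVA F statistic for candidate subgroups of repeats. It is not the authors' implementation: the function name `one_way_anova_f` and the choice of one summary value per repeat (for example a log10 AC50) are assumptions made for this example.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers.

    Each group holds one summary value per repeat profile (an assumption
    for this sketch; the source does not state which values are compared).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")

    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F relative to the F distribution with the returned degrees of freedom would suggest that the candidate subgroups differ by more than within-group scatter, which is the general idea behind splitting repeats into separate clusters.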


Figures

FIGURE 1
Three separate cases are represented by concentration-response data from the BG1 estrogen receptor agonist assay from phase II of the Tox21 collaboration (tox21-er-luc-bg1-4e2-agonist-p2). Responses are shown as a percentage of the assay positive control values after correction by DMSO negative controls (Inglese et al., 2006). The assay detection limits are indicated with dashed lines. An AC50 value from the Hill model, calculated using the weighted average approach, summarizes the potency of each cluster (see section “Materials and Methods”). (A) Case 1 shows 12 similar response profiles from oxymetholone which extend beyond noise and group together into a single cluster. This case corresponds to two different supplier designations, two library preparation sites and two purities (A and D, representing “good” and “poor” purity, respectively) generated on six different experimental days. (B) Case 2 shows nine responses from hydrochlorothiazide which all lie within the noise band and correspond to three supplier sources, three library preparation sites, and a single purity (A) generated on six different experimental days. (C) Case 3 is represented by 42 response profiles from 2,3,5,6-tetrachloronitrobenzene corresponding to one supplier, three library preparation sites, one purity designation (A) and seven experimental days. A total of 29 of the 42 repeats lie within the noise band (shown in gray), and the remaining 13 profiles are grouped by CASANOVA, the methodology proposed in this paper, into three disparate clusters of 9, 3, and 1 repeats shown in black, green, and red, respectively. The separation of clusters in Case 3 is not explained by library preparation site or experimental day.
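The caption refers to an AC50 from the Hill model summarized by a weighted average, with details deferred to the Materials and Methods section, which is not reproduced here. The following is a minimal sketch under stated assumptions: a standard four-parameter Hill curve, and inverse-variance weights applied to log10 AC50 estimates. The weighting scheme and the names `hill` and `weighted_log_ac50` are assumptions for illustration, not the authors' definition.

```python
def hill(conc, bottom, top, ac50, slope):
    """Four-parameter Hill response at concentration conc (> 0).

    conc and ac50 must share the same units; the response equals the
    midpoint of bottom and top when conc == ac50.
    """
    return bottom + (top - bottom) / (1.0 + (ac50 / conc) ** slope)


def weighted_log_ac50(log_ac50s, std_errs):
    """Inverse-variance weighted mean of log10(AC50) estimates.

    ASSUMPTION: the weights 1 / se^2 are chosen for this sketch; the
    source only names a "weighted average approach".
    """
    if len(log_ac50s) != len(std_errs) or not log_ac50s:
        raise ValueError("need equally sized, non-empty inputs")
    weights = [1.0 / se ** 2 for se in std_errs]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, log_ac50s)) / total
```

For example, `hill(1.0, 0.0, 100.0, 1.0, 1.0)` gives the half-maximal response, and `10 ** weighted_log_ac50(...)` converts the pooled estimate for one cluster back to concentration units.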
FIGURE 2
A barplot was used to summarize the response patterns corresponding to 72 assay readouts from 43 different data sets. A total of 7,229 chemicals were common among all 43 data sets. In the barplot, the gray regions correspond to the fraction of chemicals clustered in the noise band (Case 2), the dark green regions refer to a single detectable cluster well-separated from the noise band (Conclusive Case 1), the light green regions represent a single cluster with response points not statistically separable from noise (Inconclusive Case 1), the pink regions correspond to multiple clusters with response points not statistically separable from the noise band (Inconclusive Case 3) and the red regions refer to multiple clusters well-separated from the noise band (Conclusive Case 3). Agonist assay labels are shown in dark blue, antagonist/inhibitor assay labels are shown in green and viability assay labels are shown in gray. Selected compound profiles from assays with multiple clusters (Conclusive Case 3) are shown to the right of the barplot. Known factors associated with different clusters are indicated in the upper left of each plot. These factors include supplier, library preparation site, concentration spacing, compound purity and experimental day. None of these factors explain the different patterns observed in the last two plots. Hence, adjusting or normalizing the concentration-response data for these known factors will not necessarily eliminate multiple cluster response patterns among repeats within a compound in qHTS data.
FIGURE 3
Complementary empirical cumulative distribution function (CCDF) describing the variability in AC50 values. The maximal range of AC50 values (on the log10 scale) was calculated for each compound in which two or more clusters were identified outside of the noise region for each of the 7,729 compounds investigated in the 43 data sets described in the text. The order of magnitude differences in intrachemical potency estimates shown here represent only those cases in which the calculated AC50 is between 10⁻⁵ and 1000 μM, which covers the typical testing concentration range of ∼10⁻⁴ to 100 μM evaluated in these assays. The number of compounds meeting this criterion ranged from 42 to 774 in the 72 assay types evaluated here, with a median of 255 compounds. (A) The CCDF (or 1-CDF) plots describing the proportion of compounds (y-axis) for a given spread in AC50 (x-axis) in the tox21-er-luc-bg1-4e2-antagonist-p1 viability assay (blue) and the tox21-gh3-tre-agonist-p1 agonist assay (red) are displayed. The vertical black lines indicate 10- and 100-fold differences in the calculated range of AC50 values. (B) The CCDF for the fraction of the 72 assays with greater than 10-fold range in AC50 values (y-axis) for a given spread in AC50 (x-axis) are shown for the agonist (dark blue), antagonist/inhibitor (dark green) and viability (dark gray) assays. (C) The CCDF for the fraction of the 72 assays with greater than 100-fold range in AC50 are shown for the same agonist, antagonist/inhibitor and viability assays presented in (B).
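The quantities in this caption can be illustrated with a short sketch: the per-compound spread is the difference between the largest and smallest log10 AC50 across clusters, and the CCDF at x is the fraction of compounds whose spread exceeds x. The function names and the example values in the usage note are hypothetical, and the strict "greater than" convention is an assumption.

```python
import math


def log10_range(ac50s):
    """Spread of AC50 values for one compound in orders of magnitude."""
    if len(ac50s) < 2:
        raise ValueError("need AC50 values from at least two clusters")
    logs = [math.log10(a) for a in ac50s]
    return max(logs) - min(logs)


def ccdf(values, x):
    """Empirical complementary CDF: fraction of values strictly above x."""
    if not values:
        raise ValueError("values must be non-empty")
    return sum(1 for v in values if v > x) / len(values)
```

With spreads computed per compound, `ccdf(spreads, 1.0)` and `ccdf(spreads, 2.0)` would give the proportions of compounds whose cluster AC50 values differ by more than 10-fold and 100-fold, matching the vertical reference lines described for panel (A).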

