Quality Control of Quantitative High Throughput Screening Data

Keith R Shockley¹, Shuva Gupta², Shawn F Harris³, Soumendra N Lahiri⁴, Shyamal D Peddada⁵

Affiliations

¹ Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, National Institutes of Health, Durham, NC, United States.
² Statistics Department, University of Pennsylvania, Philadelphia, PA, United States.
³ Social and Scientific Systems, Durham, NC, United States.
⁴ Department of Statistics, North Carolina State University, Raleigh, NC, United States.
⁵ Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States.

PMID: 31143201
PMCID: PMC6520559
DOI: 10.3389/fgene.2019.00387

Front Genet. 2019 May 9;10:387. eCollection 2019.

Abstract

Quantitative high throughput screening (qHTS) experiments can generate thousands of concentration-response profiles to screen compounds for potentially adverse effects. However, potency estimates for a single compound can vary considerably in study designs incorporating multiple concentration-response profiles for each compound. We introduce an automated quality control procedure based on analysis of variance (ANOVA) to identify and filter out compounds with multiple cluster response patterns and improve potency estimation in qHTS assays. Our approach, called Cluster Analysis by Subgroups using ANOVA (CASANOVA), clusters compound-specific response patterns into statistically supported subgroups. Applying CASANOVA to 43 publicly available qHTS data sets, we found that only about 20% of compounds with response values outside of the noise band have single cluster responses. The error rates for incorrectly separating true clusters and incorrectly clumping disparate clusters were both less than 5% in extensive simulation studies. Simulation studies also showed that the bias and variance of concentration at half-maximal response (AC₅₀) estimates were usually within 10-fold when using a weighted average approach for potency estimation. In short, CASANOVA effectively sorts out compounds with "inconsistent" response patterns and produces trustworthy AC₅₀ values.
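
To make the ANOVA idea concrete, the sketch below computes a one-way ANOVA F statistic for candidate subgroups of repeats from one compound. It is only an illustration of the underlying test, not the CASANOVA procedure itself, whose details are not given in this abstract. The function name `one_way_anova_f` and the choice of a single summary value per profile (for example, the response at the highest tested concentration) are assumptions made for the example.

```python
import math
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups.

    Each group holds one hypothetical summary value per response profile.
    A large F suggests the group means differ more than within-group
    scatter would explain.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need >= 2 non-empty groups and more values than groups")

    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n_total - k
    if ss_within == 0:
        # No within-group scatter: separation is infinite unless means coincide.
        return (math.inf if ss_between > 0 else 0.0), df_between, df_within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up summary responses (% of positive control) for two candidate subgroups.
f_stat, df_b, df_w = one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

Comparing the F statistic against an F distribution with `df_b` and `df_w` degrees of freedom would give a p-value for whether the subgroups are statistically distinguishable.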

Keywords: ANOVA; clustering; concentration-response; potency; quantitative high throughput screening; toxicological response.

Figures

**FIGURE 1**
Three separate cases are represented by concentration-response data from the BG1 estrogen receptor agonist assay from phase II of the Tox21 collaboration (*tox21-er-luc-bg1-4e2-agonist-p2*). Responses are shown as a percentage of the assay positive control values after correction by DMSO negative controls (Inglese et al., 2006). The assay detection limits are indicated with dashed lines. An *AC₅₀* value from the Hill model, calculated using the weighted average approach, summarizes the potency of each cluster (see section “Materials and Methods”). **(A)** Case 1 shows 12 similar response profiles from oxymetholone, which extend beyond the noise band and group together into a single cluster. This case corresponds to two different supplier designations, two library preparation sites and two purities (A and D, representing “good” and “poor” purity, respectively) generated on six different experimental days. **(B)** Case 2 shows nine responses from hydrochlorothiazide, which all lie within the noise band and correspond to three supplier sources, three library preparation sites, and a single purity (A) generated on six different experimental days. **(C)** Case 3 is represented by 42 response profiles from 2,3,5,6-tetrachloronitrobenzene corresponding to one supplier, three library preparation sites, one purity designation (A) and seven experimental days. A total of 29 of the 42 repeats lie within the noise band (shown in gray), and the remaining 13 profiles are clustered by our proposed methodology, *CASANOVA*, into three disparate groups of 9, 3, and 1 repeats, shown in black, green, and red, respectively. The separation of clusters in Case 3 is not explained by library preparation site or experimental day.
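
The caption refers to the Hill model and to a noise band bounded by the assay detection limits. A minimal sketch of both ideas follows, using the standard four-parameter Hill equation. The function names, parameter names, and the ±25% detection limits in the usage lines are hypothetical, chosen for illustration only, and are not taken from the paper.

```python
def hill_response(conc_um, baseline, max_response, ac50_um, slope):
    """Four-parameter Hill model: response at concentration conc_um (in µM).

    At conc_um == ac50_um the response is halfway between baseline and
    max_response, which is what defines the AC50.
    """
    if conc_um <= 0 or ac50_um <= 0:
        raise ValueError("concentrations must be positive")
    return baseline + (max_response - baseline) / (
        1.0 + (ac50_um / conc_um) ** slope
    )


def within_noise_band(responses, lower_limit, upper_limit):
    """True if every response lies between the assay detection limits."""
    return all(lower_limit <= r <= upper_limit for r in responses)


# Hypothetical 5-point titration with AC50 = 10 µM and a slope of 1.
concs = [0.1, 1.0, 10.0, 100.0, 1000.0]
profile = [hill_response(c, 0.0, 100.0, 10.0, 1.0) for c in concs]

# Hypothetical detection limits of ±25% of the positive control.
is_noise = within_noise_band(profile, -25.0, 25.0)
```

A profile such as those in Case 2 would return `True` from `within_noise_band`, whereas the active profiles in Case 1 would not.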

**FIGURE 2**
A barplot was used to summarize the response patterns corresponding to 72 assay readouts from 43 different data sets. A total of 7,729 chemicals were common among all 43 data sets. In the barplot, the gray regions correspond to the fraction of chemicals clustered in the noise band (Case 2), the dark green regions refer to a single detectable cluster well-separated from the noise band (Conclusive Case 1), the light green regions represent a single cluster with response points not statistically separable from noise (Inconclusive Case 1), the pink regions correspond to multiple clusters with response points not statistically separable from the noise band (Inconclusive Case 3) and the red regions refer to multiple clusters well-separated from the noise band (Conclusive Case 3). Agonist assay labels are shown in dark blue, antagonist/inhibitor assay labels are shown in green and viability assay labels are shown in gray. Selected compound profiles from assays with multiple clusters (Conclusive Case 3) are shown to the right of the barplot. Known factors associated with different clusters are indicated in the upper left of each plot. These factors include supplier, library preparation site, concentration spacing, compound purity and experimental day. None of these factors explain the different patterns observed in the last two plots. Hence, adjusting or normalizing the concentration-response data for these known factors will not necessarily eliminate multiple cluster response patterns among repeats within a compound in qHTS data.

**FIGURE 3**
Complementary empirical cumulative distribution function (CCDF) describing the variability in *AC₅₀* values. The maximal range of *AC₅₀* values (on the log₁₀ scale) was calculated for each compound in which two or more clusters were identified outside of the noise region for each of the 7,729 compounds investigated in the 43 data sets described in the text. The order of magnitude differences in intrachemical potency estimates shown here represent only those cases in which the calculated *AC₅₀* is between 10⁻⁵ and 1000 μM, which covers the typical testing concentration range of ∼10⁻⁴ to 100 μM evaluated in these assays. The number of compounds meeting this criterion ranged from 42 to 774 in the 72 assay types evaluated here, with a median of 255 compounds. **(A)** The CCDF (or 1-CDF) plots describing the proportion of compounds (y-axis) for a given spread in *AC₅₀* (x-axis) in the *tox21-er-luc-bg1-4e2-antagonist-p1* viability assay (blue) and the *tox21-gh3-tre-agonist-p1* agonist assay (red) are displayed. The vertical black lines indicate 10- and 100-fold differences in the calculated range of *AC₅₀* values. **(B)** The CCDF for the fraction of the 72 assays with greater than 10-fold range in *AC₅₀* values (y-axis) for a given spread in *AC₅₀* (x-axis) is shown for the agonist (dark blue), antagonist/inhibitor (dark green) and viability (dark gray) assays. **(C)** The CCDF for the fraction of the 72 assays with greater than 100-fold range in *AC₅₀* is shown for the same agonist, antagonist/inhibitor and viability assays presented in **(B)**.
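
The quantities plotted in this figure can be sketched as follows: for each compound, take the spread of its cluster-level *AC₅₀* values on the log₁₀ scale, then report the fraction of compounds whose spread exceeds a given threshold. The function names and the compound data below are invented for illustration; only the 10⁻⁵ to 1000 μM filter and the 10- and 100-fold thresholds come from the caption.

```python
import math


def log10_ac50_range(ac50_values_um):
    """Spread of AC50 values (µM) in orders of magnitude: max - min of log10."""
    logs = [math.log10(v) for v in ac50_values_um]
    return max(logs) - min(logs)


def in_reported_window(ac50_values_um, low=1e-5, high=1000.0):
    """Keep only compounds whose AC50 values all fall in the stated window."""
    return all(low <= v <= high for v in ac50_values_um)


def ccdf(values, threshold):
    """Empirical CCDF (1 - CDF): fraction of values strictly above threshold."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(v > threshold for v in values) / len(values)


# Made-up cluster-level AC50 values (µM) for three hypothetical compounds.
compounds = {
    "cmpd_a": [0.1, 1.5],
    "cmpd_b": [0.02, 30.0],
    "cmpd_c": [2.0, 5.0, 900.0],
}
ranges = [log10_ac50_range(v) for v in compounds.values()
          if len(v) >= 2 and in_reported_window(v)]

frac_over_10_fold = ccdf(ranges, 1.0)   # spread greater than 10-fold
frac_over_100_fold = ccdf(ranges, 2.0)  # spread greater than 100-fold
```

Evaluating `ccdf` over a grid of thresholds would trace out a curve like those in panel **(A)**.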
