Methods. 2016 Mar 1;96:12-26.
doi: 10.1016/j.ymeth.2015.10.007. Epub 2015 Nov 4.

A metric and workflow for quality control in the analysis of heterogeneity in phenotypic profiles and screens

Albert Gough et al. Methods. 2016.

Abstract

Heterogeneity is well recognized as a common property of cellular systems that impacts biomedical research and the development of therapeutics and diagnostics. Several studies have shown that analysis of heterogeneity gives insight into the mechanisms of action of perturbagens, can be used to predict optimal combination therapies, and can be applied to tumors, where heterogeneity is believed to be associated with adaptation and resistance. Cytometry methods, including high content screening (HCS), high throughput microscopy, flow cytometry, mass spectrometry imaging and digital pathology, capture cell-level data for populations of cells. However, it is often assumed that the population response is normally distributed and therefore that the average adequately describes the results. A deeper understanding of the measurements, and more effective comparison of perturbagen effects, requires analysis that takes into account the distribution of the measurements, i.e. the heterogeneity. Yet the reproducibility of heterogeneous data collected on different days and in different plates/slides has not previously been evaluated. Here we show that conventional assay quality metrics alone are not adequate for quality control of the heterogeneity in the data. To address this need, we demonstrate the use of the Kolmogorov-Smirnov statistic as a metric for monitoring the reproducibility of heterogeneity in a structure-activity relationship (SAR) screen, and describe a workflow for quality control in heterogeneity analysis. One major challenge in high throughput biology is the evaluation and interpretation of heterogeneity in thousands of samples, such as compounds in a cell-based screen. In this study we also demonstrate that three previously reported heterogeneity indices capture the shapes of the distributions and provide a means to filter and browse big data sets of cellular distributions in order to compare them and identify distributions of interest.
These metrics and methods are presented as a workflow for analysis of heterogeneity in large scale biology projects.
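The abstract proposes the two-sample Kolmogorov-Smirnov (KS) statistic, labeled QC-KS in the figures, as the reproducibility metric. The sketch below computes that statistic in plain Python for the cells of one control well against the pooled reference cells; the function name and the well-versus-pooled-reference pairing (taken from the Figure 5 caption) are illustrative assumptions rather than the authors' code. If SciPy is available, `scipy.stats.ks_2samp` reports the same statistic.

```python
from bisect import bisect_right


def ks_statistic(sample, reference):
    """Two-sample KS statistic D = max_x |F_sample(x) - F_reference(x)|.

    `sample` might hold per-cell intensities from one control well and
    `reference` the pooled cells from the validation-plate control wells
    (hypothetical usage).
    """
    a, b = sorted(sample), sorted(reference)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap between
    # them occurs at one of the observed values.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d
```

For example, `ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])` gives 0.5; identical distributions score 0 and non-overlapping ones score 1.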

Keywords: Drug discovery; Heterogeneity; High content screening; Phenotypic profiling; Systems biology.

Figures

Figure 1
Dose-response characterization of the activation of STAT3 by IL-6. The graph shows the activation of STAT3 by two-fold serial dilutions of IL-6 starting at 100 ng/mL. A-D) Images of STAT3 labeling illustrate the high level of variation in STAT3 activation at concentrations from 12.5 ng/mL (A) to 100 ng/mL (D). Original images were all scaled to a dynamic range of 100-40,000, then converted to 8-bit and pseudo-colored using the color lookup table shown. Cells with a labeling intensity ≥ Mean(positive control) + 3 Stdev(positive control) are considered Activated, and the % Activated cells are shown in the pie charts.
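The activation rule stated in the caption can be written directly. This sketch assumes that "Stdev" means the sample standard deviation and that the 100-40,000 display scaling is a linear clip-and-rescale to 0-255; both are assumptions, and the function names are hypothetical.

```python
from statistics import mean, stdev


def activation_threshold(control_intensities, k=3.0):
    """Mean + k * (sample) standard deviation of the control-well cells named in the caption."""
    return mean(control_intensities) + k * stdev(control_intensities)


def percent_activated(cell_intensities, threshold):
    """Percentage of cells whose labeling intensity is >= threshold."""
    if not cell_intensities:
        return 0.0
    n_active = sum(1 for v in cell_intensities if v >= threshold)
    return 100.0 * n_active / len(cell_intensities)


def to_8bit(value, lo=100.0, hi=40000.0):
    """Display scaling (assumed linear): clip to [lo, hi], then map onto 0..255."""
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * 255)
```

With control cells at 10, 12 and 14 the threshold is 18, so three of the four cells 5, 18, 20 and 30 count as Activated (75%).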
Figure 2
Standard assay quality measures are not sufficient for assessing the reproducibility of heterogeneity. Similar Z’-factors indicate consistent well-to-well assay reproducibility across plates but give no indication of the variability of cellular response distributions; even after plate-to-plate normalization, the shapes of the distributions still vary. Comparison of the cellular distributions of STAT3 activity in the IL-6+ control wells on five plates with essentially the same Z’-factor shows that the distributions can differ greatly from plate to plate. The S/B of the assay does give some indication of variation in the range of the data. Reflecting these differences in the distributions, the heterogeneity indices QE, nNRM and %OL vary from plate to plate, with Plate 734 showing the largest deviations. The QC-KS value provides a quantitative measure of the deviation of each distribution from the reference distributions established during assay validation. In this case, Plates 926 and 1302 have the lowest deviation (0.13) from the validation distributions.
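For comparison with the distribution-based metrics, the standard well-level measures can be computed as below, using the usual Z'-factor definition, Z' = 1 - 3(σ+ + σ-)/|μ+ - μ-|, and S/B = μ+/μ-. Both summarize each control only by a mean and a standard deviation, which is why they cannot register changes in distribution shape. Which per-well readout feeds them (for example % activated cells or mean intensity) and the use of the sample standard deviation are assumptions here.

```python
from statistics import mean, stdev


def z_prime(pos, neg):
    """Z'-factor from per-well readouts of positive and negative control wells."""
    return 1.0 - 3.0 * (stdev(pos) + stdev(neg)) / abs(mean(pos) - mean(neg))


def signal_to_background(pos, neg):
    """Mean positive-control readout divided by mean negative-control readout."""
    return mean(pos) / mean(neg)
```

As an illustration of the blind spot, the cell populations [4, 4, 4, 4, 6, 6, 6, 6] and [3, 5, 5, 5, 5, 5, 5, 7] share a mean of 5 and a population standard deviation of 1, so any mean-and-SD metric treats them as identical, yet their two-sample KS statistic is 0.375.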
Figure 3
Normalization of the distributions for all 117 plates. The plates were normalized to the median of the pooled reference validation plate controls. A. Histograms of the distributions in the pooled IL-6+ control wells indicate that the normalization is effective in establishing a consistent signal range across all plates.
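The caption states that each plate was normalized to the median of the pooled reference validation-plate controls but does not give the functional form. A minimal sketch, assuming a purely multiplicative plate effect; the helper name and signature are hypothetical, and an additive offset would be an equally plausible alternative.

```python
from statistics import median


def normalize_plate(plate_values, plate_control_values, reference_control_values):
    """Rescale all cell values on a plate so that its control-cell median
    matches the median of the pooled reference (validation) controls.

    Assumes the plate-to-plate effect is multiplicative.
    """
    plate_med = median(plate_control_values)
    if plate_med == 0:
        raise ValueError("plate control median is zero; cannot rescale")
    factor = median(reference_control_values) / plate_med
    return [v * factor for v in plate_values]
```

A plate whose control median is half the reference median has every cell value doubled.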
Figure 4
Selection of a QC metric of reproducibility in heterogeneity analysis. Each pair of histograms (A-F) indicates the distribution of plates labeled as Fail (red) or Pass (blue). A single Failed plate (888) with a clipped distribution is highlighted in green for reference. A) The KS statistic, QC-KS, provides a clear interpretation and good discrimination between Pass and Fail, except for plate 888. B) Although cell count might contribute to the distribution of cellular response on a plate, in this case there is no significant difference in cell count between Passed and Failed plates. C) Differences in the percent activated cells also might result in variation in the distributions, and that does appear to be the case in this assay. D) The Failed plates have a much broader distribution as indicated by the increase in QE. E) The Failed plates have a less normal distribution as indicated by the increased nNRM. F) Some of the Failed plates have an increased number of outliers indicated by the increased %OL.
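The caption reads the three heterogeneity indices as measures of breadth (QE), departure from normality (nNRM) and outlier content (%OL). Their exact definitions come from the earlier work that introduced them and are not restated here, so this sketch uses stand-ins chosen as assumptions: Rao's quadratic entropy over a fixed-width histogram for QE, the one-sample KS distance to a fitted normal for nNRM (consistent with the "KS-norm" label in Figure 7), and Tukey's 1.5×IQR fences for %OL. Bin counts, scaling and the outlier rule are illustrative choices.

```python
import math
from statistics import mean, pstdev, quantiles


def quadratic_entropy(values, n_bins=20, lo=None, hi=None):
    """Rao's quadratic entropy of the binned distribution: sum_ij |c_i - c_j| p_i p_j.

    Bin centers c are on a 0-1 scale over [lo, hi] (assumed normalization),
    so a broader spread of cells gives a larger value.
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        return 0.0
    counts = [0] * n_bins
    for v in values:
        i = int((v - lo) / (hi - lo) * n_bins)
        counts[min(max(i, 0), n_bins - 1)] += 1  # clip into the first/last bin
    p = [c / len(values) for c in counts]
    centers = [(i + 0.5) / n_bins for i in range(n_bins)]
    return sum(abs(centers[i] - centers[j]) * p[i] * p[j]
               for i in range(n_bins) for j in range(n_bins))


def non_normality(values):
    """One-sample KS distance between the data and a normal with the same mean and SD."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("values have zero spread")
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))  # normal CDF
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def percent_outliers(values, k=1.5):
    """Percentage of cells outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return 100.0 * sum(1 for v in values if v < lo or v > hi) / len(values)
```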
Figure 5
Quality Control workflow for heterogeneity analysis. A. To establish and quality-control heterogeneity in high throughput imaging projects, the distributions in the control wells are evaluated during validation to assess the reproducibility of the heterogeneity. B. The distributions from the control wells on the validation plates are pooled to establish reference distributions. C. Each control well on the validation plates is compared with the reference distribution using the KS statistic (QC-KS) to quantitatively assess reproducibility; the results are shown as a heatmap. D. The QC-KS statistic is used to monitor the distributions during a screening campaign. The solid horizontal line is the median(QC-KS), and the dashed lines mark median ± 3*MAD*K, the selected QC limit. E. The heatmap of the control wells on all the plates shows the variation in the QC-KS values from well to well and plate to plate, and identifies the 10 plates that were Failed. In a new project, Failed plates would be flagged for review.
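The QC limit in panel D can be sketched as follows. The caption does not define K; the common choice K = 1.4826, which makes the MAD a consistent estimate of the standard deviation for normally distributed data, is assumed here, as is the use of a single QC-KS value per plate (for example, the median over its control wells). Function names and plate IDs are hypothetical.

```python
from statistics import median

K_MAD = 1.4826  # assumed K: scales the MAD to estimate the SD for normal data


def qc_limits(qc_ks_values, n_mad=3.0, k=K_MAD):
    """(lower, upper) = median ± n_mad * MAD * k of the QC-KS values."""
    values = list(qc_ks_values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med - n_mad * mad * k, med + n_mad * mad * k


def flag_plates(plate_qc_ks, limits):
    """Sorted IDs of plates whose QC-KS lies outside the limits, to be reviewed."""
    lower, upper = limits
    return sorted(pid for pid, v in plate_qc_ks.items() if v < lower or v > upper)
```

Using the median and MAD rather than the mean and SD keeps a few badly deviating plates from widening the limits that are meant to catch them.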
Figure 6
Evaluating the performance of the Heterogeneity Indices in predicting the shape of the cellular distributions. A. Hierarchical clustering was used to group the distributions from all the wells on 19 plates into 8 distinct classes. B. The distributions were then split into a training set and a test set to construct a Random Forest classifier that predicts the cluster number using only the 3 heterogeneity indices. C. A simpler decision tree model was also constructed to predict the distribution class. Although its overall performance was good, it did not successfully separate Cluster 3 from Clusters 1, 2 and 4.
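Panel C's decision tree predicts a distribution class from the three heterogeneity indices. The sketch below is a generic depth-limited CART tree with Gini impurity, written in plain Python for illustration; it is not the authors' model, it does not implement the Random Forest of panel B, and the feature order (QE, nNRM, %OL) and any training values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (weighted impurity, feature index, threshold) of the best binary split, or None."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # midpoint between consecutive observed values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, max_depth=3, depth=0):
    """Leaves are class labels; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:  # no feature separates the remaining rows
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], max_depth, depth + 1),
            build_tree([X[i] for i in right], [y[i] for i in right], max_depth, depth + 1))


def predict(tree, row):
    """Follow the thresholds from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree
```

In practice the wells would be split into training and test rows, as in panel B, with accuracy measured on the held-out rows; a shallow tree like this is easy to read but, as the caption notes, can fail to separate classes whose index values overlap.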
Figure 7
Filtering and drilling into 22,000 distributions using the Heterogeneity Indices. A. The QE and KS-norm indices were each binned into 10 uniformly sized bins and then used to sort the cell distributions in the horizontal and vertical directions, respectively. This provides an overview of the general distribution shapes. Each distribution in this view pools all the cells from all the wells whose QE and KS-norm indices fell within that bin. B. Selecting a single bin zooms in on the distributions that comprise it, now displayed with finer binning for more detailed review of the distributions within that subset of wells.
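Panel A's overview can be sketched as a two-dimensional grid: each well is assigned to one of 10 equal-width bins for QE and for KS-norm, and the cells of all wells that share a grid position are pooled into one distribution. The dictionary keys `qe`, `ks_norm` and `cells` are hypothetical. Calling the same function on the wells of one grid position with a larger `n_bins` mimics the finer drill-down of panel B.

```python
from collections import defaultdict


def uniform_bin(value, lo, hi, n_bins=10):
    """Index of `value` among n_bins equal-width bins spanning [lo, hi], clipped to the range."""
    if hi == lo:
        return 0
    i = int((value - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)


def bin_wells(wells, n_bins=10):
    """Pool the per-cell values of wells sharing a (QE bin, KS-norm bin) position."""
    qe_lo, qe_hi = min(w["qe"] for w in wells), max(w["qe"] for w in wells)
    ks_lo, ks_hi = min(w["ks_norm"] for w in wells), max(w["ks_norm"] for w in wells)
    grid = defaultdict(list)
    for w in wells:
        key = (uniform_bin(w["qe"], qe_lo, qe_hi, n_bins),          # horizontal position
               uniform_bin(w["ks_norm"], ks_lo, ks_hi, n_bins))     # vertical position
        grid[key].extend(w["cells"])
    return dict(grid)
```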
Figure 8
Heterogeneity Browser for High Throughput Cytometry. The filtering concept in Figure 7 was expanded into an interactive Heterogeneity Browser in which the cellular heterogeneity in the wells can be identified and reviewed in a variety of ways. All data in the browser are linked, so that selecting data in one graph highlights the same data in all other graphs. Here, the distributions with %Activation = 45-55% have been selected in D and are highlighted in cyan in all the views. A. 2D matrix of distributions with increasing QE index on the horizontal axis and increasing normality (decreasing nNRM) on the vertical axis. B. 2D matrix with increasing %OL on the horizontal axis and, again, decreasing nNRM on the vertical axis. C. The dose-response of distributions for two compounds; the interface allows scrolling through all the compounds in the selection. D-G. Histograms of the percent activated cells and the 3 HIs for the pooled replicate wells over the whole data set: D. % activated cells. E. QE index. F. nNRM index. G. %OL index. H. List of all compounds, highlighting one in the current selection.
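The linked-view behavior described for the browser amounts to a shared selection that notifies every view when it changes. A toy sketch of that pattern; the class and function names are hypothetical and no particular plotting library is implied.

```python
class LinkedSelection:
    """A shared set of selected well IDs; registered views are notified on every change."""

    def __init__(self):
        self.selected = set()
        self._views = []

    def register(self, view):
        self._views.append(view)

    def select(self, well_ids):
        self.selected = set(well_ids)
        for view in self._views:  # push the same selection to every linked graph
            view.highlight(self.selected)


class View:
    """Stand-in for one graph, e.g. a QE x nNRM matrix or a histogram panel."""

    def __init__(self, name):
        self.name = name
        self.highlighted = set()

    def highlight(self, well_ids):
        self.highlighted = set(well_ids)


def select_by_activation(pct_activated, lo=45.0, hi=55.0):
    """Well IDs whose % activated cells lies in [lo, hi], like the panel D selection."""
    return [wid for wid, pct in pct_activated.items() if lo <= pct <= hi]
```

Keeping the selection in one shared object means each view only needs a `highlight` method, so new views can be added without changing the others.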
Figure 9
The recommended workflow for heterogeneity analysis in screens or large scale biology projects. For new projects, heterogeneity analysis should be incorporated early in the development of the assay. For retrospective projects, step 1 is to evaluate the range of the distributions of the positive and negative controls in the whole data set, and step 2 is to normalize the data, if necessary, to establish consistent distributions from plate to plate. Steps 3-6 remain the same.
