SLAS Discov. 2021 Feb;26(2):292-308.
doi: 10.1177/2472555220950245. Epub 2020 Aug 29.

Comparison of Approaches for Determining Bioactivity Hits from High-Dimensional Profiling Data

Johanna Nyffeler et al. SLAS Discov. 2021 Feb.

Abstract

Phenotypic profiling assays are untargeted screening assays that measure a large number (hundreds to thousands) of cellular features in response to a stimulus. They often yield diverse and unanticipated profiles of phenotypic effects, which makes it challenging to distinguish active from inactive treatments. Here, we compare a variety of strategies for hit identification in imaging-based phenotypic profiling assays using a previously published Cell Painting data set. Hit identification strategies based on multiconcentration analysis involved curve fitting at several levels of data aggregation (e.g., individual feature level, aggregation of similarly derived features into categories, and global modeling of all features) and on computed metrics (e.g., Euclidean and Mahalanobis distance metrics and eigenfeatures). Hit identification strategies based on single-concentration analysis included measurement of signal strength (e.g., total effect magnitude) and correlation of profiles among biological replicates. Modeling parameters for each approach were optimized to retain the ability to detect a reference chemical with subtle phenotypic effects while limiting the false-positive rate to 10%. The percentage of test chemicals identified as hits was highest for feature-level and category-based approaches, followed by global fitting, whereas signal strength and profile correlation approaches detected the fewest active hits at the fixed false-positive rate. Approaches involving fitting of distance metrics had the lowest likelihood of identifying high-potency false-positive hits that may be associated with assay noise. Most of the methods achieved a 100% hit rate for the reference chemical and high concordance for 82% of test chemicals, indicating that hit calls are robust across different analysis approaches.
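
The idea of fixing the false-positive rate at 10% can be illustrated with a minimal sketch. It assumes each treatment has already been reduced to a single score (for example, a distance from controls) and that scores for null treatments are available. The function names `calibrate_threshold` and `hit_rate` are hypothetical and are not taken from the paper, which tuned approach-specific modeling parameters rather than a single cutoff.

```python
import numpy as np

def calibrate_threshold(null_scores, target_fpr=0.10):
    """Return a cutoff such that at most target_fpr of null scores lie strictly above it.

    Assumption: a higher score means a stronger apparent effect.
    """
    scores = np.sort(np.asarray(null_scores, dtype=float))
    n = scores.size
    # Number of null treatments that may exceed the cutoff.
    allowed = int(np.floor(target_fpr * n))
    # Use the (n - allowed)-th smallest score as the cutoff.
    return scores[n - allowed - 1]

def hit_rate(scores, threshold):
    """Fraction of treatments whose score exceeds the cutoff."""
    return float(np.mean(np.asarray(scores, dtype=float) > threshold))

# Illustrative null scores only (108 values, matching the size of the null set).
null = np.random.default_rng(0).permutation(108).astype(float)
thr = calibrate_threshold(null)
print(thr, hit_rate(null, thr))
```

With 108 null treatments, at most 10 may exceed the cutoff, so the realized false-positive rate sits slightly below 10%.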

Keywords: Cell Painting; computational toxicology; concentration response; high-throughput phenotypic profiling.

Conflict of interest statement

The authors declare no conflict of interest. This manuscript has been reviewed by the Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, and approved for publication. Approval does not signify that the contents reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.

Figures

Fig. 1. Approaches for Hit Determination from Imaging-Based Phenotypic Profiling Data.
Multi-concentration approaches for hit determination are shown in blue. Single-concentration approaches for hit determination are shown in pink. The number of individual BMCs (benchmark concentrations) that could potentially be derived from each multi-concentration approach is shown in the triangle to the left. The starting point for all approaches was well-level data for each phenotypic feature. Feature-level data can be fit and used directly for potency estimation, or the fit results can be aggregated to the category level (i.e., a collection of related features) before determining hit calls and calculating potency estimates. Data from our adaptation of the Cell Painting assay can be reduced to 49 categories before curve fitting using either feature reduction (PCA) or ssGSEA approaches. The 1300 individual features can also be used to calculate a Euclidean distance from controls, and this value can be modeled as a single response variable. Similarly, feature-level data can be transformed to eigenfeatures to account for correlation among features, and the distance from controls can then be calculated using the Mahalanobis approach. Eigenfeature-level data can also be used directly for curve fitting. For single-concentration approaches, feature-level or eigenfeature-level data can be used to derive signatures, and the overall signal strength of the signature can be compared to controls. Alternatively, the correlation of signatures among biological replicates of the same treatment can be used as a hit-calling criterion.
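The two global distance metrics named in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: eigenfeatures are taken to be principal components of the control wells, the number retained is set by a hypothetical `var_explained` parameter, and both function names are invented for this example.

```python
import numpy as np

def euclidean_from_controls(treated, controls):
    """Euclidean distance of each treated well from the mean control profile."""
    center = controls.mean(axis=0)
    return np.linalg.norm(treated - center, axis=1)

def mahalanobis_from_controls(treated, controls, var_explained=0.95):
    """Mahalanobis distance computed in eigenfeature (principal component) space.

    Assumption: components are derived from control wells only, and enough
    components are kept to reach var_explained of the control variance.
    """
    center = controls.mean(axis=0)
    # PCA of the centered controls via singular value decomposition.
    _, s, vt = np.linalg.svd(controls - center, full_matrices=False)
    var = s ** 2 / (controls.shape[0] - 1)      # variance along each component
    cum = np.cumsum(var) / var.sum()
    k = min(int(np.searchsorted(cum, var_explained)) + 1, var.size)
    scores = (treated - center) @ vt[:k].T       # project onto eigenfeatures
    # Scaling by the component variances removes correlation among features.
    return np.sqrt(np.sum(scores ** 2 / var[:k], axis=1))

# Toy data: four control wells with two uncorrelated features, one treated well.
controls = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]])
treated = np.array([[3.0, 4.0]])
print(euclidean_from_controls(treated, controls))
print(mahalanobis_from_controls(treated, controls, var_explained=1.0))
```

Either distance yields one value per well, which is what allows the whole profile to be modeled as a single response variable in concentration-response fitting.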
Fig. 2. Comparison of Performance of Hit Determination Approaches.
A previously published data set was used to compare all approaches. U-2 OS cells were exposed for 24 h to the chemicals. Chemicals were tested in four biological replicates, resulting in a total of 48 assay plates organized as 12 plate groups. Approaches were optimized to a false positive rate of ~ 10% (vertical dashed line) based on a randomized null data set (red circles; n = 108) and the best possible true positive rate based on the reference chemical berberine chloride (green triangles; n = 12). Sixteen random test chemicals were screened in duplicate and used to calculate concordance (blue open diamonds) as the number of unique chemicals classified in both occurrences as either active or inactive. The hit rate of test chemicals (black squares) was calculated from 478 test chemicals, with the exception of approaches using tcplfit2 to fit, for which three chemicals had fewer than four concentrations and were excluded from concentration-response modeling. Method name abbreviations: ssGSEA: single sample gene set enrichment analysis; F: feature-based; E: eigenfeature-based.
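Concordance as defined in this caption, the share of duplicated chemicals that received the same call in both occurrences, reduces to a few lines. The function name and the chemical identifiers below are placeholders for illustration, not values from the study.

```python
def concordance(calls_first, calls_second):
    """Fraction of chemicals classified identically (active or inactive) in both screens.

    Each argument maps a chemical identifier to True (active) or False (inactive).
    """
    if calls_first.keys() != calls_second.keys():
        raise ValueError("both screens must cover the same chemicals")
    agree = sum(calls_first[c] == calls_second[c] for c in calls_first)
    return agree / len(calls_first)

# Placeholder calls for four duplicated chemicals.
first = {"chem_a": True, "chem_b": False, "chem_c": True, "chem_d": False}
second = {"chem_a": True, "chem_b": False, "chem_c": False, "chem_d": False}
print(concordance(first, second))
```

Note that agreement on "inactive" counts toward concordance just as agreement on "active" does.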
Fig. 3. Concordance of Hit Calls Across Approaches.
(A) Heatmap illustrating hit calls for all approaches (rows) and all chemicals (columns). Colors in the heatmap indicate whether the chemical was considered bioactive (gray) or inactive (white). The column annotation indicates the type of chemical: test chemical (blue), reference chemical (green), and null chemical (gray). The row annotation indicates multi-concentration approaches (blue) and single-concentration approaches (pink). (B) Pie charts summarizing the concordance among eleven approaches. Each pie chart slice indicates the proportion of 108 null chemicals (left) and 475 test chemicals (right) that were called active by the number of approaches indicated by the numerical labels surrounding the pie charts. Four approaches with < 100% TPR were excluded (Global Euclidean, Signal Strength overall E, Signal Strength plate-wise E, and Profile Correlation E). Three test chemicals had fewer than four concentrations, were not modeled with approaches that use tcplfit2, and were therefore excluded from the heatmap and pie charts. Abbreviations: ssGSEA: single sample gene set enrichment analysis; F: feature-based; E: eigenfeature-based.
Fig. 4. Concordance of Potency Estimates Across Multi-Concentration Approaches.
(A) Reproducibility of potency estimates of reference chemicals. All four reference chemicals were tested in twelve replicates within the study. The gray area indicates the range of tested concentrations. Replicates with potencies below the tested concentration range and replicates without a potency estimate (i.e., inactives) are displayed ½ an order of magnitude below or above the tested concentration range, respectively. (B) Potency estimates of null chemicals that were identified as active by each approach. Null chemicals were arbitrarily mapped to a concentration range of 0.03 – 100 μM with ½ log10 spacing. (C) For the 16 test chemicals screened in duplicate, the difference between the two potency estimates is displayed for each test chemical that was identified as active in both instances by the respective approach (n = 7 – 10 per approach). The potency range is in units of log10(μM). (D) Differences in potency estimates of test chemicals across the nine approaches. For each test chemical that was active across all nine approaches (n = 229), the median potency was estimated. Then, for each approach (rows), the difference between each chemical's potency and the median potency was calculated. (E) Potency estimates for all test chemicals (n = 475 for approaches fit with tcplfit2, and n = 478 for all others) and all approaches. Abbreviations: PAC: phenotype altering concentration; ssGSEA: single sample gene set enrichment analysis.
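The calculation described for panel (D) can be sketched as below, assuming potencies are stored as log10(μM) values in an approaches-by-chemicals array with NaN marking an inactive call. The array layout and the function name `potency_deviation` are assumptions made for this example.

```python
import numpy as np

def potency_deviation(log_potency):
    """Deviation of each approach's potency from the across-approach median.

    log_potency: array of shape (n_approaches, n_chemicals), NaN where inactive.
    Only chemicals active in every approach are kept, as in panel (D).
    Returns the deviations and a boolean mask of the retained chemicals.
    """
    log_potency = np.asarray(log_potency, dtype=float)
    active_everywhere = ~np.isnan(log_potency).any(axis=0)
    kept = log_potency[:, active_everywhere]
    median = np.median(kept, axis=0)   # one median per retained chemical
    return kept - median, active_everywhere

# Toy example: three approaches, three chemicals; the third chemical is
# inactive in the first approach and is therefore dropped.
values = [[0.0, 1.0, np.nan],
          [1.0, 1.0, 2.0],
          [2.0, 4.0, 3.0]]
deviation, mask = potency_deviation(values)
print(mask)
print(deviation)
```

Because the values are on a log10 scale, a deviation of 1 corresponds to a tenfold difference in potency relative to the median.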
Fig. 5. Comparison of Bioactivity Profiles Across Feature- and Category-Based Approaches.
(A) Potency (x-axis) vs. effect size (y-axis) for both feature-level approaches (BMDExpress and tcplfit2). For each reference chemical and feature, the median BMC and the median absolute top of the curve were calculated from the 12 replicates. Features are only displayed if they had a valid BMC in the majority of replicates (i.e., ≥ 7). (B) BMC accumulation plots for all category-based approaches. For each reference chemical and category, the median BMC was calculated from the 12 replicates. Categories that had a valid BMC in the majority of replicates (i.e., ≥ 7) were ranked according to their potencies. Only the 15 most potent categories are displayed. In both (A) and (B), features and categories, respectively, were coded with respect to shape/fluorescent channel (color), feature type (letter), or cellular compartment (shape).
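The filtering and ranking behind panel (B) can be sketched as follows. The sketch assumes that a missing BMC is stored as NaN, that the median is taken over the valid replicates only (the caption does not say how missing values were handled), and that a lower BMC means higher potency. The function name and category labels are placeholders.

```python
import numpy as np

def rank_categories(bmc, names, min_valid=7, top_n=15):
    """Rank categories by median BMC, keeping those valid in >= min_valid replicates.

    bmc: array of shape (n_categories, n_replicates), NaN where no valid BMC.
    Returns (name, median BMC) pairs, most potent (lowest BMC) first.
    """
    bmc = np.asarray(bmc, dtype=float)
    n_valid = np.sum(~np.isnan(bmc), axis=1)
    keep = n_valid >= min_valid
    # Assumption: the median ignores replicates without a valid BMC.
    medians = np.nanmedian(bmc[keep], axis=1)
    kept_names = [name for name, k in zip(names, keep) if k]
    order = np.argsort(medians)[:top_n]
    return [(kept_names[i], float(medians[i])) for i in order]

# Toy data with 12 replicates per category.
nan = np.nan
bmc = [[10.0] * 12,                 # valid in all 12 replicates
       [1.0] * 7 + [nan] * 5,       # valid in 7 replicates, kept
       [0.1] * 6 + [nan] * 6]       # valid in 6 replicates, dropped
print(rank_categories(bmc, ["cat_A", "cat_B", "cat_C"]))
```

The third category would rank first on potency alone, but it is excluded because it lacks a valid BMC in the majority of replicates.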
