Comparison of Approaches for Determining Bioactivity Hits from High-Dimensional Profiling Data

SLAS Discov. 2021 Feb;26(2):292-308. doi: 10.1177/2472555220950245. Epub 2020 Aug 29.

Johanna Nyffeler¹,², Derik E Haggard¹,², Clinton Willis¹,³, R Woodrow Setzer¹, Richard Judson¹, Katie Paul-Friedman¹, Logan J Everett¹, Joshua A Harrill¹

Affiliations

¹ Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Durham, NC, USA.
² Oak Ridge Institute for Science and Education (ORISE), Oak Ridge, TN, USA.
³ Oak Ridge Associated Universities (ORAU), Oak Ridge, TN, USA.

PMID: 32862757
PMCID: PMC8673120
DOI: 10.1177/2472555220950245

Johanna Nyffeler et al. SLAS Discov. 2021 Feb.

Abstract

Phenotypic profiling assays are untargeted screening assays that measure a large number (hundreds to thousands) of cellular features in response to a stimulus and often yield diverse and unanticipated profiles of phenotypic effects, which makes it challenging to distinguish active from inactive treatments. Here, we compare a variety of strategies for hit identification in imaging-based phenotypic profiling assays using a previously published Cell Painting data set. Hit identification strategies based on multiconcentration analysis involved curve fitting at several levels of data aggregation (e.g., individual feature level, aggregation of similarly derived features into categories, and global modeling of all features) and of computed metrics (e.g., Euclidean and Mahalanobis distance metrics and eigenfeatures). Hit identification strategies based on single-concentration analysis included measurement of signal strength (e.g., total effect magnitude) and correlation of profiles among biological replicates. Modeling parameters for each approach were optimized to retain the ability to detect a reference chemical with subtle phenotypic effects while limiting the false-positive rate to 10%. The percentage of test chemicals identified as hits was highest for feature-level and category-based approaches, followed by global fitting, whereas signal strength and profile correlation approaches detected the fewest hits at the fixed false-positive rate. Approaches involving fitting of distance metrics had the lowest likelihood of identifying high-potency false-positive hits that may be associated with assay noise. Most of the methods achieved a 100% hit rate for the reference chemical and high concordance for 82% of test chemicals, indicating that hit calls are robust across different analysis approaches.
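The single-concentration profile-correlation strategy and the 10% false-positive-rate constraint can be illustrated with a minimal Python sketch. It assumes that each treatment is represented by one numeric profile per biological replicate, scores the treatment by the median pairwise Pearson correlation among those profiles, and sets the hit cutoff from the scores of null (randomized) chemicals so that at most 10% of them exceed it. The function names and the median-of-correlations scoring rule are hypothetical; the study also tuned parameters against a reference chemical, which this sketch does not attempt.

```python
import statistics
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant feature profiles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def profile_correlation_score(replicates):
    """Median pairwise Pearson correlation among the biological-replicate
    profiles of one treatment (hypothetical scoring rule)."""
    return statistics.median(pearson(a, b) for a, b in combinations(replicates, 2))


def fpr_cutoff(null_scores, target_fpr=0.10):
    """Cutoff such that at most target_fpr of null-chemical scores exceed it."""
    ranked = sorted(null_scores, reverse=True)
    n_allowed = int(target_fpr * len(ranked))  # null chemicals allowed above the cutoff
    return ranked[n_allowed]  # scores strictly greater than this are called hits


if __name__ == "__main__":
    reps = [[0.1, 2.0, -1.5, 0.3], [0.2, 1.8, -1.2, 0.1], [0.0, 2.2, -1.4, 0.4]]
    null = [i / 100 for i in range(100)]  # toy null-distribution scores
    cutoff = fpr_cutoff(null)
    score = profile_correlation_score(reps)
    print(f"score={score:.3f} cutoff={cutoff:.2f} hit={score > cutoff}")
```

Calibrating the cutoff on null chemicals rather than fixing it a priori is what lets different scoring rules be compared at the same false-positive rate.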

Keywords: Cell Painting; computational toxicology; concentration response; high-throughput phenotypic profiling.

Conflict of interest statement

The authors declare no conflict of interest. This manuscript has been reviewed by the Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, and approved for publication. Approval does not signify that the contents reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.

Figures

**Fig. 1. Approaches for Hit Determination from Imaging-Based Phenotypic Profiling Data.**
Multi-concentration approaches for hit determination are shown in blue; single-concentration approaches are shown in pink. The triangle on the left shows the number of individual benchmark concentrations (BMCs) that could potentially be derived from each multi-concentration approach. All approaches start from well-level data for each phenotypic feature. Feature-level data can be fit and used directly for potency estimation, or the fit results can be aggregated to the category level (i.e., collections of related features) before hit calls are made and potency estimates are calculated. Data from our adaptation of the Cell Painting assay can also be reduced to 49 categories before curve fitting, using either feature reduction (PCA) or ssGSEA. The 1300 individual features can also be used to calculate a Euclidean distance from controls, which is then modeled as a single response variable. Similarly, feature-level data can be transformed into eigenfeatures to account for correlation among features, and the distance from controls can then be calculated using the Mahalanobis approach. Eigenfeature-level data can also be used directly for curve fitting. For single-concentration approaches, feature-level or eigenfeature-level data can be used to derive signatures, and the overall signal strength of each signature can be compared to controls. Alternatively, the correlation of signatures among biological replicates of the same treatment can be used as a hit-calling criterion.
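The two distance metrics in this caption can be sketched in a few lines of Python. This is an illustrative sketch, not the study's code: it assumes each well is a list of normalized feature (or eigenfeature) values, takes the per-feature median of control wells as the centroid for the Euclidean distance, and assumes the eigenfeatures are uncorrelated in controls, in which case the Mahalanobis distance reduces to a Euclidean distance on eigenfeatures scaled by their control standard deviations. The function names are hypothetical. The per-well distance would then serve as the single response variable for concentration-response modeling.

```python
import math
import statistics


def euclidean_from_controls(well, control_wells):
    """Euclidean distance of one well's feature vector from the control centroid.

    Assumption: the centroid is the per-feature median of the control wells.
    """
    centroid = [statistics.median(col) for col in zip(*control_wells)]
    return math.dist(well, centroid)


def mahalanobis_on_eigenfeatures(well_eig, control_eig):
    """Mahalanobis distance of one well from controls in eigenfeature space.

    Assumption: eigenfeatures are uncorrelated in the control wells, so the
    covariance matrix is diagonal and the Mahalanobis distance reduces to a
    Euclidean distance after scaling each eigenfeature by its control SD.
    """
    cols = list(zip(*control_eig))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(well_eig, means, sds)))


if __name__ == "__main__":
    controls = [[1.0, 10.0], [-1.0, -10.0], [0.0, 0.0]]
    print(euclidean_from_controls([3.0, 4.0], controls))
    print(mahalanobis_on_eigenfeatures([3.0, 40.0], controls))
```

In the usage example, a 40-unit shift on an eigenfeature whose control SD is 10 contributes as much to the Mahalanobis distance as a 4-unit shift on one whose control SD is 1, whereas a raw Euclidean distance would be dominated by the high-variance eigenfeature.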

**Fig. 2. Comparison of Performance of Hit Determination Approaches.**
A previously published data set was used to compare all approaches. U-2 OS cells were exposed to the chemicals for 24 h. Chemicals were tested in four biological replicates, for a total of 48 assay plates organized into 12 plate groups. Approaches were optimized to a false-positive rate of ~10% (vertical dashed line), based on a randomized null data set (red circles; n = 108), and to the best possible true-positive rate, based on the reference chemical berberine chloride (green triangles; n = 12). Sixteen randomly selected test chemicals were screened in duplicate and used to calculate concordance (blue open diamonds) as the number of unique chemicals that received the same classification (active or inactive) in both screenings. The hit rate of test chemicals (black squares) was calculated from 478 test chemicals, except for approaches that used *tcplfit2* for curve fitting: three chemicals had fewer than four concentrations and were excluded from concentration-response modeling with those approaches. Method name abbreviations: ssGSEA, single-sample gene set enrichment analysis; F, feature-based; E, eigenfeature-based.
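The performance metrics plotted in this figure can be computed from per-chemical hit calls as sketched below. The sketch assumes hit calls are booleans, treats every null-chemical hit as a false positive and every reference-chemical hit as a true positive, and, as an assumption, reports concordance as a fraction of the duplicated chemicals rather than as a count. `summarize` and `Summary` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    fpr: float          # fraction of null chemicals called active
    tpr: float          # fraction of reference-chemical replicates called active
    hit_rate: float     # fraction of test chemicals called active
    concordance: float  # fraction of duplicated chemicals with the same call twice


def summarize(null_calls, reference_calls, test_calls, duplicate_pairs):
    """All *_calls are lists of booleans (True = active); duplicate_pairs is a
    list of (first_screen_call, second_screen_call) tuples."""
    agree = sum(a == b for a, b in duplicate_pairs)  # active-active or inactive-inactive
    return Summary(
        fpr=sum(null_calls) / len(null_calls),
        tpr=sum(reference_calls) / len(reference_calls),
        hit_rate=sum(test_calls) / len(test_calls),
        concordance=agree / len(duplicate_pairs),
    )


if __name__ == "__main__":
    print(summarize([True] + [False] * 9, [True] * 12, [True, False],
                    [(True, True), (False, False), (True, False)]))
```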

**Fig. 3. Concordance of Hit Calls Across Approaches.**
(A) Heatmap of hit calls for all approaches (rows) and all chemicals (columns). Cell color indicates whether the chemical was considered bioactive (gray) or inactive (white). The column annotation indicates the chemical type: test chemical (blue), reference chemical (green), or null chemical (gray). The row annotation indicates multi-concentration approaches (blue) and single-concentration approaches (pink). (B) Pie charts summarizing the concordance among eleven approaches. Each slice indicates the proportion of the 108 null chemicals (left) or the 475 test chemicals (right) that were called active by the number of approaches given in the numerical labels around the pie charts. Four approaches with a true-positive rate (TPR) below 100% were excluded (Global Euclidean, Signal Strength overall E, Signal Strength plate-wise E, and Profile Correlation E). Three test chemicals had fewer than four concentrations, were not modeled by the approaches that use *tcplfit2*, and were therefore excluded from the heatmap and pie charts. Abbreviations: ssGSEA, single-sample gene set enrichment analysis; F, feature-based; E, eigenfeature-based.
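The tally behind the pie charts in panel B can be sketched as follows, assuming hit calls are stored per approach as a mapping from chemical to a boolean. Only chemicals present in every included approach are counted, which mirrors the exclusion of the three chemicals that *tcplfit2*-based approaches could not model; the function names are hypothetical.

```python
from collections import Counter


def n_approaches_active(hit_calls, excluded=()):
    """Tally how many approaches call each chemical active.

    hit_calls maps approach name -> {chemical: True/False}. Only chemicals
    present in every included approach are counted. Returns a Counter mapping
    k -> number of chemicals called active by exactly k approaches.
    """
    approaches = [a for a in hit_calls if a not in excluded]
    shared = set.intersection(*(set(hit_calls[a]) for a in approaches))
    return Counter(sum(hit_calls[a][c] for a in approaches) for c in shared)


def as_proportions(tally):
    """Convert the tally into pie-chart slice proportions, ordered by k."""
    total = sum(tally.values())
    return {k: n / total for k, n in sorted(tally.items())}


if __name__ == "__main__":
    calls = {
        "A": {"c1": True, "c2": False, "c3": True},
        "B": {"c1": True, "c2": False},  # c3 not modeled by B
        "C": {"c1": False, "c2": True, "c3": True},
    }
    print(as_proportions(n_approaches_active(calls, excluded=("C",))))
```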

**Fig. 4. Concordance of Potency Estimates Across Multi-Concentration Approaches.**
(A) Reproducibility of potency estimates for the reference chemicals. All four reference chemicals were tested in twelve replicates within the study. The gray area indicates the range of tested concentrations. Replicates with potencies below the tested concentration range are displayed half an order of magnitude below that range, and replicates without a potency estimate (i.e., inactives) are displayed half an order of magnitude above it. (B) Potency estimates of null chemicals identified as active by each approach. Null chemicals were arbitrarily mapped to a concentration range of 0.03–100 μM with ½ log₁₀ spacing. (C) For the 16 test chemicals screened in duplicate, the difference between the two potency estimates is displayed for each test chemical identified as active in both instances by a given approach (n = 7–10 per approach). Potencies are expressed in units of log₁₀(μM). (D) Differences in potency estimates of test chemicals across the nine approaches. For each test chemical active in all nine approaches (n = 229), the median potency was estimated; then, for each approach (rows), the difference between each chemical's potency and its median potency was calculated. (E) Potency estimates for all test chemicals (n = 475 for approaches fit with *tcplfit2* and n = 478 for all others) and all approaches. Abbreviations: PAC, phenotype altering concentration; ssGSEA, single-sample gene set enrichment analysis.
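Panels A and D rely on two simple transformations of potency estimates, sketched below in Python under stated assumptions: potencies are given in μM (with `None` for inactives), deviations in panel D are taken on the log₁₀ scale from the per-chemical median across approaches, and the display rule for panel A places below-range potencies and inactives half a log₁₀ unit outside the tested range. The function names are hypothetical.

```python
import math
import statistics


def deviations_from_median(potencies_um):
    """potencies_um maps approach -> {chemical: potency in uM, or None if inactive}.

    For chemicals active in every approach, returns approach -> {chemical:
    log10(potency) minus the median log10(potency) across approaches}.
    """
    approaches = list(potencies_um)
    shared = set.intersection(
        *({c for c, p in potencies_um[a].items() if p is not None} for a in approaches)
    )
    out = {a: {} for a in approaches}
    for chem in shared:
        logs = {a: math.log10(potencies_um[a][chem]) for a in approaches}
        med = statistics.median(logs.values())
        for a, v in logs.items():
            out[a][chem] = v - med
    return out


def display_position(potency_um, lo_um, hi_um):
    """Plot position (log10 uM) for panel A: below-range potencies are drawn
    half a log unit below the tested range, inactives half a log unit above it."""
    if potency_um is None:
        return math.log10(hi_um) + 0.5
    if potency_um < lo_um:
        return math.log10(lo_um) - 0.5
    return math.log10(potency_um)


if __name__ == "__main__":
    pots = {"A": {"x": 1.0, "y": None}, "B": {"x": 10.0, "y": 5.0}, "C": {"x": 100.0, "y": 2.0}}
    print(deviations_from_median(pots))
    print(display_position(None, 0.03, 100.0), display_position(0.01, 0.03, 100.0))
```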

**Fig. 5. Comparison of Bioactivity Profiles Across Feature- and Category-Based Approaches.**
(A) Potency (x-axis) vs. effect size (y-axis) for both feature-level approaches (BMDExpress and *tcplfit2*). For each reference chemical and feature, the median BMC and the median absolute top of the curve were calculated from the 12 replicates. Features are displayed only if they had a valid BMC in the majority of replicates (i.e., ≥7). (B) BMC accumulation plots for all category-based approaches. For each reference chemical and category, the median BMC was calculated from the 12 replicates. Categories with a valid BMC in the majority of replicates (i.e., ≥7) were ranked by potency, and only the 15 most potent categories are displayed. In both (A) and (B), features and categories, respectively, are coded by shape/fluorescent channel (color), feature type (letter), and cellular compartment (shape).
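The filtering and ranking used for the BMC accumulation plots in panel B can be sketched as follows. The sketch assumes each category has a list of 12 replicate BMCs with `None` where no valid BMC was obtained and, as an assumption, takes the median over the valid replicates only; the function name and its defaults are hypothetical.

```python
import statistics


def most_potent_categories(bmcs, min_valid=7, top_n=15):
    """bmcs maps category -> list of replicate BMCs (None where no valid BMC).

    Keeps categories with a valid BMC in at least min_valid replicates and
    returns (category, median BMC) pairs from most to least potent (lowest BMC
    first), truncated to the top_n most potent.
    """
    kept = []
    for cat, values in bmcs.items():
        valid = [v for v in values if v is not None]
        if len(valid) >= min_valid:
            kept.append((cat, statistics.median(valid)))
    kept.sort(key=lambda pair: pair[1])
    return kept[:top_n]


if __name__ == "__main__":
    bmcs = {"a": [1.0] * 7 + [None] * 5, "b": [0.5] * 6 + [None] * 6, "c": [0.2] * 12}
    for rank, (cat, bmc) in enumerate(most_potent_categories(bmcs), start=1):
        print(rank, cat, bmc)  # cumulative rank vs. BMC gives the accumulation curve
```

Plotting the rank against the median BMC yields the accumulation curve; category "b" in the usage example is dropped because only 6 of its 12 replicates have a valid BMC.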

Cited by

A Decade in a Systematic Review: The Evolution and Impact of Cell Painting.
Seal S, Trapotsi MA, Spjuth O, Singh S, Carreras-Puigvert J, Greene N, Bender A, Carpenter AE. bioRxiv [Preprint]. 2024 May 7:2024.05.04.592531. doi: 10.1101/2024.05.04.592531. PMID: 38766203.
Reference compounds for characterizing cellular injury in high-content cellular morphology assays.
Dahlin JL, Hua BK, Zucconi BE, Nelson SD Jr, Singh S, Carpenter AE, Shrimp JH, Lima-Fernandes E, Wawer MJ, Chung LPW, Agrawal A, O'Reilly M, Barsyte-Lovejoy D, Szewczyk M, Li F, Lak P, Cuellar M, Cole PA, Meier JL, Thomas T, Baell JB, Brown PJ, Walters MA, Clemons PA, Schreiber SL, Wagner BK. Nat Commun. 2023 Mar 13;14(1):1364. doi: 10.1038/s41467-023-36829-x. PMID: 36914634.
Optimization of Human Neural Progenitor Cells for an Imaging-Based High-Throughput Phenotypic Profiling Assay for Developmental Neurotoxicity Screening.
Culbreth M, Nyffeler J, Willis C, Harrill JA. Front Toxicol. 2022 Feb 16;3:803987. doi: 10.3389/ftox.2021.803987. eCollection 2021. PMID: 35295155.
A Comparison of In Vitro Points of Departure with Human Blood Levels for Per- and Polyfluoroalkyl Substances (PFAS).
Judson RS, Smith D, DeVito M, Wambaugh JF, Wetmore BA, Paul Friedman K, Patlewicz G, Thomas RS, Sayre RR, Olker JH, Degitz S, Padilla S, Harrill JA, Shafer T, Carstens KE. Toxics. 2024 Apr 5;12(4):271. doi: 10.3390/toxics12040271. PMID: 38668494.
Modeling omics dose-response at the pathway level with DoseRider.
Monfort-Lanzas P, Gostner JM, Hackl H. Comput Struct Biotechnol J. 2025 Apr 3;27:1440-1448. doi: 10.1016/j.csbj.2025.04.004. eCollection 2025. PMID: 40242291.

References

1. Caicedo JC; Singh S; Carpenter AE. Applications in image-based profiling of perturbations. Curr Opin Biotechnol 2016, 39, 134–142.
2. Ramaiahgari SC; Auerbach SS; Saddler TO; et al. The Power of Resolution: Contextualized Understanding of Biological Responses to Liver Injury Chemicals Using High-throughput Transcriptomics and Benchmark Concentration Modeling. Toxicol Sci 2019, 169, 553–566.
3. Lamb J; Crawford ED; Peck D; et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science 2006, 313, 1929–1935.
4. De Abrew KN; Shan YK; Wang X; et al. Use of connectivity mapping to support read across: A deeper dive using data from 186 chemicals, 19 cell lines and 2 case studies. Toxicology 2019, 423, 84–94.
5. Bray MA; Singh S; Han H; et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat Protoc 2016, 11, 1757–1774.

Grants and funding

EPA999999/ImEPA/Intramural EPA/United States
