Sci Rep. 2022 Nov 30;12(1):20647.
doi: 10.1038/s41598-022-24889-w.

Cheminformatics analysis of chemicals that increase estrogen and progesterone synthesis for a breast cancer hazard assessment

Alexandre Borrel et al. Sci Rep.

Abstract

Factors that increase estrogen or progesterone (P4) action are well established as increasing breast cancer risk, and many first-line treatments to prevent breast cancer recurrence work by blocking estrogen synthesis or action. In previous work, using data from an in vitro steroidogenesis assay developed for the U.S. Environmental Protection Agency (EPA) ToxCast program, we identified 182 chemicals that increased estradiol (E2up) and 185 that increased progesterone (P4up) in the human H295R adrenocortical carcinoma cell assay, an OECD-validated assay for steroidogenesis. Chemicals known to induce mammary effects in vivo were very likely to increase E2 or P4 synthesis, further supporting the importance of these pathways for breast cancer. To identify additional chemical exposures that may increase breast cancer risk through E2 or P4 steroidogenesis, we developed a cheminformatics approach to identify structural features associated with these activities and to predict other E2 or P4 steroidogens from their chemical structures. First, we used molecular descriptors and physicochemical properties to cluster the 2,012 chemicals screened in the steroidogenesis assay using a self-organizing map (SOM). Structural features such as triazine, phenol, or more broadly benzene substituted with halide, amine, or alcohol groups are enriched among E2up or P4up chemicals. Among E2up chemicals, phenol and benzenone are significant substructures, along with nitrogen-containing biphenyls. For P4up chemicals, phenol and complex aromatic systems substituted with oxygen-based groups, such as flavone or phenolphthalein, are significant substructures. Chemicals that are active for both E2up and P4up are enriched with substructures such as dihydroxy phosphanedithione or are small chemicals that contain one benzene substituted with chlorine, alcohol, methyl, or primary amine. A ToxPrint chemotype analysis confirms these results.
Then, we used machine learning and artificial intelligence algorithms to develop and validate predictive classification QSAR models for E2up and P4up chemicals. These models performed reasonably in external validation (balanced accuracy ~0.8 and Matthews correlation coefficient ~0.5). The QSAR models were enriched by adding a confidence score that considers the chemical applicability domain and a ToxPrint assessment of the chemical. This profiling and these models may be useful to direct future testing and risk assessments for chemicals related to breast cancer and other hormonally mediated outcomes.
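The two validation metrics quoted above can be computed from a binary confusion matrix. The following is a minimal sketch, not the authors' code: the function names and the toy labels are assumptions made for illustration, and the values printed are unrelated to the study's results.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels, where 1 means active."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity; less sensitive to class imbalance."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC in [-1, 1]; returns 0.0 when any marginal total is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy labels invented for illustration (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
print(counts, balanced_accuracy(*counts), matthews_corrcoef(*counts))
```

Both metrics are commonly preferred over plain accuracy when actives are rare, as they are here (roughly 180 actives among about 2,000 screened chemicals).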

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Venn diagram representing the overlap between the active chemical set that increases production of E2 (E2up), the active chemical set that increases production of P4 (P4up), and all chemicals tested in the H295R assay (H295R set).
Figure 2
Workflow of the study.
Figure 3
Structure-based SOM of the 1,925 curated structures tested in the H295R assay, comprising 64 clusters colored by (A) the percent of E2up chemicals (182 structures), (B) the percent of P4up chemicals (186 chemicals), and (C) the percent of the union of E2up and P4up structures (71 structures). Some structural examples for enriched clusters are shown; cluster 60: 1. benzidine (92-87-5), 2. C.I. Solvent Yellow 56 (2481-94-9); cluster 18: 3. Prometryn (7287-19-6), 4. Simazine (122-34-9), 5. Cybutryne (28159-98-0); cluster 21: 6. sodium 2-phenylphenate tetrahydrate (6152-33-6), 7. 2-ethoxy-5-(1-propenyl)phenol (94-86-0), 8. Isoeugenol (97-54-1); cluster 60: 9. Dapsone (80-08-0), 10. n-phenyl-1,4-benzenediamine (101-54-2), 11. 3,3′-dimethylbenzidine (119-93-7); cluster 50: 12. Apigenin (520-36-5), 13. 4,4′-sulfonylbis[2-(prop-2-en-1-yl)phenol] (41481-66-7), 14. Phenolphthalin (81-90-3); cluster 11: 15. Sulprofos (35400-43-2), 16. Phosmet (732-11-6), 17. Malathion (121-75-5); cluster 18: 18. Anilazine (101-05-3), 19. Ametryn (834-12-8), 20. 2,4,6-tris(allyloxy)-1,3,5-triazine (101-37-1); cluster 11: 21. Parathion (56-38-2), 22. Diazinon (333-41-5), 23. Ethion (563-12-2); cluster 14: 24. 2,4,6-trichlorophenol (88-06-2), 25. para-phenylenediamine (106-50-3), 26. 4-chloro-2-methylphenol (1570-64-5), 27. Hydroquinone (123-31-9) and 28. catechol (120-80-9). Please note that the color scales are different for each panel.
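The per-cluster coloring described in this caption amounts to computing, for each SOM cluster, the percent of its members that are active. A minimal sketch of that step follows; it assumes cluster assignments are already available (the SOM training itself is not shown), and the function name and toy assignments are hypothetical rather than taken from the study.

```python
from collections import defaultdict


def percent_active_by_cluster(cluster_ids, is_active):
    """Return {cluster_id: percent of members flagged active}."""
    totals = defaultdict(int)
    actives = defaultdict(int)
    for cluster, active in zip(cluster_ids, is_active):
        totals[cluster] += 1
        if active:
            actives[cluster] += 1
    return {c: 100.0 * actives[c] / totals[c] for c in sorted(totals)}


# Toy assignments invented for illustration (not data from the study).
cluster_ids = [18, 18, 18, 18, 60, 60, 11, 11, 11, 11, 11]
is_active = [True, True, True, False, True, False,
             False, False, False, False, True]
percents = percent_active_by_cluster(cluster_ids, is_active)
print(percents)
```

Because each panel uses its own color scale, these percentages are only comparable within one panel unless the scales are aligned.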
Figure 4
Chemical structures of (A) 29. Estradiol, E2 (DTXSID: DTXSID0020573, CASRN: 50-28-2) and (B) 30. Progesterone, P4 (DTXSID: DTXSID3022370, CASRN: 57-83-0).
Figure 5
Structure-based SOM of the H295R set, comprising 64 clusters colored by (A) the average similarity score by cluster for E2 and (B) the average similarity score by cluster for P4. Structures of E2up and P4up chemicals in clusters 45 and 53 are represented. Cluster 45: 31. Triamcinolone (124-94-7), 32. Dexamethasone sodium phosphate (2392-39-4), 33. Spironolactone (52-01-7), 34. Mifepristone (84371-65-3) and 35. 17-methyltestosterone (58-18-4); cluster 53: 36. (E)-beta-damascone (23726-91-2).
Figure 6
Distribution of ToxPrints that are significantly more common and the most represented (present in > 20% E2up chemicals) among the 729 ToxPrints identified in E2up chemicals. Statistical significance was computed using a Pearson's chi-squared test and significant p-values are reported as: p-value < 0.001 (***), p-value < 0.01 (**), p-value < 0.05 (*) and p-value ≥ 0.05 (−). ToxPrints are ordered by p-values, from the most significant to the least.
Figure 7
Distribution of ToxPrints that are significantly more common and the most represented (present in > 20% P4up chemicals) among the 729 ToxPrints identified in P4up chemicals. Statistical significance was computed using a Pearson's chi-squared test and significant p-values are reported as: p-value < 0.001 (***), p-value < 0.01 (**), p-value < 0.05 (*) and p-value ≥ 0.05 (−). ToxPrints are ordered by p-values, from the most significant to the least.
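The significance test named in the captions for Figures 6 and 7 can be illustrated on a 2×2 contingency table (ToxPrint present or absent, chemical active or inactive). This is a sketch under stated assumptions: it applies no continuity correction (the captions do not say whether one was used), the function names are hypothetical, and the counts are invented. For one degree of freedom the chi-squared tail probability equals `erfc(sqrt(chi2 / 2))`, so only the standard library is needed.

```python
from math import erfc, sqrt


def chi2_2x2(active_with, active_without, inactive_with, inactive_without):
    """Pearson chi-squared statistic and p-value for a 2x2 table (1 dof).

    No Yates continuity correction is applied (an assumption of this sketch).
    """
    a, b, c, d = active_with, active_without, inactive_with, inactive_without
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero marginal total makes the test undefined; report no evidence.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = erfc(sqrt(chi2 / 2))  # chi-squared survival function, 1 dof
    return chi2, p_value


def significance_label(p_value):
    """Map a p-value to the star notation used in the figure captions."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "-"


# Toy counts invented for illustration (not data from the study).
chi2, p = chi2_2x2(15, 5, 5, 15)
print(chi2, p, significance_label(p))
```

When many ToxPrints are tested at once, a multiple-testing adjustment may be worth considering; the captions do not state whether one was applied.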
Figure 8
ToxPrint network for (A) E2up and (B) P4up chemicals. ToxPrints are represented by nodes colored by the number of the chemicals that included that ToxPrint. Combinations of ToxPrints are represented by the arc line colored based on the number of chemicals sharing these two ToxPrints. Only significant ToxPrints for E2up or P4up are represented (p-value < 0.01).
Figure 9
Top 10 features by relative descriptor importance for the balanced random forest (RF) developed for the (A) E2up and (B) P4up QSAR models.
Figure 10
Venn diagram between active E2up, P4up, H295R and MC chemical sets.
Figure 11
Projection of the MC chemical set onto the training set used to build the (A) QSAR-E2up model and (B) QSAR-P4up model. The projection uses multidimensional scaling on the similarity matrix of pairwise Tanimoto scores computed from the MACCS fingerprints of the chemicals.
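The pairwise Tanimoto similarity matrix that feeds the multidimensional scaling can be sketched as follows. Fingerprints are represented here as sets of on-bit indices; the bit sets are invented, and generating real MACCS keys would require a cheminformatics toolkit that is not shown. Returning 0.0 for two empty fingerprints is a convention chosen for this sketch.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention for two empty fingerprints (assumption)
    return len(bits_a & bits_b) / union


def similarity_matrix(fingerprints):
    """Symmetric matrix of pairwise Tanimoto scores."""
    return [[tanimoto(fa, fb) for fb in fingerprints] for fa in fingerprints]


# Hypothetical fingerprints as sets of on-bit indices (not real MACCS keys).
fingerprints = [
    {1, 5, 9, 12},
    {1, 5, 9, 20},
    {30, 31},
]
sim = similarity_matrix(fingerprints)
# Multidimensional scaling typically takes dissimilarities, e.g. 1 - similarity.
dist = [[1.0 - s for s in row] for row in sim]
print(sim)
```

The first two fingerprints share three of five distinct bits, giving a similarity of 0.6, while the third shares none with either and sits at the maximum distance of 1.0.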
Figure 12
Predicted E2up chemicals from the MC set of chemicals. The x-axis represents the number of ToxPrints significant for E2up, the y-axis represents the probability prediction to be E2up, and the applicability model is in blue (minimal similarity score with the first chemicals in the training set). Chemical structures are represented in the figure: 37. anti-benzo[a]chrysene-11,12-diol-13,14-epoxide (132832-26-9), 38. 4-biphenylamine (92-67-1), 39. p-aminobiphenyl hydrochloride (2113-61-3), 40. 4-aminostilbene (834-24-2), 41. 2-aminofluorene (153-78-6), 42. 2,4-diaminoanisole sulfate (39156-41-7), 43. 12-methylbenz(a)anthracene-7-carboxaldehyde (13345-61-4), 44. 4,4′-methylenebis(2-chloroaniline) (101-14-4) and 45. 3,2′-dimethyl-4-aminobiphenyl (13394-86-0).
Figure 13
Predicted P4up chemicals from the MC set of chemicals. The x-axis represents the number of ToxPrints significant for P4up, the y-axis represents the probability prediction to be P4up, and the applicability model is in blue (minimal similarity score with the first chemicals in the training set). Chemical structures are represented in the figure: 46. n-(9-oxo-2-fluorenyl)acetamide (3096-50-2), 47. 12-methylbenz(a)anthracene-7-carboxaldehyde (13345-61-4), 48. estradiol dipropionate (113-38-2), 49. 2,7-dinitrofluorene (5405-53-8), 50. 2-nitrofluorene (607-57-8), 51. 4-aminostilbene (834-24-2), 52. anti-benzo[a]chrysene-11,12-diol-13,14-epoxide (132832-26-9), 53. Leucomalachite green (129-73-7), 54. 4-biphenylamine (92-67-1), 55. p-aminobiphenyl hydrochloride (2113-61-3), 56. 2,4-diaminoanisole sulfate (39156-41-7), 57. 2-aminofluorene (153-78-6), 58. estradiol valerate (979-32-8), 59. 4,4′-methylenebis(2-chloroaniline) (101-14-4) and 60. 3,2′-dimethyl-4-aminobiphenyl (13394-86-0).
