Pharmaceuticals (Basel). 2021 Aug 11;14(8):790.
doi: 10.3390/ph14080790.

Predicting the Skin Sensitization Potential of Small Molecules with Machine Learning Models Trained on Biologically Meaningful Descriptors

Anke Wilm et al. Pharmaceuticals (Basel). 2021.

Abstract

In recent years, a number of machine learning models for the prediction of the skin sensitization potential of small organic molecules have been reported and become available. These models generally perform well within their applicability domains but, as a result of the use of molecular fingerprints and other non-intuitive descriptors, the interpretability of the existing models is limited. The aim of this work is to develop a strategy to replace the non-intuitive features by predicted outcomes of bioassays. We show that such replacement is indeed possible and that as few as ten interpretable, predicted bioactivities are sufficient to reach competitive performance. On a holdout data set of 257 compounds, the best model ("Skin Doctor CP:Bio") obtained an efficiency of 0.82 and an MCC of 0.52 (at the significance level of 0.20). Skin Doctor CP:Bio is available free of charge for academic research. The modeling strategies explored in this work are easily transferable and could be adopted for the development of more interpretable machine learning models for the prediction of the bioactivity and toxicity of small organic compounds.
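
The efficiency and MCC reported in the abstract can be illustrated with a minimal sketch. It assumes the common conformal prediction conventions for binary classifiers: a class enters the prediction set when its p-value exceeds the significance level, validity is the share of prediction sets that contain the true class, efficiency is the share of single-class predictions, and MCC is computed on the single-class predictions only. The p-values and labels below are invented toy data, not results from the paper, and the function names are hypothetical.

```python
from math import sqrt


def prediction_set(p_values, epsilon):
    """Return the class labels whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > epsilon}


def evaluate(p_value_list, true_labels, epsilon):
    """Compute validity, efficiency, and MCC (on single-class predictions)."""
    sets = [prediction_set(p, epsilon) for p in p_value_list]
    n = len(sets)

    # Validity: fraction of prediction sets that contain the true class.
    validity = sum(y in s for s, y in zip(sets, true_labels)) / n

    # Efficiency: fraction of prediction sets with exactly one class.
    singles = [(next(iter(s)), y) for s, y in zip(sets, true_labels) if len(s) == 1]
    efficiency = len(singles) / n

    # Confusion matrix over single-class predictions (1 = sensitizer, assumed).
    tp = sum(1 for pred, true in singles if pred == 1 and true == 1)
    tn = sum(1 for pred, true in singles if pred == 0 and true == 0)
    fp = sum(1 for pred, true in singles if pred == 1 and true == 0)
    fn = sum(1 for pred, true in singles if pred == 0 and true == 1)

    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return validity, efficiency, mcc


# Toy p-values per compound: {class label: p-value}. Purely illustrative.
toy_p_values = [
    {0: 0.05, 1: 0.80},
    {0: 0.70, 1: 0.10},
    {0: 0.45, 1: 0.35},
    {0: 0.15, 1: 0.60},
    {0: 0.90, 1: 0.02},
]
toy_labels = [1, 0, 1, 0, 0]

validity, efficiency, mcc = evaluate(toy_p_values, toy_labels, epsilon=0.20)
print(f"validity={validity:.2f} efficiency={efficiency:.2f} MCC={mcc:.2f}")
```

On this toy data, four of the five prediction sets contain a single class and four contain the true class, so both validity and efficiency are 0.80; the MCC over the four single-class predictions is about 0.58.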

Keywords: bioactivity descriptors; conformal prediction; in silico prediction; machine learning; random forest; skin sensitization; toxicity prediction.

Conflict of interest statement

A.W. is funded by Beiersdorf AG through HITeC e.V., and J.K. (Jochen Kühnl) is employed at Beiersdorf AG.

Figures

Figure 1. Schematic representation of the workflow for feature selection.
Figure 2. Mean performance of 10-fold CV as a function of the number of bioactivity descriptors selected for model building at the significance levels (A) ε=0.05, (B) ε=0.10, (C) ε=0.20, and (D) ε=0.30. The horizontal dashed line indicates the validity expected from the selected significance level ε; the vertical dashed line marks the performance of models trained on 10 descriptors.
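
For reference, the expected validity shown by the horizontal dashed line follows from the general validity guarantee of conformal prediction, which holds under the usual exchangeability assumption. The prediction-set symbol Γ is notation introduced here for illustration; the numeric values are simply 1 − ε for the four significance levels named in the caption.

```latex
\[
P\bigl(y \in \Gamma^{\varepsilon}(x)\bigr) \ge 1 - \varepsilon
\]
\[
\varepsilon \in \{0.05,\ 0.10,\ 0.20,\ 0.30\}
\;\Longrightarrow\;
1 - \varepsilon \in \{0.95,\ 0.90,\ 0.80,\ 0.70\}
\]
```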
Figure 3. PCA quantifying the coverage of the LLNA data by the reference sets of (A) cosmetics, (B) approved drugs, and (C) agrochemicals in the feature space of the 10 selected bioactivity descriptors. The percentages in parentheses report the variance explained by the respective principal component (PC).
Figure 4. LLNA data set analyzed by PCA in the feature space of the ten selected bioactivity descriptors. (A) Scatter plot colored by the binary skin sensitization potential; (B) loadings plot of the ten descriptors. The percentages in parentheses report the variance explained by the respective principal component (PC). Note that the axis ranges differ between panels (A) and (B).
Figure 5. Architecture of the consensus model.
Figure 6. Relationship between MCC and coverage for the individual and the combined models.
Figure 7. Performance of the RF classifier (n_estimators = 500; all other parameters default) underlying the CP model as a function of the number of instances the model was trained on.
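
The kind of analysis described in this caption can be sketched with scikit-learn as follows. Only n_estimators = 500 comes from the caption; the synthetic data set, the training-set sizes, the fixed random seeds, the use of MCC as the performance measure, and the helper name learning_curve_mcc are assumptions made for illustration and do not reproduce the paper's data or results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a compound data set with 10 descriptors (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def learning_curve_mcc(train_sizes):
    """Train an RF on the first n training instances and score it on the test set."""
    results = {}
    for n in train_sizes:
        # n_estimators=500 as in the caption; other parameters left at defaults,
        # except random_state, which is fixed here only for reproducibility.
        model = RandomForestClassifier(n_estimators=500, random_state=0)
        model.fit(X_train[:n], y_train[:n])
        results[n] = matthews_corrcoef(y_test, model.predict(X_test))
    return results


curve = learning_curve_mcc([50, 150, 450])
for n, mcc in curve.items():
    print(f"{n:4d} training instances: MCC = {mcc:.2f}")
```

Plotting the resulting values against the training-set size gives a learning curve of the type the caption describes.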
