Chem Res Toxicol. 2023 Sep 18;36(9):1503-1517.
doi: 10.1021/acs.chemrestox.3c00137. Epub 2023 Aug 16.

Federated Learning in Computational Toxicology: An Industrial Perspective on the Effiris Hackathon


Davide Bassani et al. Chem Res Toxicol. 2023.

Abstract

In silico approaches have acquired a towering role in pharmaceutical research and development, allowing laboratories around the world to design, create, and optimize novel molecular entities with unprecedented efficiency. From a toxicological perspective, computational methods have guided medicinal chemists toward compounds with improved safety profiles. Even though recent advances in the field are significant, many challenges remain open in on-target and off-target prediction. Machine learning methods have shown their ability to identify molecules with safety concerns; however, they strongly depend on the abundance and diversity of the data used for their training. Sharing such information among pharmaceutical companies remains extremely limited for confidentiality reasons, but a recent concept named "federated learning" can help overcome these concerns. Within this framework, companies can contribute to the training of common machine learning algorithms, using, but not sharing, their proprietary data. Very recently, Lhasa Limited organized a hackathon involving several industrial partners to assess the performance of their federated learning platform, called "Effiris". In this paper, we share our experience as Roche in participating in this event, evaluating the performance of the federated algorithms and comparing them with those of our in-house-only machine learning models. Our aim is to highlight both the advantages and the intrinsic limitations of federated learning, and to suggest some points for potential improvement of the method.
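The internals of the Effiris platform are proprietary, but the core idea described in the abstract, training a shared model from member contributions without ever exchanging the raw proprietary data, can be illustrated with a minimal FedAvg-style sketch. Everything below (the logistic-regression model, the synthetic "company" data sets, the round counts) is a hypothetical stand-in, not Effiris code:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One member's local training step: logistic-regression gradient
    descent on private data that never leaves the member's site."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)       # gradient step
    return w

def federated_round(global_w, clients):
    """Server-side aggregation (FedAvg-style weighted mean of weights);
    only model parameters are exchanged, never the training data."""
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])
clients = []
for _ in range(3):                             # three hypothetical "companies"
    X = rng.normal(size=(200, 3))
    y = (X @ true_w + rng.normal(scale=0.1, size=200) > 0).astype(float)
    clients.append((X, y))

w = np.zeros(3)
for _ in range(20):                            # 20 communication rounds
    w = federated_round(w, clients)
```

The shared model improves from every member's data even though no member ever sees another's compounds, which is the confidentiality argument made above.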


Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Representation of the main Effiris federated learning scheme for each of the company members.
Figure 2
Representation of the Effiris SOHN federated learning scheme. The internal member data are used to train the “teacher” model (A), while both internal and Lhasa public data are combined to train the “hybrid” model (B). The final “hybrid federated” model is trained on the combination of internal, public, and federated data (C).
Figure 3
Representation of the Effiris MLP federated learning scheme. (A) An initial model is trained on public data curated by Lhasa. This pretrained model is then refined with internal data in order to get the “fine-tuned” MLP. (B) The federated labels obtained from predictions of consolidated “fine-tuned models” on the FLuID data from the members are then used to train another model, called “pretrained federated”, which is further refined with member data, generating the “fine-tuned federated” model.
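As a loose illustration of the MLP scheme in this caption (not Lhasa's actual implementation), the sketch below stands in a tiny logistic-regression trainer for the MLP and uses synthetic data. The "federated labels" step is mimicked by letting a fine-tuned model label a shared, descriptor-only set; in the real scheme, the labels come from the consolidated fine-tuned models of several members:

```python
import numpy as np

def train_logreg(X, y, w=None, lr=0.1, epochs=200):
    """Tiny logistic-regression trainer standing in for the MLP;
    passing initial weights `w` mimics pretraining + fine-tuning."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])

def make_set(n):
    X = rng.normal(size=(n, 2))
    return X, (X @ true_w > 0).astype(float)

X_pub, y_pub = make_set(300)          # public data curated by the host
X_int, y_int = make_set(300)          # one member's internal data
X_fluid = rng.normal(size=(500, 2))   # shared descriptor-only (unlabeled) set

# (A) Pretrain on public data, then refine with internal data -> "fine-tuned".
w_pre = train_logreg(X_pub, y_pub)
w_fine = train_logreg(X_int, y_int, w=w_pre.copy())

# (B) The fine-tuned model labels the shared set ("federated labels"); those
# labels train the "pretrained federated" model, which is then refined with
# internal data to give the "fine-tuned federated" model.
y_fed = (X_fluid @ w_fine > 0).astype(float)
w_fed = train_logreg(X_fluid, y_fed)
w_fine_fed = train_logreg(X_int, y_int, w=w_fed.copy())
```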
Figure 4
t-SNE plots representing the distribution of the chemical space for the AChM1R data sets. Specifically, panel A depicts the reduction of the pool of features, composed of the 208 RDKit physicochemical descriptors and the 2048-bit fingerprints, to just two dimensions, while in panel B, the three-dimensional reduction is reported.
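A sketch of the descriptor-plus-fingerprint t-SNE reduction described in this caption, using scikit-learn. The features here are random stand-ins with the same dimensions as in the study (208 descriptors plus a 2048-bit fingerprint); the actual analysis computed them with RDKit from the data-set molecules:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(42)
n_mols = 100  # hypothetical data-set size

# Stand-ins for the real features: 208 physicochemical descriptors
# plus a 2048-bit binary fingerprint per molecule.
descriptors = rng.normal(size=(n_mols, 208))
fingerprints = rng.integers(0, 2, size=(n_mols, 2048))
features = np.hstack([descriptors, fingerprints])  # 2256 features/molecule

# Panel A: reduction to two dimensions; panel B: three dimensions.
emb2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
emb3d = TSNE(n_components=3, perplexity=30, random_state=0).fit_transform(features)
```

Each molecule is mapped to a 2D (or 3D) point whose neighborhood reflects similarity in the original 2256-dimensional feature space, which is what makes such plots useful for comparing the chemical space covered by different data sets.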
Figure 5
Mathematical formula for the calculation of the Matthews Correlation Coefficient (MCC). The acronyms are as follows: TP = True Positive; TN = True Negative; FP = False Positive; FN = False Negative.
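Since the formula itself is rendered only as a figure here, it is worth noting the standard MCC definition, MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which a minimal implementation makes concrete:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.
    Returns 0.0 by convention when any marginal sum is zero."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0
```

The MCC ranges from −1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and, unlike plain accuracy, it stays informative on imbalanced data sets.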
Figure 6
Trends in the Applicability Domain (AD) compared with the main performance metrics considered in the present study for the Lhasa SOHN architecture. The AD versus the sensitivity, specificity, balanced accuracy, and MCC of the predictions on the external test set is reported in (A–D), respectively. As the legend shows, the purple line indicates the predictions on the COX2 data set, the green one stands for GABAA, the cyan one for the hERG channel, the dashed blue line for AChM1R, and the dashed black line for 5-HT2B. The different models considered are also indicated with letters, as follows: T indicates the metrics and AD for the “teacher” SOHN model (trained on the internal Roche data only); HI stands for “hybrid internal” and represents the SOHN model trained on both Lhasa public data and Roche internal data; finally, “HF”, or “hybrid federated”, indicates the SOHN model trained on both federated data and Roche internal data.
Figure 7
Trends in the Applicability Domain (AD) compared with the main performance metrics considered here for the Lhasa MLP architecture. Specifically, the AD versus the sensitivity, specificity, balanced accuracy, and MCC of the predictions on the external test set is reported in (A–D), respectively. As the legend shows, the purple line indicates the predictions on the COX2 data set, the green one stands for GABAA, the cyan one for the hERG channel, the dashed blue line for AChM1R, and the dashed black line for 5-HT2B. The different models considered are also indicated with letters, as follows: PI indicates the metrics and AD for the pretrained internal MLP model (trained on the Lhasa public data only); RI stands for “refined internal” and represents the MLP model trained on Lhasa Limited public data and then refined with Roche internal data; “F”, or “federated”, indicates the MLP model trained on the federated data only; and “RF” (“refined federated”) represents the MLP trained on the federated data and then refined with Roche internal data.
Figure 8
Plots representing the metrics of the predictions on the external test set, using the different training setups discussed above (for both internal and Lhasa models). (A–D) represent the sensitivity, specificity, balanced accuracy, and MCC for each of the cases considered, respectively. In the first three plots, the 0.5 value is also marked with a dashed red line, indicating the point below which the predictions start to lose relevance (half of the positives/negatives are misclassified by the algorithm).
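The metrics plotted in Figures 6–8 all follow from the confusion-matrix counts; a minimal sketch shows how they relate and why 0.5 is the relevance threshold for the first three (the counts below are arbitrary example values):

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, and balanced accuracy from
    confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true-negative rate
    balanced_accuracy = (sensitivity + specificity) / 2.0
    return sensitivity, specificity, balanced_accuracy

# A sensitivity or specificity below 0.5 (the dashed red line) means more
# than half of the corresponding class is misclassified.
sens, spec, bacc = classification_metrics(tp=40, tn=30, fp=20, fn=10)
```

Balanced accuracy, being the mean of sensitivity and specificity, is less misleading than plain accuracy when the positive and negative classes are of very different sizes, which is common in toxicity data sets.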
