Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma

Michael Riedl¹, Sayak Mukherjee¹, Mitch Gauthier¹

PMID: 37656906
DOI: 10.1021/acs.molpharmaceut.3c00129

Michael Riedl et al. Mol Pharm. 2023 Oct 2;20(10):4984-4993. doi: 10.1021/acs.molpharmaceut.3c00129. Epub 2023 Sep 1.

Affiliation

¹ Battelle, Columbus, Ohio 43201, United States.

Abstract

Chemical-specific parameters are either measured in vitro or estimated using quantitative structure-activity relationship (QSAR) models. The existing body of QSAR work relies on extracting a set of descriptors or fingerprints, selecting a subset of them, and training a machine learning model. In this work, we used a state-of-the-art natural language processing model, Bidirectional Encoder Representations from Transformers (BERT), which allowed us to circumvent the need to calculate these chemical descriptors. In this approach, simplified molecular-input line-entry system (SMILES) strings were embedded in a high-dimensional space using a two-stage training approach. The model was first pre-trained on a masked SMILES token task and then fine-tuned on a QSAR prediction task. The pre-training task learned meaningful high-dimensional embeddings based on the relationships between the chemical tokens in SMILES strings derived from the "in-stock" portion of the ZINC 15 dataset, a large dataset of commercially available chemicals. The fine-tuning task then perturbed the pre-trained embeddings to facilitate prediction of a specific QSAR endpoint of interest. The power of this model stems from the ability to reuse the pre-trained model for multiple fine-tuning tasks, reducing the computational burden of developing separate models for different endpoints. We used our framework to develop a predictive model for the fraction unbound in human plasma (f_u,p). This approach is flexible, requires minimal domain expertise, and can be generalized to other parameters of interest for rapid and accurate estimation of absorption, distribution, metabolism, excretion, and toxicity.

Keywords: BERT; QSAR; deep learning; fraction unbound; human plasma.
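The masked SMILES token pre-training stage described in the abstract can be illustrated with a short sketch of how a training example might be prepared. This is not the authors' implementation: the regex tokenizer (a pattern of the kind commonly used for SMILES language models), the 15% selection rate, and the 80/10/10 replacement rule (borrowed from the original BERT recipe) are all assumptions, and `tokenize`, `mask_tokens`, and the `[MASK]` string are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import random
import re

# Assumed SMILES tokenizer (not taken from the paper): bracket atoms such as [NH3+],
# two-letter halogens (Br, Cl) and two-digit ring closures (%10) each become one token.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)
MASK = "[MASK]"  # hypothetical mask-token string


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; fail loudly if any character is unmatched."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES: {smiles!r}")
    return tokens


def mask_tokens(
    tokens: list[str],
    vocab: list[str],
    mask_prob: float = 0.15,
    rng: random.Random | None = None,
) -> tuple[list[str], list[str | None]]:
    """BERT-style masking (assumed 80/10/10 rule); labels are None where no loss is computed."""
    rng = rng or random.Random()
    inputs = list(tokens)
    labels: list[str | None] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random vocabulary token
        # remaining 10%: leave the original token in place
    return inputs, labels


if __name__ == "__main__":
    print(tokenize("ClC[NH3+]"))  # ['Cl', 'C', '[NH3+]']
    toks = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
    vocab = sorted(set(toks))  # toy vocabulary; a real one would come from the pre-training corpus
    inputs, labels = mask_tokens(toks, vocab, rng=random.Random(0))
    print(inputs)
    print(labels)
```

Under this sketch, the encoder would receive `inputs` and be trained to predict `labels` only at the selected positions. Fine-tuning would then swap this objective for the QSAR endpoint, presumably a regression on f_u,p, while reusing the pre-trained weights, which is what lets one pre-trained model serve several endpoints.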
