J Cheminform. 2023 Sep 5;15(1):76.
doi: 10.1186/s13321-023-00754-4.

LogD7.4 prediction enhanced by transferring knowledge from chromatographic retention time, microscopic pKa and logP

Yitian Wang et al. J Cheminform. 2023.

Abstract

Lipophilicity is a fundamental physical property that significantly affects various aspects of drug behavior, including solubility, permeability, metabolism, distribution, protein binding, and toxicity. Accurate prediction of lipophilicity, measured by the logD7.4 value (the distribution coefficient between n-octanol and buffer at physiological pH 7.4), is crucial for successful drug discovery and design. However, the limited availability of data for logD modeling poses a significant challenge to achieving satisfactory generalization capability. To address this challenge, we have developed a novel logD7.4 prediction model called RTlogD, which leverages knowledge from multiple sources. RTlogD is pre-trained on a chromatographic retention time (RT) dataset, since RT is influenced by lipophilicity. Additionally, microscopic pKa values are incorporated as atomic features, providing valuable insights into ionizable sites and ionization capacity. Furthermore, logP is integrated as an auxiliary task within a multitask learning framework. We conducted ablation studies and present a detailed analysis showing the effectiveness and interpretability of RT, pKa, and logP in the RTlogD model. Notably, the RTlogD model demonstrated superior performance compared to commonly used algorithms and prediction tools. These results underscore the potential of the RTlogD model to improve the accuracy and generalization of logD prediction in drug discovery and design. In summary, the RTlogD model addresses the challenge of limited data availability in logD modeling by leveraging knowledge from RT, microscopic pKa, and logP. Incorporating these factors enhances the predictive capabilities of the model, and it holds promise for real-world applications in drug discovery and design scenarios.
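
The connection between logD7.4, logP, and pKa that motivates these inputs can be illustrated with the textbook relation for a monoprotic compound, logD = logP − log10(1 + 10^(pH − pKa)) for an acid (with the exponent sign reversed for a base). The following is a minimal sketch of that approximation only, not the RTlogD model; it assumes that only the neutral species partitions into n-octanol, and the function name and example values are hypothetical.

```python
import math


def logd_monoprotic(logp: float, pka: float, ph: float = 7.4, acid: bool = True) -> float:
    """Textbook estimate of logD at a given pH for a monoprotic compound.

    Assumption: only the neutral form partitions into n-octanol, so the
    ionized fraction lowers logD relative to logP.
    """
    # The exponent is (pH - pKa) for an acid and (pKa - pH) for a base.
    exponent = (ph - pka) if acid else (pka - ph)
    return logp - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical acid: logP = 3.0, pKa = 4.4, evaluated at pH 7.4.
acid_logd = logd_monoprotic(logp=3.0, pka=4.4)

# Hypothetical base whose pKa equals the pH: half of the compound is ionized.
base_logd = logd_monoprotic(logp=2.0, pka=7.4, acid=False)
```

Under this idealization the estimate never exceeds logP, and the gap grows as the compound becomes more ionized at pH 7.4, which is one reason ionization information is useful for logD modeling.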

Keywords: Graph neural network; Lipid solubility; Molecular property prediction; logD7.4.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
The architecture of the RTlogD model. a The graph neural network used in RTlogD. b Transfer learning of RT and the multitask learning of logP and logD module
Fig. 2
Comparison of the distributions of maximum Tanimoto similarity within DB29-Data (red), and between T-Data and DB29-Data (blue) using ECFP4
Fig. 3
Effect of training data size on the prediction performance of T-data. a Model performance variation with and without RT pre-training. b t-SNE distribution of T-data and RT by ECFP4. c t-SNE distribution of T-data and 1000 training data sampled from DB29-data by ECFP4. d t-SNE distribution of T-data and 4000 training data. e t-SNE distribution of T-data and 8000 training data
Fig. 4
Scatter plot of experimental logP and logD values in the dataset. Spearman’s correlation coefficient values can range from −1 to 1, where values of 1, 0, and −1 indicate perfect positive correlation, no correlation, and perfect negative correlation, respectively
Fig. 5
Visualization of attention weight distribution. After normalization, blue indicates an attention weight less than 0.5 and red indicates a weight greater than 0.5. The prediction errors of different methods are denoted by ∆logD7.4 and shown as the length of the error bars
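
The maximum Tanimoto similarity used in Fig. 2 can be sketched in a few lines. This is an illustrative example, not the authors' code: it assumes each fingerprint is represented as a set of on-bit indices (generating real ECFP4 fingerprints would require a cheminformatics toolkit), and the function names and toy fingerprints are hypothetical.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(queries: List[Set[int]], references: List[Set[int]]) -> List[float]:
    """For each query fingerprint, the highest similarity to any reference fingerprint."""
    return [max(tanimoto(q, r) for r in references) for q in queries]


# Toy fingerprints standing in for a training set and an external test set.
training = [{1, 2, 3, 4}, {2, 3, 5}]
external = [{1, 2, 3}, {7, 8}]
nearest = max_similarity(external, training)
```

A distribution of such per-molecule maxima indicates how close an external set lies to the training data. When the comparison is made within a single dataset, each molecule's comparison with itself would need to be excluded, which this sketch does not do.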
