J Cheminform. 2024 Dec 26;16(1):145.
doi: 10.1186/s13321-024-00931-z.

Comprehensive benchmarking of computational tools for predicting toxicokinetic and physicochemical properties of chemicals


Domenico Gadaleta et al. J Cheminform.

Abstract

Ensuring the safety of chemicals for environmental and human health involves assessing physicochemical (PC) and toxicokinetic (TK) properties, which are crucial for absorption, distribution, metabolism, excretion, and toxicity (ADMET). Computational methods play a vital role in predicting these properties, given the current trends in reducing experimental approaches, especially those that involve animal experimentation. In the present manuscript, twelve software tools implementing Quantitative Structure-Activity Relationship (QSAR) models were selected for the prediction of 17 relevant PC and TK properties. A total of 41 validation datasets were collected from the literature, curated and used for assessing the models' external predictivity, emphasizing the performance of the models inside the applicability domain. Overall, the results confirmed the adequate predictive performance of the majority of the selected tools, with models for PC properties (R² average = 0.717) generally outperforming those for TK properties (R² average = 0.639 for regression, average balanced accuracy = 0.780 for classification). Notably, several of the tools evaluated exhibited good predictivity across different properties and were identified as recurring optimal choices. Moreover, a systematic analysis of the chemical space covered by the external validation datasets confirmed the validity of the collected results for relevant chemical categories (e.g., drugs and industrial chemicals), further increasing the confidence in the overall evaluation. The best performing models were ultimately suggested for each investigated property and proposed as robust computational tools for high-throughput assessment of highly relevant chemical properties.

SCIENTIFIC CONTRIBUTION: The present manuscript provides an overview of the state-of-the-art available computational tools for predicting the PC and TK properties of chemicals. The results here offer valuable guidance to researchers, regulatory authorities, and the industry in identifying robust computational tools suitable for predicting relevant chemical properties in the context of chemical design, toxicity and environmental fate assessment.

Keywords: Computational; Physicochemical; QSAR; Toxicokinetic.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Competing interests: MGDL is an employee of Bayer, while RG, ESC, and ROV are employees of ProtoQSAR. Bayer and ProtoQSAR are owners of software evaluated in this work. The affiliations of the authors with Bayer and ProtoQSAR are disclosed for transparency and potential conflict of interest considerations. However, the competing interests declared do not affect the impartiality, integrity, or validity of the research findings presented in this manuscript.

Figures

Fig. 1
Workflow for data collection and data curation of datasets. Experimental data were collected from public repositories, public databases and scientific literature (step 1) and subjected to different steps of curation. The curation workflow standardizes the chemical structures and ensures the elimination of incorrect or duplicated compounds (step 2) and of compounds with high experimental variability (step 3) for each individual dataset, as well as the detection of inconsistent values for the same property annotated in different datasets (step 4)
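
Steps 2 and 3 of the workflow in Fig. 1 can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes structures have already been standardized into a comparable identifier (for example a canonical SMILES string), it merges duplicates by averaging, and it drops compounds whose replicate values spread more than a cutoff. The function name `curate`, the use of the population standard deviation and the cutoff `max_sd=0.5` are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def curate(records, max_sd=0.5):
    """Merge duplicate structures and drop highly variable ones.

    records: iterable of (structure_id, value) pairs, where structure_id
             is assumed to be an already standardized identifier.
    max_sd:  hypothetical cutoff on the spread of replicate values.
    """
    groups = defaultdict(list)
    for structure_id, value in records:
        groups[structure_id].append(value)

    curated = {}
    for structure_id, values in groups.items():
        # Step 3 (assumed criterion): discard compounds whose replicate
        # measurements disagree by more than the cutoff.
        if len(values) > 1 and pstdev(values) > max_sd:
            continue
        # Step 2: collapse duplicates into a single averaged entry.
        curated[structure_id] = mean(values)
    return curated
```

Calling `curate([("CCO", 1.0), ("CCO", 1.2), ("CCN", 0.0), ("CCN", 3.0)])` would keep a single averaged entry for `"CCO"` and discard `"CCN"`, whose two values differ too much under the assumed cutoff.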
Fig. 2
Predictive performance of the models (regression). Average predictive performance across different validation datasets (R²) is reported with respect to (1) the entire dataset (All), (2) the predictions inside the model’s applicability domain (in AD), (3) the chemicals outside the model’s training set (out TS) and (4) the chemicals falling simultaneously inside the AD and outside of the training set (in AD/out TS)
Fig. 3
Predictive performance of the models (classification). Average predictive performance across different validation datasets (balanced accuracy, BA) is reported with respect to (1) the entire dataset (All), (2) the predictions inside the model’s applicability domain (in AD), (3) the chemicals outside the model’s training set (out TS) and (4) the chemicals falling simultaneously inside the AD and outside of the training set (in AD/out TS)
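
The four evaluation subsets used in Figs. 2 and 3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: R² is taken to be the usual coefficient of determination and balanced accuracy the mean of sensitivity and specificity for a binary endpoint coded as 1/0. The row keys `observed`, `predicted`, `in_ad` and `in_ts` are hypothetical names.

```python
def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def balanced_accuracy(observed, predicted):
    """Mean of sensitivity and specificity for binary labels (1/0)."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Subset definitions following the figure captions.
SUBSETS = {
    "All": lambda row: True,
    "in AD": lambda row: row["in_ad"],
    "out TS": lambda row: not row["in_ts"],
    "in AD/out TS": lambda row: row["in_ad"] and not row["in_ts"],
}


def evaluate(rows, metric):
    """Apply a metric to each of the four subsets of a validation set."""
    results = {}
    for name, keep in SUBSETS.items():
        subset = [row for row in rows if keep(row)]
        results[name] = metric(
            [row["observed"] for row in subset],
            [row["predicted"] for row in subset],
        )
    return results
```

Passing `r2` as the metric corresponds to the regression case and `balanced_accuracy` to the classification case. The sketch does not guard against empty subsets or subsets containing a single class, which a real evaluation would need to handle.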
Fig. 4
Average model performance grouped by endpoints and tools. R²avg and BAavg are the average R² and BA values resulting from the single validations (i.e., prediction of a single dataset with a single model) made for a given property or with a single tool. The number of validations contributing to R²avg and BAavg and the standard deviation associated with the average values are also reported. The performance of the models limited to predictions inside the model applicability domain (in AD) and outside the model training sets (out TS) is also reported when available
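
The grouping described in Fig. 4 amounts to averaging single-validation scores per property or per tool and reporting the count and standard deviation alongside. A minimal Python sketch follows; the record keys `tool`, `endpoint` and `score` are hypothetical, and the use of the sample standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(validations, key):
    """Average single-validation scores grouped by `key`.

    validations: list of dicts, one per (dataset, model) validation,
                 each with a "score" entry (R2 or BA) and the grouping key.
    key:         "tool" or "endpoint" in this sketch.
    """
    groups = defaultdict(list)
    for validation in validations:
        groups[validation[key]].append(validation["score"])

    return {
        name: {
            "avg": mean(scores),
            # Sample standard deviation (assumed); undefined for one value.
            "sd": stdev(scores) if len(scores) > 1 else 0.0,
            "n": len(scores),
        }
        for name, scores in groups.items()
    }
```

Calling `summarize(validations, "endpoint")` groups by property, while `summarize(validations, "tool")` groups by software tool, mirroring the two views in the figure.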
Fig. 5
Reference chemical space with projections of the validation datasets. The reference chemical space is represented with a kernel density estimate plot, with the three probability density curves indicating the chemical space covered by drugs (blue), natural chemicals (orange) and industrial chemicals (green). The validation datasets are represented with scatter plots characterized by different colors
