Brief Bioinform. 2023 Nov 22;25(1):bbad511.
doi: 10.1093/bib/bbad511.

CrossFuse-XGBoost: accurate prediction of the maximum recommended daily dose through multi-feature fusion, cross-validation screening and extreme gradient boosting

Qiang Li et al. Brief Bioinform. 2023.

Abstract

In the drug development process, approximately 30% of failures are attributed to drug safety issues. In particular, the first-in-human (FIH) trial of a new drug represents one of the highest safety risks, and initial dose selection is crucial for ensuring safety in clinical trials. With traditional dose estimation methods, which extrapolate data from animals to humans, catastrophic events have occurred during Phase I clinical trials due to interspecies differences in compound sensitivity and unknown molecular mechanisms. To address this issue, this study proposes a CrossFuse-extreme gradient boosting (XGBoost) method that can directly predict the maximum recommended daily dose of a compound based on existing human research data, providing a reference for FIH dose selection. This method not only integrates multiple features, including molecular representations, physicochemical properties and compound-protein interactions, but also improves feature selection based on cross-validation. The results demonstrate that the CrossFuse-XGBoost method not only improves prediction accuracy compared to that of existing local weighted methods [k-nearest neighbor (k-NN) and variable k-NN (v-NN)] but also solves the low prediction coverage issue of v-NN, achieving full coverage of the external validation set and enabling more reliable predictions. Furthermore, this study offers a high level of interpretability by identifying the importance of different features in model construction. The 241 features with the most significant impact on the maximum recommended daily dose were selected, providing references for optimizing the structure of new compounds and guiding experimental research. The datasets and source code are freely available at https://github.com/cqmu-lq/CrossFuse-XGBoost.

Keywords: CrossFuse-XGBoost; cross-validation screening; maximum recommended daily dose; multi-feature fusion.
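The abstract describes fusing several feature families and then screening the fused features with cross-validation. The following is a minimal sketch of that idea, not the authors' implementation (which is in the repository linked above). It assumes a simple stand-in scoring rule, the absolute Pearson correlation on each training fold, in place of the XGBoost-based criterion the paper uses; the function names, the threshold and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence


def fuse_features(*blocks: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate per-compound feature blocks column-wise (one row per compound)."""
    n = len(blocks[0])
    if any(len(b) != n for b in blocks):
        raise ValueError("all feature blocks must describe the same compounds")
    return [[v for b in blocks for v in b[i]] for i in range(n)]


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous held-out folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def abs_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation; 0.0 when either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def cv_screen(X, y, k=5, threshold=0.5, min_fraction=0.8) -> List[int]:
    """Keep feature indices whose training-fold score reaches `threshold`
    in at least `min_fraction` of the k folds (assumed stand-in criterion)."""
    n, n_features = len(X), len(X[0])
    votes = [0] * n_features
    for held_out in k_fold_indices(n, k):
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        y_train = [y[i] for i in train]
        for j in range(n_features):
            col = [X[i][j] for i in train]
            if abs_pearson(col, y_train) >= threshold:
                votes[j] += 1
    return [j for j in range(n_features) if votes[j] >= min_fraction * k]


# Toy data (hypothetical): 10 compounds, two feature blocks.
y = [0.1 * i for i in range(10)]                      # transformed dose values
fingerprints = [[i % 2] for i in range(10)]           # uninformative bit
physchem = [[2.0 * i + 1.0, 7.0] for i in range(10)]  # informative + constant
X = fuse_features(fingerprints, physchem)
selected = cv_screen(X, y, k=5, threshold=0.5, min_fraction=0.8)
print(selected)
```

Requiring a feature to pass in most folds, rather than once on the full data, is what makes the screening less sensitive to any single subset of compounds. In a real pipeline the per-fold score would come from the fitted model rather than from a univariate correlation.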

Figures

Figure 1
Analysis flow of CrossFuse-XGBoost.
Figure 2
Data transformation. (A) and (B) are the histograms of the original data. (C) Boxplot of pretransformed data in units of mmol/kg-bw/day. (D), (E) and (F) are the corresponding transformed data.
Figure 3
XGBoost model tuning results. (A) The process of 40-fold cross-validation. (B) Each dot represents a validation result. (C) The x-axis is the real value and the y-axis is the predicted value; the regression line shows a good correlation between them. (D) The x-axis is the structural similarity, the left y-axis is the correlation between the predicted and real values, and the right y-axis is the amount of data. The more similar the structure, the more accurate the prediction.
Figure 4
(A) AUC for the classification model. (B) Results for the external validation set.
Figure 5
Feature analysis. (A) The top 30 selected features with the highest sum of feature importance: the upper histogram is the sum of feature importance, and the lower histogram is the absolute value of the correlation between the feature and the real value. (B) Correlation among the top 30 features.
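The Figure 2 caption refers to doses expressed in mmol/kg-bw/day before and after a transformation. As an illustration only, the sketch below assumes the raw dose is given in mg/kg-bw/day and that the transformation is a base-10 logarithm; neither assumption is stated in the captions, and the function names and example values are hypothetical.

```python
import math


def mg_to_mmol_per_kg_day(dose_mg_per_kg_day: float, mol_weight_g_per_mol: float) -> float:
    """Convert mg/kg-bw/day to mmol/kg-bw/day (mg divided by g/mol gives mmol)."""
    if mol_weight_g_per_mol <= 0:
        raise ValueError("molecular weight must be positive")
    return dose_mg_per_kg_day / mol_weight_g_per_mol


def log10_dose(dose_mmol_per_kg_day: float) -> float:
    """Assumed transformation: base-10 logarithm of a strictly positive dose."""
    if dose_mmol_per_kg_day <= 0:
        raise ValueError("dose must be positive to take a logarithm")
    return math.log10(dose_mmol_per_kg_day)


# Hypothetical compound: 20 mg/kg-bw/day with a molecular weight of 200 g/mol.
dose_mmol = mg_to_mmol_per_kg_day(20.0, 200.0)
transformed = log10_dose(dose_mmol)
print(dose_mmol, transformed)
```

Working in molar units puts compounds of different molecular weight on a comparable scale, and a logarithm compresses a dose range that spans several orders of magnitude.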
