Front Microbiol. 2023 Jan 5:13:1092467.
doi: 10.3389/fmicb.2022.1092467. eCollection 2022.

LGBMDF: A cascade forest framework with LightGBM for predicting drug-target interactions


Yu Peng et al. Front Microbiol.

Abstract

Prediction of drug-target interactions (DTIs) plays an important role in drug development. However, traditional laboratory methods for determining DTIs are time-consuming and costly. In recent years, many studies have shown that using machine learning methods to predict DTIs can speed up the drug development process and reduce capital costs. An excellent DTI prediction method should have both high prediction accuracy and low computational cost. In this study, we noticed that previous research based on deep forests used XGBoost as the estimator in the cascade. We instead applied LightGBM as the estimator in the cascade forest, and the estimator group was determined experimentally as three LightGBMs and three ExtraTrees. We call this new model LGBMDF. We conducted 5-fold cross-validation on LGBMDF and other state-of-the-art methods using the same dataset, and compared their Sn, Sp, MCC, AUC and AUPR. We found that our method has better performance and faster calculation speed.

Keywords: LightGBM; deep forest; drug-target interactions; machine learning; prediction.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
The pipeline of LGBMDF. After the features of drugs and targets are obtained, they are processed by a cascade forest in which each level uses 3 LightGBMs and 3 ExtraTrees as estimators. Each estimator outputs a 2-dimensional class vector, and the output class vectors are concatenated with the original feature vector to form the input vector for the next level.
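
The concatenation step in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Estimator` interface, `make_stub`, and the feature values are hypothetical stand-ins for trained LightGBM and ExtraTrees models and real drug-target features.

```python
from typing import Callable, List, Sequence

ClassVector = List[float]
# Hypothetical interface: maps an input vector to a 2-dimensional class vector.
Estimator = Callable[[Sequence[float]], ClassVector]


def next_level_input(
    original: Sequence[float],
    level_input: Sequence[float],
    estimators: Sequence[Estimator],
) -> List[float]:
    """Concatenate the original features with every estimator's class vector."""
    augmented = list(original)
    for predict in estimators:
        class_vector = predict(level_input)
        if len(class_vector) != 2:
            raise ValueError("each estimator must output a 2-dimensional class vector")
        augmented.extend(class_vector)
    return augmented


def make_stub(p_interaction: float) -> Estimator:
    """Hypothetical stand-in for a trained model: returns fixed class probabilities."""
    return lambda _features: [1.0 - p_interaction, p_interaction]


# Six stubs stand in for the 3 LightGBMs and 3 ExtraTrees of one level.
level = [make_stub(p) for p in (0.9, 0.8, 0.7, 0.6, 0.5, 0.4)]
pair_features = [0.1, 0.2, 0.3, 0.4]  # made-up drug-target feature vector

# The first level sees only the original features.
level_1_input = next_level_input(pair_features, pair_features, level)
print(len(level_1_input))  # 4 original features + 6 estimators * 2 probabilities
```

With 6 estimators per level, each level appends 12 values to the original feature vector, so the input width stays fixed from the second level onward.
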
Figure 2
Construction of the histogram.
Figure 3
Histogram subtraction. The histogram of a node is obtained by subtracting the histogram of its sibling node from the histogram of the parent node, so that the speed can be doubled.
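
A minimal sketch of histogram construction and subtraction, under simplifying assumptions: features are already discretized into bin indices, and each bin stores only a gradient sum and a sample count (LightGBM also accumulates second-order statistics). The function names and toy data are illustrative, not taken from the paper.

```python
from typing import List, Sequence, Tuple

Histogram = List[Tuple[float, int]]  # per bin: (sum of gradients, sample count)


def build_histogram(
    bin_ids: Sequence[int], gradients: Sequence[float], n_bins: int
) -> Histogram:
    """Accumulate gradient sums and counts per bin in one pass over the rows."""
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for b, g in zip(bin_ids, gradients):
        grad_sum[b] += g
        count[b] += 1
    return list(zip(grad_sum, count))


def subtract_histogram(parent: Histogram, sibling: Histogram) -> Histogram:
    """Derive one child's histogram from the parent's and the sibling's."""
    return [(pg - sg, pc - sc) for (pg, pc), (sg, sc) in zip(parent, sibling)]


# Toy data: six rows of one feature, already mapped to 3 bins.
bin_ids = [0, 1, 1, 2, 0, 2]
gradients = [0.5, -1.0, 0.25, 2.0, -0.5, 1.0]
left_rows, right_rows = [0, 1, 4], [2, 3, 5]  # hypothetical split of the parent

parent = build_histogram(bin_ids, gradients, 3)
left = build_histogram(
    [bin_ids[i] for i in left_rows], [gradients[i] for i in left_rows], 3
)
# The right child needs no pass over its rows: parent minus left is enough.
right_by_subtraction = subtract_histogram(parent, left)
right_direct = build_histogram(
    [bin_ids[i] for i in right_rows], [gradients[i] for i in right_rows], 3
)
print(right_by_subtraction == right_direct)
```

Only one child has to be scanned row by row; the other costs a pass over the bins, which is where the speed-up in the caption comes from.
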
Figure 4
Comparison of tree growth patterns between XGBoost and LightGBM. (A) XGBoost uses the level-wise growth strategy, which can split the leaves of the same level at the same time by traversing the data once. (B) LightGBM uses the leaf-wise growth strategy, which finds the leaf with the largest splitting gain from all the current leaves, and then splits it.
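
The two growth strategies can be contrasted on a toy tree. This is a sketch only: the `SPLITS` table and its gains are hypothetical, whereas real libraries compute split gains from the training data.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical toy problem: leaf -> (gain of splitting it, left child, right child).
# Leaves that are absent from the table cannot be split further.
SPLITS: Dict[str, Tuple[float, str, str]] = {
    "root": (10.0, "A", "B"),
    "A": (8.0, "A1", "A2"),
    "B": (1.0, "B1", "B2"),
    "A1": (6.0, "A11", "A12"),
    "A2": (0.5, "A21", "A22"),
}


def grow_leaf_wise(splits: Dict[str, Tuple[float, str, str]], max_leaves: int) -> List[str]:
    """Always split the current leaf with the largest gain (leaf-wise growth)."""
    heap = [(-splits["root"][0], "root")]  # max-heap via negated gains
    order: List[str] = []
    n_leaves = 1
    while heap and n_leaves < max_leaves:
        _, leaf = heapq.heappop(heap)
        order.append(leaf)
        n_leaves += 1  # one leaf is replaced by two
        for child in splits[leaf][1:]:
            if child in splits:
                heapq.heappush(heap, (-splits[child][0], child))
    return order


def grow_level_wise(splits: Dict[str, Tuple[float, str, str]], max_depth: int) -> List[str]:
    """Split every splittable leaf of a level before moving to the next level."""
    order: List[str] = []
    level = ["root"]
    for _ in range(max_depth):
        next_level: List[str] = []
        for leaf in level:
            if leaf in splits:
                order.append(leaf)
                next_level.extend(splits[leaf][1:])
        level = next_level
    return order


leaf_wise = grow_leaf_wise(SPLITS, max_leaves=4)
level_wise = grow_level_wise(SPLITS, max_depth=2)
print(leaf_wise, sum(SPLITS[s][0] for s in leaf_wise))
print(level_wise, sum(SPLITS[s][0] for s in level_wise))
```

Both runs end with four leaves, but the leaf-wise order spends its splits where the gain is largest, producing a deeper, unbalanced tree.
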
Figure 5
Bundling of mutually exclusive features into a single feature.
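
A minimal sketch of bundling two binned features, assuming strict exclusivity (the two features are never nonzero in the same row) and bin 0 meaning "zero". The function names and toy columns are illustrative, and practical implementations may also tolerate a small number of conflicts.

```python
from typing import List, Sequence


def are_exclusive(col_a: Sequence[int], col_b: Sequence[int]) -> bool:
    """True if the two features are never nonzero in the same row."""
    return all(a == 0 or b == 0 for a, b in zip(col_a, col_b))


def bundle(col_a: Sequence[int], col_b: Sequence[int]) -> List[int]:
    """Merge two exclusive binned features into one by offsetting the second."""
    if not are_exclusive(col_a, col_b):
        raise ValueError("features conflict and cannot be bundled in this sketch")
    offset = max(col_a)  # bins 1..offset stay reserved for the first feature
    return [a if a != 0 else (b + offset if b != 0 else 0) for a, b in zip(col_a, col_b)]


# Toy binned columns; 0 is the "zero" bin.
feature_a = [1, 0, 3, 0, 0]
feature_b = [0, 2, 0, 0, 1]
bundled = bundle(feature_a, feature_b)
print(bundled)
```

Because the value ranges do not overlap after the offset, a split on the bundled feature can still separate the bins of either original feature.
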
Figure 6
Model performance comparison under each estimator setting. (A) AUC and AUPR for 4 estimator combinations. (B) Computational time for 4 estimator combinations.
Figure 7
Sn, Sp, MCC, AUC and AUPR of LGBMDF, AOPEDF, NEDTP, RF and SVM.
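
For readers unfamiliar with the abbreviations, the sketch below computes sensitivity (Sn), specificity (Sp) and the Matthews correlation coefficient (MCC) from confusion-matrix counts, using their standard definitions. The counts are made up for illustration and are not results from the paper. AUC and AUPR need ranked prediction scores and are not covered here.

```python
import math
from typing import Tuple


def sn_sp_mcc(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sn = tp / (tp + fn)  # fraction of true interactions recovered
    sp = tn / (tn + fp)  # fraction of non-interactions correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, mcc


# Made-up counts for illustration only.
sn, sp, mcc = sn_sp_mcc(tp=40, fn=10, tn=45, fp=5)
print(f"Sn={sn:.3f} Sp={sp:.3f} MCC={mcc:.3f}")
```
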
