Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Editorial
. 2019 Apr;7(7):152.
doi: 10.21037/atm.2019.03.29.

Predictive analytics with gradient boosting in clinical medicine

Affiliations
Editorial

Predictive analytics with gradient boosting in clinical medicine

Zhongheng Zhang et al. Ann Transl Med. 2019 Apr.

Abstract

Predictive analytics play an important role in clinical research. An accurate predictive model can help clinicians stratify risk thereby allowing the identification of a target population which might benefit from a certain intervention. Conventionally, predictive analytics is performed using parametric modeling which comes with a number of assumptions. For example, generalized linear regression models require linearity and additivity to hold for the underlying data. However, these assumptions may not hold in practice. Especially in the era of big data, a large number of covariates or features can be extracted from an electronic database which might have complex interactions and higher-order terms among the covariates. Conventional modeling methods have trouble capturing such high-dimensional relationships. However, some sophisticated machine learning techniques have been invented to handle this situation. Gradient boosting is one of these techniques which is able to recursively fit a weak learner to the residual so as to improve model performance with a gradually increasing number of iterations. It can automatically discover complex data structure, including nonlinearity and high-order interactions, even in the context of hundreds, thousands, or tens-of-thousands of potential predictors. This paper aims to introduce how gradient boosting works. The principles behind this learning machine are explained with a small example in a step-by-step manner. The formal implementation of gradient tree boosting is then illustrated with the caret package. In the simulated example complexity of data structure is created by generating certain interactions between the covariates. This example shows that gradient boosting can better capture these complex relationships than a generalized linear model-based approach.

Keywords: Gradient boosting; clinical medicine; decision tree; prediction.

PubMed Disclaimer

Conflict of interest statement

Conflicts of Interest: The authors have no conflicts of interest to declare.

Figures

Figure 1
Figure 1
The predictions with h1 and h2 on the pseudo residuals. The h1 tree is split by age and there are three terminal leaf nodes. The numbers in the nodes indicate the mean value and the percentage. The h2 tree is split on age and gender resulting in 4 leaf nodes.
Figure 2
Figure 2
The gradient boosting training process. The horizontal axis shows the number of boosting iterations (number of trees). The vertical axis shows the accuracy. The maximum tree depth takes a value of 1, 3, 5 or 7.

References

    1. Friedman JH. Greedy function approximation: A gradient boosting machine. The Annals of Statistics 2001;29:1189-232. 10.1214/aos/1013203451 - DOI
    1. Zhao J, Feng Q, Wu P, et al. Learning from Longitudinal Data in Electronic Health Record and Genetic Data to Improve Cardiovascular Event Prediction. Sci Rep 2019;9:717. 10.1038/s41598-018-36745-x - DOI - PMC - PubMed
    1. Delahanty RJ, Alvarez J, Flynn LM, et al. Development and Evaluation of a Machine Learning Model for the Early Identification of Patients at Risk for Sepsis. Ann Emerg Med 2019;73:334-44. 10.1016/j.annemergmed.2018.11.036 - DOI - PubMed
    1. Wong A, Young AT, Liang AS, et al. Development and Validation of an Electronic Health Record-Based Machine Learning Model to Estimate Delirium Risk in Newly Hospitalized Patients Without Known Cognitive Impairment. JAMA Netw Open 2018;1:e181018. 10.1001/jamanetworkopen.2018.1018 - DOI - PMC - PubMed
    1. Kalagara S, Eltorai AEM, Durand WM, et al. Machine learning modeling for predicting hospital readmission following lumbar laminectomy. J Neurosurg Spine 2018;30:344-52. - PubMed

Publication types