Ann Transl Med. 2019 Dec;7(23):796.

doi: 10.21037/atm.2019.08.63.

In-depth mining of clinical data: the construction of clinical prediction model with R

Zhi-Rui Zhou¹, Wei-Wei Wang², Yan Li³, Kai-Rui Jin⁴, Xuan-Yi Wang⁴, Zi-Wei Wang⁵, Yi-Shan Chen⁶, Shao-Jia Wang⁷, Jing Hu⁶, Hui-Na Zhang⁶, Po Huang⁶, Guo-Zhen Zhao⁶, Xing-Xing Chen⁴, Bo Li⁶, Tian-Song Zhang⁸

Affiliations

¹ Department of Radiotherapy, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai 200040, China.
² Department of Thoracic Surgery, The Third Affiliated Hospital of Kunming Medical University & Yunnan Provincial Tumor Hospital, Kunming 650118, China.
³ Department of Anesthesiology, The Fourth Affiliated Hospital, Harbin Medical University, Harbin 150001, China.
⁴ Department of Radiation Oncology, Shanghai Cancer Center, Shanghai Medical College, Fudan University, Shanghai 200040, China.
⁵ Department of Urology, Changhai Hospital, The Second Military Medical University, Shanghai 200040, China.
⁶ Beijing Hospital of Traditional Chinese Medicine, Capital Medical University, Beijing Institute of Traditional Chinese Medicine, Beijing 100010, China.
⁷ Department of Gynecologic Oncology, The Third Affiliated Hospital of Kunming Medical University & Yunnan Provincial Tumor Hospital, Kunming 650118, China.
⁸ Internal Medicine of Traditional Chinese Medicine Department, Jing'an District Central Hospital, Fudan University, Shanghai 200040, China.

PMID: 32042812
PMCID: PMC6989986
DOI: 10.21037/atm.2019.08.63

Zhi-Rui Zhou et al. Ann Transl Med. 2019 Dec.

Abstract

This article presents a 16-section methodology series on the construction of clinical prediction models with R. Section 1 introduces the concept, current applications, construction methods and workflow, and classification of clinical prediction models, along with the conditions required for such research and the problems it currently faces. Section 2 covers variable screening methods in multivariate regression analysis. Section 3 introduces the construction of prediction models based on logistic regression and the drawing of nomograms. Section 4 covers the Cox proportional hazards regression model and nomogram drawing. Section 5 introduces the calculation of the C-statistic for logistic regression models. Section 6 introduces two common methods for calculating the C-index in Cox regression with R. Section 7 focuses on the principle and calculation of the Net Reclassification Index (NRI) using R. Section 8 focuses on the principle and calculation of the Integrated Discrimination Index (IDI) using R. Section 9 turns to evaluating the clinical utility of a constructed model with decision curve analysis (DCA). Section 10 supplements Section 9 and introduces DCA for survival outcome data. Section 11 discusses the external validation of logistic regression models. Section 12 discusses the in-depth evaluation of Cox regression models with R, including calculating the concordance index (C-index) in the validation data set and drawing the calibration curve. Section 13 introduces how to handle survival outcome data using the competing risk model with R. Section 14 introduces how to draw the nomogram of the competing risk model with R. Section 15 discusses the identification of outliers and the imputation of missing values. Section 16 introduces advanced variable selection methods for linear models, such as ridge regression and LASSO regression.

Keywords: Clinical prediction models; R; statistical computing.

Conflict of interest statement

Conflicts of Interest: The authors have no conflicts of interest to declare.

Figures

**Figure 1**
The flow chart of construction and evaluation of clinical prediction models.

**Figure 2**
Research process and technical routes of three prediction models.

**Figure 3**
Nomogram based on model “fit1”.

**Figure 4**
Calibration curve based on model “fit1”.

**Figure 5**
Nomogram based on model “fit2”.

**Figure 6**
Calibration curve based on model “fit2”.

**Figure 7**
Nomogram based on model “fit”.

**Figure 8**
Calibration curve based on model “fit”.

**Figure 9**
Nomogram of Cox regression model.

**Figure 10**
Calibration curve of Cox model.

**Figure 11**
Nomogram based on median survival time of Cox regression.

**Figure 12**
Nomogram based on survival probability of Cox regression model.

**Figure 13**
Calibration curve based on Cox regression model.

**Figure 15**
The comparison of two models.

**Figure 17**
Clinical impact curve of simple model.

**Figure 18**
Clinical impact curve of complex model.

**Figure 19**
DCA of survival outcome data.

**Figure 20**
DCA curve of “coxmod” based on Cox regression model.

**Figure 21**
DCA curves of “coxmod1” and “coxmod1” based on two Cox regression models.

**Figure 22**
DCA curve of a single predictor “thickness” based on univariate Cox regression model.

**Figure 23**
DCA curve of a single predictor “thickness” based on univariate Cox regression model. The Y axis represents the net reduction in interventions per 100 persons.

**Figure 26**
ROC curve in validation set.

**Figure 27**
The discrimination index of Cox (2 variables) compared with Cox (5 variables) without cross-validation.

**Figure 28**
The discriminability of Cox (2 variables) compared with Cox (5 variables) with cross-validation.

**Figure 29**
The Calibration Plot performed by pec package.

**Figure 30**
The Calibration Plot performed by pec package with cross-validation.

**Figure 31**
The survival curve of the cumulative recurrence rate and the cumulative incidence rate of competing risk events.

**Figure 32**
Nomogram predicting cumulative recurrence risk at 36 and 60 months using the competing risk model. The nomogram estimates that patient no. 31 has a cumulative risk of recurrence of 0.196 and 0.213 at 36 and 60 months, respectively. *, P<0.05; ***, P<0.001.

**Figure 33**
Nomogram predicting cumulative risk of recurrence at 36 and 60 months using the Cox proportional hazards model. According to the nomogram’s estimate, the cumulative risk of recurrence in patient no. 31 at 36 and 60 months is 0.205 and 0.217, respectively. *, P<0.05; ***, P<0.001.

**Figure 34**
Visualization of missing values (1).

**Figure 35**
Visualization of missing values (2).

**Figure 36**
Distribution of missing values with averages.

**Figure 37**
The relationship between the coefficient and the Log(λ).

**Figure 38**
The performance of this model on the test set.

**Figure 39**
The relationship between the coefficient and the Log(λ).

**Figure 40**
The performance of this model on the test set.

**Figure 41**
Relationship between AUC and Log(λ).

**Figure 42**
The performance of this model on the test set.

**Figure 43**
The relationship between the coefficient and the L1 norm.

**Figure 44**
The relationship between the coefficient and the Log(λ).

**Figure 45**
The relationship between the coefficient and the fraction deviance explained.

**Figure 46**
The relationship between predicted and actual values in the ridge regression.

**Figure 47**
The relationship between the coefficient and the Log(λ) in the Lasso regression.

**Figure 48**
The relationship between predicted and actual values in the LASSO regression.

**Figure 49**
The relationship between the logarithm of λ and the mean square error in the LASSO regression.

Comment in

Predictive analytics in the era of big data: opportunities and challenges.
Zhang Z. Ann Transl Med. 2020 Feb;8(4):68. doi: 10.21037/atm.2019.10.97. PMID: 32175361.
Real-life clinical data mining: generating hypotheses for evidence-based medicine.
Bibault JE. Ann Transl Med. 2020 Feb;8(4):69. doi: 10.21037/atm.2019.10.99. PMID: 32175362.
Joint forces for making clinical prediction models contribute to science.
Wang J, Li Y. Ann Transl Med. 2020 Feb;8(4):70. doi: 10.21037/atm.2019.11.10. PMID: 32175363.
Overview of clinical prediction models.
Chen L. Ann Transl Med. 2020 Feb;8(4):71. doi: 10.21037/atm.2019.11.121. PMID: 32175364.
Clinical prediction models: evaluation matters.
Gu HQ, Liu C. Ann Transl Med. 2020 Feb;8(4):72. doi: 10.21037/atm.2019.11.143. PMID: 32175365.
Statistical methods and models in the analysis of time to event data.
Lee M, Han J. Ann Transl Med. 2020 Feb;8(4):73. doi: 10.21037/atm.2019.12.66. PMID: 32175366.
Models and prediction, how and what?
Xie Y, Yu Z. Ann Transl Med. 2020 Feb;8(4):75. doi: 10.21037/atm.2019.12.133. PMID: 32175368.
How to use statistical models and methods for clinical prediction.
Cortese G. Ann Transl Med. 2020 Feb;8(4):76. doi: 10.21037/atm.2020.01.22. PMID: 32175369.
The power of clinical data empowered by clinical prediction model: an R tutorial.
Dai L, Yang D, Shen H. Ann Transl Med. 2020 Feb;8(4):77. doi: 10.21037/atm.2020.01.114. PMID: 32175370.
A nomogram with enhanced function facilitated by nomogramEx and nomogramFormula.
Bi G, Li R, Liang J, Hu Z, Zhan C. Ann Transl Med. 2020 Feb;8(4):78. doi: 10.21037/atm.2020.01.71. PMID: 32175371.
Clinical prediction models in the precision medicine era: old and new algorithms.
Luo JC, Zhao QY, Tu GW. Ann Transl Med. 2020 Mar;8(6):274. doi: 10.21037/atm.2020.02.63. PMID: 32355718.
Multinomial and ordinal Logistic regression analyses with multi-categorical variables using R.
Liang J, Bi G, Zhan C. Ann Transl Med. 2020 Aug;8(16):982. doi: 10.21037/atm-2020-57. PMID: 32953782.
