Diagn Progn Res. 2022 Jan 11;6(1):1.
doi: 10.1186/s41512-021-00115-5.

Performance of binary prediction models in high-correlation low-dimensional settings: a comparison of methods

Artuur M Leeuwenberg et al. Diagn Progn Res. 2022.

Abstract

Background: Clinical prediction models are developed widely across medical disciplines. When predictors in such models are highly collinear, unexpected or spurious predictor-outcome associations may occur, potentially reducing the face validity of the prediction model. Collinearity can be addressed by excluding collinear predictors, but when there is no a priori motivation (besides collinearity) to include or exclude specific predictors, such an approach is arbitrary and possibly inappropriate.

Methods: We compare different methods to address collinearity, including shrinkage, dimensionality reduction, and constrained optimization. The effectiveness of these methods is illustrated via simulations.
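
To make the idea of shrinkage concrete, the following is a minimal sketch and not the authors' implementation. It fits a logistic model to two highly correlated simulated predictors, once without a penalty and once with a ridge (L2) penalty. The sample size, correlation, true coefficients, penalty strength, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(n, rho, rng):
    # Two standard-normal predictors with correlation rho and a binary outcome.
    # Assumed true model: logit(p) = 1.0 * x1 + 1.0 * x2 (illustrative values).
    X, y = [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = rho * x1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        p = sigmoid(x1 + x2)
        X.append((x1, x2))
        y.append(1 if rng.random() < p else 0)
    return X, y


def fit_logistic(X, y, lam, lr=0.5, iters=500):
    # Gradient descent on mean log-loss + (lam / 2) * (b1^2 + b2^2).
    # The intercept b0 is not penalized; lam = 0 gives the unpenalized fit.
    b0 = b1 = b2 = 0.0
    n = len(y)
    for _ in range(iters):
        g0 = g1 = g2 = 0.0
        for (x1, x2), yi in zip(X, y):
            err = sigmoid(b0 + b1 * x1 + b2 * x2) - yi
            g0 += err
            g1 += err * x1
            g2 += err * x2
        b0 -= lr * g0 / n
        b1 -= lr * (g1 / n + lam * b1)
        b2 -= lr * (g2 / n + lam * b2)
    return b0, b1, b2


def slope_norm(coefs):
    # Euclidean norm of the two slope coefficients (intercept excluded).
    return math.hypot(coefs[1], coefs[2])


rng = random.Random(0)
X, y = simulate(n=300, rho=0.95, rng=rng)
unpenalized = fit_logistic(X, y, lam=0.0)
ridge = fit_logistic(X, y, lam=0.5)
print("unpenalized slopes:", unpenalized[1:], "norm:", slope_norm(unpenalized))
print("ridge slopes:      ", ridge[1:], "norm:", slope_norm(ridge))
```

In this sketch the ridge fit pulls both slopes toward zero and keeps both predictors in the model, which is the sense in which shrinkage differs from excluding one of the collinear predictors.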

Results: In the conducted simulations, no effect of collinearity was observed on predictive performance (AUC, R², calibration intercept, calibration slope) across methods. However, collinearity reduced the stability of predictor selection for all compared methods, in particular for methods that perform strong predictor selection (e.g., Lasso). Methods for which the included set of predictors remained most stable under increased collinearity were Ridge, PCLR, LAELR, and Dropout.
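
The instability of data-driven selection under collinearity can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a reproduction of the study. Univariable screening (keep the predictor with the larger absolute correlation with the outcome) stands in for a selection method such as Lasso. Only `x1` truly affects the outcome, and the sample size, number of repetitions, and correlation values are hypothetical.

```python
import math
import random


def pearson(a, b):
    # Sample Pearson correlation between two equally long sequences.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def selected_predictor(n, rho, rng):
    # Simulate one data set and return 1 or 2: the predictor that screening keeps.
    x1s, x2s, ys = [], [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = rho * x1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        p = 1.0 / (1.0 + math.exp(-x1))  # assumed truth: only x1 drives the outcome
        x1s.append(x1)
        x2s.append(x2)
        ys.append(1 if rng.random() < p else 0)
    return 1 if abs(pearson(x1s, ys)) >= abs(pearson(x2s, ys)) else 2


def selection_stability(rho, reps=200, n=200, seed=0):
    # Share of repetitions in which the most frequently chosen predictor is chosen.
    # 1.0 means the same predictor is always selected; 0.5 is a coin flip.
    rng = random.Random(seed)
    picks = [selected_predictor(n, rho, rng) for _ in range(reps)]
    share_x1 = picks.count(1) / reps
    return max(share_x1, 1.0 - share_x1)


stability_low = selection_stability(rho=0.10)
stability_high = selection_stability(rho=0.98)
print("selection stability, low collinearity: ", stability_low)
print("selection stability, high collinearity:", stability_high)
```

With weakly correlated predictors the screening step picks `x1` in nearly every repetition. With strongly correlated predictors the choice between `x1` and `x2` changes from sample to sample, even though the data-generating model is unchanged.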

Conclusions: Based on the results, we recommend refraining from data-driven predictor selection approaches in the presence of high collinearity, because of the increased instability of predictor selection, even in relatively high events-per-variable settings. The selection of certain predictors over others may disproportionately give the impression that included predictors have a stronger association with the outcome than excluded predictors.

Keywords: Multi-collinearity; Normal-tissue complication probability models; Prediction models.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Predictive performance results for the xerostomia simulations. Lowess-smoothed calibration curves per simulation are plotted in grey. The calibration curve over all repetitions is shown in blue. Perfect calibration, the diagonal, is dashed in red
Fig. 2
Predictive performance results for the dysphagia simulations. Lowess-smoothed calibration curves per simulation are plotted in grey. The calibration curve over all repetitions is shown in blue. Perfect calibration, the diagonal, is dashed in red
Fig. 3
Across models, the log mean absolute error between the estimated and the true coefficients for each method, for the xerostomia settings. Red indicates high collinearity, and blue low collinearity
Fig. 4
Across models, the log mean absolute error between the estimated and the true coefficients for each method, for the dysphagia settings. Red indicates high collinearity, and blue low collinearity
Fig. 5
Across models, the mean proportion of coefficients with the same direction of effect after repetition for the xerostomia settings. Red indicates high collinearity, and blue low collinearity
Fig. 6
Across models, the mean proportion of coefficients with the same direction of effect after repetition for the dysphagia settings. Red indicates high collinearity, and blue low collinearity
Fig. 7
Hyperparameter values for xerostomia, per predictor set: setting A is the small predictor set with high EPV (EPV = 23), and setting B is the large predictor set with lower EPV (EPV = 8). High-collinearity settings are shown in red, and low-collinearity settings in blue. The methods are distributed across three plots because of their different scales. Hyperparameter notation follows Table 2, except for λENet, which is the total shrinkage factor for ElasticNet (λℓ1 + λℓ2)
Fig. 8
Hyperparameter values for dysphagia, per predictor set: setting C is the small predictor set with high EPV (EPV = 6), and setting D is the large predictor set with lower EPV (EPV = 2). High-collinearity settings are shown in red, and low-collinearity settings in blue. The methods are distributed across three plots because of their different scales. Hyperparameter notation follows Table 2, except for λENet, which is the total shrinkage factor for ElasticNet (λℓ1 + λℓ2)
