Machine learning versus regression modelling in predicting individual healthcare costs from a representative sample of the nationwide claims database in France

Alexandre Vimont¹ ², Henri Leleu³, Isabelle Durand-Zaleski⁴

Affiliations

¹ Public Health Expertise (PHE), Paris, France. alexandre.vimont@ph-expertise.com.
² Assistance Publique Hôpitaux de Paris, URC-ECO, CRESS-UMR1153, Paris, France. alexandre.vimont@ph-expertise.com.
³ Public Health Expertise (PHE), Paris, France.
⁴ Assistance Publique Hôpitaux de Paris, URC-ECO, CRESS-UMR1153, Paris, France.

PMID: 34373958
DOI: 10.1007/s10198-021-01363-4


Alexandre Vimont et al. Eur J Health Econ. 2022 Mar;23(2):211-223. doi: 10.1007/s10198-021-01363-4. Epub 2021 Aug 9.


Abstract

Background: Innovative provider payment methods that avoid adverse selection and reward performance require accurate prediction of healthcare costs based on individual risk adjustment. Our objective was to compare the performances of a simple neural network (NN) and random forest (RF) to a generalized linear model (GLM) for the prediction of medical cost at the individual level.

Methods: A 1/97 representative sample of the French National Health Data Information System was used. The predictors selected were demographic information, pre-existing conditions, the Charlson comorbidity index, and healthcare service use and costs. The predictive performance of each model was compared using individual-level metrics (adjusted R-squared (adj-R²), mean absolute error (MAE) and hit ratio (HiR)) and distribution-level metrics, on different sets of covariates, in the general population and by pre-existing morbid condition, using a quasi-Monte Carlo design.
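The three individual-level metrics named above can be sketched in a few lines of Python. This is an illustration only, not the authors' code. The adjusted R-squared and MAE follow their standard textbook definitions. The abstract does not define the hit ratio, so the version below assumes it is the share of subjects whose predicted cost falls within a fixed tolerance of the observed cost; the function names, the `tolerance` parameter and the toy cost values are all hypothetical.

```python
from statistics import mean


def adjusted_r2(y_true, y_pred, n_predictors):
    """Standard adjusted R-squared for n observations and n_predictors covariates."""
    n = len(y_true)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between observed and predicted cost."""
    return mean(abs(y - p) for y, p in zip(y_true, y_pred))


def hit_ratio(y_true, y_pred, tolerance):
    """ASSUMED definition: share of predictions within `tolerance` of the observed cost.

    The abstract does not state how HiR is computed; this is a placeholder.
    """
    hits = sum(1 for y, p in zip(y_true, y_pred) if abs(y - p) <= tolerance)
    return hits / len(y_true)


# Toy annual costs in euros (made up for illustration, not study data).
observed = [0, 100, 400, 1500, 9000]
predicted = [50, 150, 300, 1000, 7000]

print("adj-R2:", adjusted_r2(observed, predicted, n_predictors=1))
print("MAE:", mean_absolute_error(observed, predicted))
print("HiR:", hit_ratio(observed, predicted, tolerance=100))
```

Because healthcare costs are heavily right-skewed, a few high-cost subjects can dominate the squared-error term in adj-R², which is one reason to report MAE and a hit-type metric alongside it.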

Results: We included 510,182 subjects alive on 31 December 2015. Mean annual costs were 1894€ (standard deviation 9326€; median 393€; interquartile range 95€ to 1480€), including zero-claim subjects. All models performed similarly after adjustment for demographics. The RF model performed better on the other sets of covariates (pre-existing conditions, resource counts and past-year costs). On the full model, RF reached an adj-R² of 47.5%, an MAE of 1338€ and a HiR of 67%, while GLM and NN had an adj-R² of 34.7% and 31.6%, an MAE of 1635€ and 1660€, and a HiR of 58% and 55%, respectively. The RF model outperformed GLM and NN for most conditions and for high-cost subjects.

Conclusions: RF should be preferred when the objective is to best predict medical costs. When the objective is to understand the contribution of predictors, a GLM using demographics, conditions and base-year cost was well suited.

Keywords: Cost containment; Healthcare costs; Healthcare management; Machine learning; Neural network; Predictive analytics; Random forest.


