Eur J Health Econ. 2022 Mar;23(2):211-223.
doi: 10.1007/s10198-021-01363-4. Epub 2021 Aug 9.

Machine learning versus regression modelling in predicting individual healthcare costs from a representative sample of the nationwide claims database in France

Alexandre Vimont et al. Eur J Health Econ. 2022 Mar.

Abstract

Background: Innovative provider payment methods that avoid adverse selection and reward performance require accurate prediction of healthcare costs based on individual risk adjustment. Our objective was to compare the performances of a simple neural network (NN) and random forest (RF) to a generalized linear model (GLM) for the prediction of medical cost at the individual level.

Methods: A 1/97 representative sample of the French National Health Data Information System was used. The predictors selected were demographic information, pre-existing conditions, the Charlson comorbidity index, and healthcare service use and costs. The predictive performance of each model was compared using individual-level metrics (adjusted R-squared (adj-R2), mean absolute error (MAE) and hit ratio (HiR)) and distribution-level metrics, on different sets of covariates, in the general population and by pre-existing morbid condition, using a quasi-Monte Carlo design.
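
As an illustration of the individual-level metrics named in the Methods, the following is a minimal Python sketch, not the authors' code. The adjusted R-squared and MAE follow their standard textbook definitions. The abstract does not define the hit ratio, so the version here (the share of subjects whose predicted cost falls within a fixed absolute tolerance of the observed cost) is an assumption, as are the function names, the `tolerance` parameter and the toy cost values.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted costs."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def adjusted_r2(actual, predicted, n_predictors):
    """R-squared penalised for the number of predictors in the model."""
    n = len(actual)
    y_bar = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - y_bar) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


def hit_ratio(actual, predicted, tolerance):
    """ASSUMED definition: share of predictions within `tolerance` euros
    of the observed cost. The abstract does not state the paper's rule."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


# Toy annual costs in euros (invented for illustration, not study data).
actual = [0, 100, 400, 1500, 9000]
predicted = [50, 150, 300, 1000, 7000]

mae = mean_absolute_error(actual, predicted)
adj_r2 = adjusted_r2(actual, predicted, n_predictors=1)
hir = hit_ratio(actual, predicted, tolerance=100)
print(f"MAE={mae}  adj-R2={adj_r2:.3f}  HiR={hir:.0%}")
```

On skewed cost data such as these, the three metrics can rank models differently: a few high-cost subjects dominate R-squared, while MAE and a tolerance-based hit ratio are driven more by the many low-cost subjects.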

Results: We included 510,182 subjects alive on 31 December 2015. Mean annual costs were 1894€ (standard deviation 9326€; median 393€; interquartile range 95€ to 1480€), including zero-claim subjects. All models performed similarly after adjustment for demographics. The RF model performed better on the other sets of covariates (pre-existing conditions, resource counts and past-year costs). On the full model, RF reached an adj-R2 of 47.5%, a MAE of 1338€ and a HiR of 67%, while GLM and NN had an adj-R2 of 34.7% and 31.6%, a MAE of 1635€ and 1660€, and a HiR of 58% and 55%, respectively. The RF model outperformed GLM and NN for most conditions and for high-cost subjects.

Conclusions: RF should be preferred when the objective is to best predict medical costs. When the objective is to understand the contribution of predictors, GLM was well suited with demographics, conditions and base year cost.

Keywords: Cost containment; Healthcare costs; Healthcare management; Machine learning; Neural network; Predictive analytics; Random forest.
