J Am Med Dir Assoc. 2020 Nov;21(11):1533-1538.e6.
doi: 10.1016/j.jamda.2020.08.030. Epub 2020 Aug 27.

Predicting Coronavirus Disease 2019 Infection Risk and Related Risk Drivers in Nursing Homes: A Machine Learning Approach

Christopher L F Sun et al. J Am Med Dir Assoc. 2020 Nov.

Abstract

Objective: Inform coronavirus disease 2019 (COVID-19) infection prevention measures by identifying and assessing risk and possible vectors of infection in nursing homes (NHs) using a machine-learning approach.

Design: This retrospective cohort study used a gradient boosting algorithm to evaluate risk of COVID-19 infection (ie, presence of at least 1 confirmed COVID-19 resident) in NHs.

Setting and participants: The model was trained on outcomes from 1146 NHs in Massachusetts, Georgia, and New Jersey, reporting COVID-19 case data on April 20, 2020. Risk indices generated from the model using data from May 4 were prospectively validated against outcomes reported on May 11 from 1021 NHs in California.

Methods: Model features, pertaining to facility and community characteristics, were obtained from a self-constructed dataset based on multiple public and private sources. The model was assessed via out-of-sample area under the receiver operating characteristic curve (AUC), sensitivity, and specificity in the training (via 10-fold cross-validation) and validation datasets.
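As a rough illustration of the three performance measures named in the Methods, the Python sketch below computes AUC, sensitivity, and specificity from predicted risk scores. The function names, the toy labels and scores, and the 0.5 threshold are assumptions made for illustration only; the abstract does not state the study's classification threshold or implementation.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    A facility is flagged as high risk when its score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = NH with at least 1 confirmed resident case.
labels = [1, 1, 1, 0, 0, 0]
risk_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

auc = roc_auc(labels, risk_scores)
sens, spec = sensitivity_specificity(labels, risk_scores, threshold=0.5)
print(f"AUC={auc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

In a 10-fold cross-validation, these functions would be applied to the held-out fold each time, so that every score is out-of-sample.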

Results: The mean AUC, sensitivity, and specificity of the model over 10-fold cross-validation were 0.729 [95% confidence interval (CI) 0.690‒0.767], 0.670 (95% CI 0.477‒0.862), and 0.611 (95% CI 0.412‒0.809), respectively. Prospective out-of-sample validation yielded similar performance measures (AUC 0.721; sensitivity 0.622; specificity 0.713). The strongest predictors of COVID-19 infection were identified as the NH's county's infection rate and the number of separate units in the NH; other predictors included the county's population density, historical health deficiencies cited by the Centers for Medicare & Medicaid Services, and the NH's resident density (in persons per 1000 square feet). In addition, the NH's historical percentage of non-Hispanic white residents was identified as a protective factor.
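The abstract does not say how the confidence intervals around the cross-validated means were derived. One common approach, shown here purely as an assumption, is a Student's t interval over the per-fold scores. The per-fold AUC values below are hypothetical placeholders, not the study's data, and the helper name is illustrative.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-sided 95% critical value of Student's t with 9 degrees of freedom,
# which applies only when there are exactly 10 folds.
T_CRIT_95_DF9 = 2.262


def mean_with_ci(
    fold_scores: Sequence[float], t_crit: float = T_CRIT_95_DF9
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) using mean +/- t_crit * standard error.
    Pass a different t_crit if the number of folds is not 10."""
    n = len(fold_scores)
    if n < 2:
        raise ValueError("need at least two fold scores")
    mean = statistics.fmean(fold_scores)
    std_err = statistics.stdev(fold_scores) / math.sqrt(n)
    half_width = t_crit * std_err
    return mean, mean - half_width, mean + half_width


# Hypothetical per-fold AUCs for illustration only.
fold_aucs = [0.70, 0.75, 0.72, 0.68, 0.74, 0.73, 0.71, 0.76, 0.69, 0.72]

mean_auc, ci_low, ci_high = mean_with_ci(fold_aucs)
print(f"mean AUC {mean_auc:.3f} (95% CI {ci_low:.3f}-{ci_high:.3f})")
```

Because fold scores in cross-validation are not fully independent, an interval of this kind is best read as approximate.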

Conclusions and implications: A machine-learning model can help quantify and predict NH infection risk. The identified risk factors support the early identification and management of presymptomatic and asymptomatic individuals (eg, staff) entering the NH from the surrounding community and the development of financially sustainable staff testing initiatives in preventing COVID-19 infection.

Keywords: COVID-19; Nursing homes; health policy; infection prevention; long-term care facility; machine-learning; risk modeling.

Figures

Fig. 1
Feature importance and impact on risk of COVID-19 infection in NHs from the gradient boosting model. The COVID-19 infection rate of the NH's county and the NH's size had the largest impact on infection risk (features are listed in descending order of importance). In the figure, each dot represents an NH on which the model was trained. For each NH, a high feature value corresponds to the color red, and a low feature value corresponds to the color blue. The horizontal axis shows whether the feature value is associated with a higher or lower risk of NH infection.
Supplementary Fig. 1
Impact of each predictive feature, shown in subfigures (A‒G), on estimated NH risk of COVID-19 infection. The median (blue line), 25th and 75th percentiles (gray band), and 5th and 95th percentiles (orange band) of the infection risk levels generated by the trained model are shown across 15,300 NHs in the United States.
