Review
One Health. 2022 Oct 1:15:100439.
doi: 10.1016/j.onehlt.2022.100439. eCollection 2022 Dec.

Predicting infectious disease for biopreparedness and response: A systematic review of machine learning and deep learning approaches

Ravikiran Keshavamurthy et al. One Health.

Abstract

The complex, unpredictable nature of pathogen occurrence has required substantial efforts to accurately predict infectious diseases (IDs). With rising popularity of Machine Learning (ML) and Deep Learning (DL) techniques combined with their unique ability to uncover connections between large amounts of diverse data, we conducted a PRISMA systematic review to investigate advances in ID prediction for human and animal diseases using ML and DL. This review included the type of IDs modeled, ML and DL techniques utilized, geographical distribution, prediction tasks performed, input features utilized, spatial and temporal scales, error metrics used, computational efficiency, uncertainty quantification, and missing data handling methods. Among 237 relevant articles published between January 2001 and May 2021, highly contagious diseases in humans were most often represented, including COVID-19 (37.1%), influenza/influenza-like illnesses (9.3%), dengue (8.9%), and malaria (5.1%). Out of 37 diseases identified, 51.4% were zoonotic, 37.8% were human-only, and 8.1% were animal-only, with only 1.6% economically significant, non-zoonotic livestock diseases. Despite the number of zoonoses, 86.5% of articles modeled humans whereas only a few articles (5.1%) contained more than one host species. Eastern Asia (32.5%), North America (17.7%), and Southern Asia (13.1%) were the most represented locations. Frequent approaches included tree-based ML (38.4%) and feed-forward neural networks (26.6%). Articles predicted temporal incidence (66.7%), disease risk (38.0%), and/or spatial movement (31.2%). Less than 10% of studies addressed uncertainty quantification, computational efficiency, and missing data, which are essential to operational use and deployment. This study highlights trends and gaps in ML and DL for ID prediction, providing guidelines for future works to better support biopreparedness and response. 
To fully utilize ML and DL for improved ID forecasting, models should include the full disease ecology in a One-Health context, important food and agricultural diseases, underrepresented hotspots, and important metrics required for operational deployment.

Keywords: Deep learning; Disease forecast; Disease prediction; Infectious diseases; Machine learning; Systematic review.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
PRISMA flow diagram. The illustration of the overall selection process. * Google Scholar searches were restricted to the first 300 results.
Fig. 2
Venn diagram of articles grouped by host species included in infectious disease modeling using machine learning and deep learning techniques. Domesticated animals include livestock and companion animals; wildlife includes wild animals and birds.
Fig. 3
Distribution of articles with infectious disease models built for each geographical region. If an article included infectious disease models for more than four regions, it was placed in the “multiple regions” category. Similarly, if an article included models for multiple diseases, it was placed in each respective disease category.
Fig. 4
Trend and extent of ID prediction models published (January 2001–May 2021): Number of citations grouped by a) model types (i.e., ML or DL), b) DL model subtypes, c) ML model subtypes, and d) tree-based ML model subtypes. Note: if an article contained models from different types or subtypes, it was placed in each respective group.
Fig. 5
Model prediction categories. The distribution of disease prediction models grouped by model categories and diseases. If an article contained models that performed multiple prediction tasks and for multiple diseases, it was placed in each respective group.
Fig. 6
Spatial and temporal scale of ID prediction models. a) Proportion of the spatial scale (geographic extent) of the models grouped by model categories b) Proportion of temporal scale (duration) of the models grouped by model categories c) Among temporal prediction models, proportion of forecasting distance grouped by temporal scale. An article was placed in its respective groups if it utilized ID models with multiple model categories, spatial and/or temporal scales.
Fig. 7
Characteristics of input feature groups utilized for disease prediction. Articles (n = 237) categorized by a) input feature groups used by disease type b) number of input feature groups utilized by ID prediction model categories. If an article utilized multiple input features, modeled multiple diseases and/or belonged to multiple model categories, the article was counted within each respective grouping.
Fig. 8
Error metrics utilized in ID prediction models: Citations grouped by a) Classification error metrics and b) Regression error metrics. If an article used error metrics from different classes, it was placed in each respective group. Abbreviations: AUC-ROC (Area Under the Curve - Receiver Operating Characteristic curve), AIC/BIC (Akaike's/Bayesian Information Criteria), corr coeff. (Correlation coefficient), MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error), MSE (Mean Squared Error), RMSE (Root Mean Square Error).
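
For readers unfamiliar with the regression error metrics abbreviated in Fig. 8, the following is a minimal sketch of how MAE, MAPE, MSE, and RMSE are conventionally computed. It uses the standard textbook definitions and is not taken from the review or from any of the included articles; the function name `regression_errors` and the case counts are hypothetical and serve only as illustration.

```python
import math


def regression_errors(observed, predicted):
    """Return MAE, MAPE (in percent), MSE, and RMSE for paired sequences.

    Hypothetical helper for illustration; uses the standard definitions.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]

    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n

    # MAPE divides by the observed value, so it is undefined when any
    # observed value is zero (common in sparse disease count data).
    if any(o == 0 for o in observed):
        mape = float("nan")
    else:
        mape = 100 * sum(abs(r / o) for r, o in zip(residuals, observed)) / n

    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": math.sqrt(mse)}


# Made-up weekly case counts, not data from the review.
observed = [100, 200, 400]
predicted = [110, 180, 440]
errors = regression_errors(observed, predicted)
print(errors)
```

MAE and RMSE are expressed in the units of the target (here, cases), whereas MAPE is scale-free, which may be one reason articles report several of these metrics together.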
