Review
Comput Struct Biotechnol J. 2020 Jun 25;18:1704-1721.
doi: 10.1016/j.csbj.2020.06.031. eCollection 2020.

Assessment of vector-host-pathogen relationships using data mining and machine learning

Diing D M Agany et al. Comput Struct Biotechnol J.

Abstract

Infectious diseases, including vector-borne diseases transmitted by arthropods, are a leading cause of morbidity and mortality worldwide. In the era of big data, addressing broad-scale, fundamental questions regarding the complex dynamics of these diseases will increasingly require the integration of diverse datasets to produce new biological knowledge. This review provides a current snapshot of the systematic assessment of the relationships between microbial pathogens, arthropod vectors and mammalian hosts using data mining and machine learning. We employ PRISMA to identify 32 key papers relevant to this topic. Our analysis shows an increasing use of data mining and machine learning tasks and techniques, including prediction, classification, clustering, association rules mining, and deep learning, over the last decade. However, it also reveals a number of critical challenges in applying these to the study of vector-host-pathogen interactions at various systems biology levels. Here, relevant studies, current limitations and future directions are discussed. Furthermore, the quality of data in relevant papers was assessed using the FAIR (Findable, Accessible, Interoperable, Reusable) compliance criteria to evaluate and encourage reproducibility and shareability of research outcomes. Although shortcomings in their application remain, data mining and machine learning have significant potential to break new ground in understanding fundamental aspects of vector-host-pathogen relationships and their application in this field should be encouraged. In particular, while predictive modeling, feature engineering and supervised machine learning are already being used in the field, other data mining and machine learning methods such as deep learning and association rules analysis lag behind and should be implemented in combination with established methods to accelerate hypothesis and knowledge generation in the domain.

Keywords: Adaptation; Association Mining; Big Data; Data Mining; Host-Pathogen; Infectious Disease; Interaction; Machine Learning; OMICs; Pathogenicity; Systems Bioscience; Transmission; Vector-Borne Disease.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract

Fig. 1
Overview of Systems Bioscience (a), of vector-host-pathogen relationships (b), and of Data Mining and Machine Learning processes (c), emphasizing the information flow and intertwined nature of the subject matter in relation to the tools used in the reviewed papers.

Fig. 2
Search workflow (PRISMA) used for article searching, retrieval, processing and inclusion/exclusion decision making.

Fig. 3
Paper count per year showing (a) an increasing trend in DM & ML application in the study of vector-host-pathogen relationships and (b) the distribution across research questions and applications of DM & ML.

Fig. 4
Overview of data analysis results (n = 32). The word clouds were created from (a) paper keywords, (b) words contained in study objectives, (c) the scope in systems bioscience covered by each paper, and (d) the sources of the data in each study, such as databases, lab experiments, or simulations. The prominence of a word within its panel reflects its appearance in the papers' keywords, objectives, or scope, and highlights the focus of the reviewed papers.

Fig. 5
Snapshot of a data science perspective on host-pathogen interaction analysis, from raw data to Knowledge Discovery in Databases (KDD), the Data Mining (DM) & Machine Learning (ML) process: (a) data source, (b) dataset annotation, (c) dataset quality using the FAIR principles and (d) ML method used.
