PeerJ. 2024 Mar 1:12:e17045.
doi: 10.7717/peerj.17045. eCollection 2024.

Tracking mosquito-borne diseases via social media: a machine learning approach to topic modelling and sentiment analysis

Song-Quan Ong et al. PeerJ.

Abstract

Mosquito-borne diseases (MBDs) are a major threat worldwide, and public consultation on these diseases is critical to disease control decision-making. However, traditional public surveys are time-consuming and labor-intensive and do not allow for timely decision-making. Recent studies have explored text analytic approaches to elicit public comments from social media for public health. Therefore, this study aims to demonstrate a text analytics pipeline to identify the MBD topics that were discussed on Twitter and significantly influenced public opinion. A total of 25,000 tweets were retrieved from Twitter, topics were modelled using LDA and sentiment polarities were calculated using the VADER model. After data cleaning, we obtained a total of 6,243 tweets, which we were able to process with the feature selection algorithms. Boruta was used as a feature selection algorithm to determine the importance of topics to public opinion. The result was validated using multinomial logistic regression (MLR) performance and expert judgement. Important issues such as breeding sites, mosquito control, impact/funding, time of year, other diseases with similar symptoms, mosquito-human interaction and biomarkers for diagnosis were identified by both LDA and experts. The MLR result shows that the topics selected by LASSO perform significantly better than the other algorithms, and the experts further justify the topics in the discussion.

Keywords: Amplification intelligence; Determinants; Infection diseases; Mosquitoes; Text mining.
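
The sentiment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy lexicon and the example tweets are hypothetical, and the topic labels are assumed to come from a separate topic-modelling step such as LDA. The normalisation and the ±0.05 cut-offs follow the conventions commonly used with VADER compound scores.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon for illustration only; the real VADER lexicon
# contains thousands of human-rated entries plus heuristics not modelled here.
TOY_LEXICON = {
    "outbreak": -2.0, "death": -3.0, "fever": -1.5,
    "funding": 1.5, "protect": 2.0, "effective": 2.5,
}

def compound_score(text, lexicon=TOY_LEXICON, alpha=15.0):
    """Sum word valences and squash the total into (-1, 1)."""
    total = sum(lexicon.get(word, 0.0) for word in text.lower().split())
    return total / math.sqrt(total * total + alpha)

def polarity(compound, threshold=0.05):
    """Map a compound score to a polarity label using symmetric cut-offs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

def polarity_by_topic(labelled_tweets):
    """Count polarity labels per topic from (topic, text) pairs."""
    counts = defaultdict(Counter)
    for topic, text in labelled_tweets:
        counts[topic][polarity(compound_score(text))] += 1
    return counts

# Hypothetical tweets; topic names are taken from the Figure 7 caption.
tweets = [
    ("Mosquito control", "spraying schedule for next week"),
    ("Efforts/funding", "new funding to protect villages"),
    ("Symptom/outcome of an outbreak", "dengue outbreak causes fever and death"),
]
distribution = polarity_by_topic(tweets)
```

Per-topic counts of this kind are the sort of tally a polarity distribution chart, such as the one described for Figure 7, would be drawn from.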

Conflict of interest statement

The authors declare there are no competing interests.

Figures

Figure 1. Text analytics pipeline and the role of machine and human, respectively.
Figure 2. Overall workflow to construct the annotated dataset.
Figure 3. Distribution of tweets by keyword and the corresponding word cloud.
The size of the word indicates the density of tweets mentioning the word, e.g., the 2,435 tweets queried with “dengue” mainly mentioned “mosquito”, “fever”, “health” and “malaria”, as indicated by the larger font size of these words in the word cloud.
Figure 4. Distribution of sentiment polarity of tweets in this study.
Figure 5. Venn diagram of the topics selected by each feature selection algorithm.
Figure 6. Performance of multinomial logistic regression using the different selected topic groups.
Mean ± standard error as the confidence interval.
Figure 7. The distribution of polarity for the individual topic.
Three topics with a majority of negative polarity indicated by a larger grey area: “Symptom/outcome of an outbreak”, “Diseases with similar symptoms”. Two topics with a majority positive polarity, indicated by a larger blue area in the pie chart, were “Efforts/funding” and “Biomarkers for diagnosis”. Two topics with neutral polarity, indicated by a larger orange area in the pie chart, were “Mosquito control” and “Season”.
