Identifying Potential Lyme Disease Cases Using Self-Reported Worldwide Tweets: Deep Learning Modeling Approach Enhanced With Sentimental Words Through Emojis

doi:10.2196/47014

J Med Internet Res. 2023 Oct 16;25:e47014.

Affiliations

¹ Département de médecine sociale et préventive, École de Santé Publique de l'Université de Montréal, Université de Montréal, Montréal, QC, Canada.
² Department of Mathematics, Faculty of Science, Zagazig University, Zagazig, Egypt.
³ Harvard Extension School, Harvard University, Cambridge, MA, United States.

PMID: 37843893
PMCID: PMC10616745
DOI: 10.2196/47014

Elda Kokoe Elolo Laison et al. J Med Internet Res. 2023.

Abstract

Background: Lyme disease is among the most reported tick-borne diseases worldwide, making it a major ongoing public health concern. An effective Lyme disease case reporting system depends on timely diagnosis and reporting by health care professionals, and accurate laboratory testing and interpretation for clinical diagnosis validation. A lack of these can lead to delayed diagnosis and treatment, which can exacerbate the severity of Lyme disease symptoms. Therefore, there is a need to improve the monitoring of Lyme disease by using other data sources, such as web-based data.

Objective: We analyzed global Twitter data to understand its potential and limitations as a tool for Lyme disease surveillance. We propose a transformer-based classification system to identify potential Lyme disease cases using self-reported tweets.

Methods: Our initial sample included 20,000 tweets collected worldwide from a database of over 1.3 million Lyme disease tweets. After preprocessing and geolocating tweets, tweets in a subset of the initial sample were manually labeled as potential Lyme disease cases or non-Lyme disease cases using carefully selected keywords. Emojis were converted to sentiment words, which were then replaced in the tweets. This labeled tweet set was used for the training, validation, and performance testing of DistilBERT (distilled version of BERT [Bidirectional Encoder Representations from Transformers]), ALBERT (A Lite BERT), and BERTweet (BERT for English Tweets) classifiers.
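The emoji substitution step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `EMOJI_SENTIMENT` lexicon and the `replace_emojis` helper are hypothetical, and the sentiment words chosen here are assumptions, since the abstract does not list the mapping used in the study.

```python
# Hypothetical emoji-to-sentiment lexicon (an assumption for illustration;
# the mapping actually used in the study is not given in the abstract).
EMOJI_SENTIMENT = {
    "\U0001F622": "sad",          # crying face
    "\U0001F64F": "hopeful",      # folded hands
    "\U0001F4AA": "encouraging",  # flexed biceps
}


def replace_emojis(text: str, lexicon: dict = EMOJI_SENTIMENT) -> str:
    """Replace each known emoji in `text` with its sentiment word."""
    # Replace longer keys first so multi-code-point emojis win over prefixes.
    for emoji in sorted(lexicon, key=len, reverse=True):
        # Pad with spaces so the word does not fuse with neighboring text.
        text = text.replace(emoji, f" {lexicon[emoji]} ")
    # Collapse the extra whitespace introduced by the padding.
    return " ".join(text.split())


if __name__ == "__main__":
    print(replace_emojis("Diagnosed with Lyme today \U0001F622"))
```

Emojis that are missing from the lexicon pass through unchanged in this sketch; a real pipeline would need to decide whether to drop or keep them.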

Results: The empirical results showed that BERTweet was the best classifier among all evaluated models (average F1-score of 89.3%, classification accuracy of 90.0%, and precision of 97.1%). However, for recall, term frequency-inverse document frequency and k-nearest neighbors performed better (93.2% and 82.6%, respectively). When emojis were used to enrich the tweet embeddings, BERTweet had an increased recall (8% increase), DistilBERT had an increased F1-score of 93.8% (4% increase) and classification accuracy of 94.1% (4% increase), and ALBERT had an increased F1-score of 93.1% (5% increase) and classification accuracy of 93.9% (5% increase). The general awareness of Lyme disease was high in the United States, the United Kingdom, Australia, and Canada, with self-reported potential cases of Lyme disease from these countries accounting for around 50% (9939/20,000) of the collected English-language tweets, whereas Lyme disease-related tweets were rare in countries from Africa and Asia. The most reported Lyme disease-related symptoms in the data were rash, fatigue, fever, and arthritis, while symptoms such as lymphadenopathy, palpitations, swollen lymph nodes, neck stiffness, and arrhythmia were uncommon, in accordance with Lyme disease symptom frequency.
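For readers less familiar with the metrics reported above, the sketch below shows how precision, recall, F1-score, and accuracy are derived from a binary confusion matrix. The function name `classification_metrics` and the confusion counts are made up for illustration and are not taken from the study; only the 9939/20,000 share comes from the abstract.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary classification metrics from confusion counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Illustrative counts only (hypothetical, not reported in the study).
example = classification_metrics(tp=80, fp=5, fn=20, tn=95)

# Share of collected tweets from the four countries named in the abstract.
share = 9939 / 20_000

if __name__ == "__main__":
    for name, value in example.items():
        print(f"{name}: {value:.3f}")
    print(f"share: {share:.1%}")  # prints "share: 49.7%"
```

Because recall is tp / (tp + fn), any change that turns false negatives into true positives raises recall directly, which is the kind of improvement the abstract reports for BERTweet when emojis are included.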

Conclusions: The study highlights the robustness of BERTweet and DistilBERT as classifiers for potential cases of Lyme disease from self-reported data. The results demonstrated that emojis are effective for enrichment, thereby improving the accuracy of tweet embeddings and the performance of classifiers. Specifically, emojis reflecting sadness, empathy, and encouragement can reduce false negatives.

Keywords: BERT; Bidirectional Encoder Representations from Transformers; Lyme disease; Twitter; emojis; machine learning; natural language processing.

©Elda Kokoe Elolo Laison, Mohamed Hamza Ibrahim, Srikanth Boligarla, Jiaxin Li, Raja Mahadevan, Austen Ng, Venkataraman Muthuramalingam, Wee Yi Lee, Yijun Yin, Bouchra R Nasri. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 16.10.2023.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

**Figure 1**
The 2-stage approach proposed for predicting potential Lyme disease cases. The first stage involves 4 elements: (1) We used standard search terms to collect tweets via the Twitter application programming interface; (2) We cleaned the tweets by removing hashtags, URL links, HTML markups, and stop-words; (3) We manually labeled the tweets as Lyme or non-Lyme using a list of precise keywords; and (4) We converted emojis into sentiment words, which were then substituted for the emojis in the tweets. In the second stage, we used a transformer-based classifier to determine whether a tweet is a potential Lyme disease case or not. When a new tweet was assigned with the highest probability to the Lyme disease class, we used the GeoPy library to estimate the tweet’s location. The 3 special tokens were as follows: [CLS], which stood for classification and was typically the first token of every sequence; [SEP], which described to the pretrained language model which token belongs to which sequence; and [PAD], which was used to fill the unused token slots to ensure that the maximum token length was met.
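The cleaning step and the role of the 3 special tokens in the caption can be sketched as below. This is a simplified illustration under stated assumptions: the `STOP_WORDS` set is a tiny stand-in list, `clean_tweet` drops whole hashtags (the caption does not say whether only the `#` symbol was removed), and `add_special_tokens` works on whitespace tokens, whereas real BERT-family tokenizers operate on subword units and handle these tokens themselves. Both function names are hypothetical.

```python
import html
import re

# Tiny stand-in stop-word list (an assumption, not the list used in the study).
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and", "i", "my"}


def clean_tweet(text: str) -> str:
    """Remove URLs, HTML markup, hashtags, and stop-words from a tweet."""
    text = html.unescape(text)                  # decode entities such as &amp;
    text = re.sub(r"https?://\S+", " ", text)   # URL links
    text = re.sub(r"<[^>]+>", " ", text)        # HTML tags
    text = re.sub(r"#\w+", " ", text)           # whole hashtags (assumption)
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return " ".join(tokens)


def add_special_tokens(tokens: list, max_len: int) -> list:
    """Wrap tokens in [CLS] ... [SEP] and pad with [PAD] up to max_len."""
    if max_len < 2:
        raise ValueError("max_len must leave room for [CLS] and [SEP]")
    body = tokens[: max_len - 2]                # truncate if too long
    sequence = ["[CLS]"] + body + ["[SEP]"]
    return sequence + ["[PAD]"] * (max_len - len(sequence))


if __name__ == "__main__":
    cleaned = clean_tweet("I have a <b>rash</b> and fever #Lyme https://t.co/xyz")
    print(cleaned)
    print(add_special_tokens(cleaned.split(), max_len=6))
```

Padding every sequence to the same length lets tweets of different lengths be stacked into a single batch for the classifier.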

Cited by

Wychgram C, Aucott JN, Rebman AW, Curriero FC. Identifying the geographic leading edge of Lyme disease in the United States with internet searches: A spatiotemporal analysis of Google Health Trends data. PLoS One. 2024 Nov 13;19(11):e0312277. doi: 10.1371/journal.pone.0312277. PMID: 39535983.
Cid VH, Mork J. Enhancing Automatic PT Tagging for MEDLINE Citations Using Transformer-Based Models. ArXiv [Preprint]. 2025 Jun 3:arXiv:2506.03321v1. PMID: 40735093.
