Using Social Media to Help Understand Patient-Reported Health Outcomes of Post-COVID-19 Condition: Natural Language Processing Approach
- PMID: 37725432
- PMCID: PMC10510753
- DOI: 10.2196/45767
Erratum in
- Figure Correction: Using Social Media to Help Understand Patient-Reported Health Outcomes of Post-COVID-19 Condition: Natural Language Processing Approach. J Med Internet Res. 2023 Dec 8;25:e55010. doi: 10.2196/55010. PMID: 38064711. Free PMC article.
Abstract
Background: While scientific knowledge of post-COVID-19 condition (PCC) is growing, there remains significant uncertainty in the definition of the disease, its expected clinical course, and its impact on daily functioning. Social media platforms can generate valuable insights into patient-reported health outcomes as the content is produced at high resolution by patients and caregivers, representing experiences that may be unavailable to most clinicians.
Objective: In this study, we aimed to determine the validity and effectiveness of advanced natural language processing approaches built to derive insight into PCC-related patient-reported health outcomes from the social media platforms Twitter and Reddit. We extracted PCC-related terms, including symptoms and conditions, and measured their occurrence frequency. We compared the outputs with human annotations and clinical outcomes and tracked symptom and condition term occurrences over time and across locations to explore the pipeline's potential as a surveillance tool.
Methods: We used bidirectional encoder representations from transformers (BERT) models to extract and normalize PCC symptom and condition terms from English posts on Twitter and Reddit. We compared 2 named entity recognition models and implemented a 2-step normalization task to map extracted terms to unique concepts in standardized terminology. The normalization steps were done using a semantic search approach with BERT biencoders. We evaluated the effectiveness of BERT models in extracting the terms using a human-annotated corpus and a proximity-based score. We also compared the validity and reliability of the extracted and normalized terms to a web-based survey with more than 3000 participants from several countries.
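The semantic search idea behind the normalization step can be illustrated with a minimal sketch. It is not the authors' implementation: the character n-gram `embed` function is a toy stand-in for a BERT biencoder, and the `CONCEPTS` list is a hypothetical mini-vocabulary rather than the standardized terminology used in the study. Only the overall pattern (embed the extracted term, embed each candidate concept, return the nearest concept by cosine similarity) reflects the approach described above.

```python
import math
from collections import Counter


def embed(text: str, n: int = 3) -> Counter:
    """Toy stand-in for a BERT biencoder: character n-gram counts (assumption)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical concept list; a real pipeline would index a standardized terminology.
CONCEPTS = ["fatigue", "shortness of breath", "anxiety", "headache"]
CONCEPT_VECTORS = {concept: embed(concept) for concept in CONCEPTS}


def normalize(term: str) -> tuple[str, float]:
    """Return the concept most similar to an extracted term, with its score."""
    query = embed(term)
    best = max(CONCEPTS, key=lambda c: cosine(query, CONCEPT_VECTORS[c]))
    return best, cosine(query, CONCEPT_VECTORS[best])


if __name__ == "__main__":
    for extracted in ["headaches", "short of breath"]:
        print(extracted, "->", normalize(extracted))
```

In a real setting, the embeddings would come from a trained biencoder, so that informal phrasings with little surface overlap could still be matched to the right concept; the toy encoder here only captures spelling similarity.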
Results: UmlsBERT-Clinical had the highest accuracy in predicting entities closest to those extracted by human annotators. The 3 most commonly occurring groups of PCC symptom and condition terms were systemic (such as fatigue), neuropsychiatric (such as anxiety and brain fog), and respiratory (such as shortness of breath). We also found novel symptom and condition terms that had not been categorized in previous studies, such as infection and pain. Among co-occurring symptoms, fatigue and headaches formed one of the most frequently co-occurring term pairs on both platforms. In the temporal analysis, neuropsychiatric terms were the most prevalent, followed by systemic terms, on both social media platforms. Our spatial analysis showed that 42% (10,938/26,247) of the analyzed terms included location information, with the majority coming from the United States, the United Kingdom, and Canada.
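As a rough illustration of how term-pair co-occurrence can be counted, the sketch below tallies unordered pairs of normalized terms that appear in the same post. The `posts` data are invented for the example and the function name is hypothetical; the abstract does not specify how the study computed co-occurrence. The last lines only re-derive the reported location share from the two counts given above.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(posts):
    """Count how many posts mention each unordered pair of distinct terms."""
    pair_counts = Counter()
    for terms in posts:
        # Deduplicate and sort so ("a", "b") and ("b", "a") share one key.
        for pair in combinations(sorted(set(terms)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical posts, each reduced to its normalized symptom terms.
posts = [
    ["fatigue", "headache"],
    ["fatigue", "headache", "anxiety"],
    ["anxiety", "brain fog"],
]

counts = cooccurrence_counts(posts)
top_pair, top_count = counts.most_common(1)[0]

# Share of analyzed terms with location information, from the reported counts.
location_share = 10_938 / 26_247

if __name__ == "__main__":
    print(top_pair, top_count)
    print(f"{location_share:.1%}")
```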
Conclusions: The outcome of our social media-derived pipeline is comparable with the results of peer-reviewed articles relevant to PCC symptoms. Overall, this study provides unique insights into patient-reported health outcomes of PCC and valuable information about the patient's journey that can help health care providers anticipate future needs.
International Registered Report Identifier (IRRID): RR2-10.1101/2022.12.14.22283419.
Keywords: PCC; PRO; Reddit; Twitter; bidirectional encoder representations from transformers; entity extraction; entity normalization; health outcome; long COVID; machine learning; natural language processing; patient-reported outcome; patient-reported symptom; post–COVID-19 condition; social media; symptom; transformer models.
©Elham Dolatabadi, Diana Moyano, Michael Bales, Sofija Spasojevic, Rohan Bhambhoria, Junaid Bhatti, Shyamolima Debnath, Nicholas Hoell, Xin Li, Celine Leng, Sasha Nanda, Jad Saab, Esmat Sahak, Fanny Sie, Sara Uppal, Nirma Khatri Vadlamudi, Antoaneta Vladimirova, Artur Yakimovich, Xiaoxue Yang, Sedef Akinli Kocak, Angela M Cheung. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 19.09.2023.
Conflict of interest statement
Conflicts of Interest: None declared.