J Med Internet Res. 2022 Jun 21;24(6):e32867.

doi: 10.2196/32867.

A Disease Identification Algorithm for Medical Crowdfunding Campaigns: Validation Study

Steven S Doerstling¹,², Dennis Akrobetu¹,², Matthew M Engelhard³, Felicia Chen⁴, Peter A Ubel⁵,⁶

Affiliations

¹ Duke University School of Medicine, Duke University, Durham, NC, United States.
² Margolis Center for Health Policy, Duke University, Durham, NC, United States.
³ Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, United States.
⁴ Apple, Inc, Cupertino, CA, United States.
⁵ Fuqua School of Business, Duke University, Durham, NC, United States.
⁶ Sanford School of Public Policy, Duke University, Durham, NC, United States.

PMID: 35727610
PMCID: PMC9257615
DOI: 10.2196/32867

Steven S Doerstling et al. J Med Internet Res. 2022.

Abstract

Background: Web-based crowdfunding has become a popular method to raise money for medical expenses, and there is growing research interest in this topic. However, crowdfunding data are largely composed of unstructured text, thereby posing many challenges for researchers hoping to answer questions about specific medical conditions. Previous studies have used methods that either failed to address major challenges or were poorly scalable to large sample sizes. To enable further research on this emerging funding mechanism in health care, better methods are needed.

Objective: We sought to validate an algorithm for identifying 11 disease categories in web-based medical crowdfunding campaigns. We hypothesized that a disease identification algorithm combining a named entity recognition (NER) model and word search approach could identify disease categories with high precision and accuracy. Such an algorithm would facilitate further research using these data.

Methods: Web scraping was used to collect data on medical crowdfunding campaigns from GoFundMe (GoFundMe Inc). Using pretrained NER and entity resolution models from Spark NLP for Healthcare in combination with targeted keyword searches, we constructed an algorithm to identify conditions in the campaign descriptions, translate conditions to International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) codes, and predict the presence or absence of 11 disease categories in the campaigns. The classification performance of the algorithm was evaluated against 400 manually labeled campaigns.
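
The final step of the Methods, turning resolved ICD-10-CM codes plus keyword matches into per-category presence flags, can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes the Spark NLP NER and entity resolution models have already produced a list of ICD-10-CM code strings for each campaign (the Spark NLP pipeline itself is not shown), and the category names, code ranges, and keywords in `CATEGORY_CODE_RANGES` and `CATEGORY_KEYWORDS` are hypothetical placeholders rather than the study's 11 categories or its actual search terms.

```python
import re

# HYPOTHETICAL categories for illustration only; not the study's 11 categories.
# Ranges are inclusive and compared on the 3-character ICD-10-CM category.
CATEGORY_CODE_RANGES = {
    "cancer": [("C00", "C96")],          # malignant neoplasms
    "cardiovascular": [("I00", "I99")],  # diseases of the circulatory system
    "injury": [("S00", "T88")],          # injury, poisoning, external causes
}

# HYPOTHETICAL keyword lists standing in for the study's targeted word search.
CATEGORY_KEYWORDS = {
    "cancer": ["cancer", "chemo", "chemotherapy"],
    "cardiovascular": ["heart attack", "stroke"],
    "injury": ["accident", "fracture"],
}


def code_in_category(code: str, category: str) -> bool:
    """Return True if an ICD-10-CM code falls in one of the category's ranges."""
    stem = code.upper().replace(".", "")[:3]  # e.g. "C50.911" -> "C50"
    return any(lo <= stem <= hi for lo, hi in CATEGORY_CODE_RANGES[category])


def keyword_hit(text: str, category: str) -> bool:
    """Case-insensitive whole-word search for any of the category's keywords."""
    return any(
        re.search(rf"\b{re.escape(kw)}\b", text, flags=re.IGNORECASE)
        for kw in CATEGORY_KEYWORDS[category]
    )


def classify_campaign(icd_codes: list[str], text: str) -> dict[str, dict[str, bool]]:
    """Flag each category as detected by the NER-derived codes, the word search, or both."""
    result = {}
    for category in CATEGORY_CODE_RANGES:
        ner = any(code_in_category(c, category) for c in icd_codes)
        word = keyword_hit(text, category)
        result[category] = {"ner": ner, "word_search": word, "present": ner or word}
    return result
```

For example, `classify_campaign(["C50.911"], "Mom starts chemo next week")` flags `cancer` through both sources, while a description mentioning only "a car accident" with no extracted codes flags `injury` through the word search alone. Keeping the two source flags separate, rather than storing only `present`, is what makes a per-method breakdown like the one in Figure 2 possible.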

Results: We collected data on 89,645 crowdfunding campaigns through web scraping. The interrater reliability for detecting the presence of broad disease categories in the campaign descriptions was high (Cohen κ: range 0.69-0.96). The NER and entity resolution models identified 6594 unique (276,020 total) ICD-10-CM codes among all of the crowdfunding campaigns in our sample. Through our word search, we identified 3261 additional campaigns for which a medical condition was not otherwise detected with the NER model. When averaged across all disease categories and weighted by the number of campaigns that mentioned each disease category, the algorithm demonstrated an overall precision of 0.83 (range 0.48-0.97), a recall of 0.77 (range 0.42-0.98), an F₁ score of 0.78 (range 0.56-0.96), and an accuracy of 95% (range 90%-98%).
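
The summary statistics above can be illustrated with a short sketch. It computes per-category precision, recall, F₁ score (the harmonic mean of precision and recall), and accuracy from confusion counts against the manual labels, then averages each metric across categories with weights equal to each category's number of manually labeled positive campaigns (tp + fn). Reading "weighted by the number of campaigns that mentioned each disease category" as that positive count is an assumption. It also computes Cohen κ for two raters' binary presence labels. The class and function names are hypothetical, and any counts passed in are illustrative, not study data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one disease category against the manual labels."""
    tp: int
    fp: int
    fn: int
    tn: int


def safe_div(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (a common convention)."""
    return num / den if den else 0.0


def category_metrics(c: Counts) -> dict[str, float]:
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def weighted_average(per_category: dict[str, Counts]) -> dict[str, float]:
    """Average each metric across categories, weighting by labeled positives (tp + fn)."""
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0, "accuracy": 0.0}
    total_weight = 0
    for counts in per_category.values():
        weight = counts.tp + counts.fn  # ASSUMED weight: campaigns labeled with this category
        for name, value in category_metrics(counts).items():
            totals[name] += weight * value
        total_weight += weight
    return {name: safe_div(v, total_weight) for name, v in totals.items()}


def cohen_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Cohen kappa for two raters' binary (0/1) labels on the same campaigns."""
    n = len(rater_a)
    if n == 0 or len(rater_b) != n:
        raise ValueError("raters must label the same non-empty set of campaigns")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a, p_b = sum(rater_a) / n, sum(rater_b) / n
    # Chance agreement: both say "present" or both say "absent" independently.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    # Kappa is undefined when expected == 1; safe_div returns 0.0 in that case.
    return safe_div(observed - expected, 1 - expected)
```

With two invented categories that each have 10 labeled positives, `Counts(8, 2, 2, 88)` and `Counts(1, 0, 9, 90)`, the weighted precision is 0.9 and the weighted recall is 0.45. Under this weighting, common categories contribute more to the overall figure than rare ones.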

Conclusions: A disease identification algorithm combining pretrained natural language processing models and ICD-10-CM code-based disease categorization was able to detect 11 disease categories in medical crowdfunding campaigns with high precision and accuracy.

Keywords: GoFundMe; crowdfunding; health care costs; named entity recognition; natural language processing.

©Steven S Doerstling, Dennis Akrobetu, Matthew M Engelhard, Felicia Chen, Peter A Ubel. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 21.06.2022.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

**Figure 2**
The relative contributions of the NER model and word search to detecting disease categories. All campaigns for which the disease categories on the y-axis were detected by the disease identification algorithm are presented. The colored bars represent the percentage of those campaigns for which the disease categories were detected by the NER model only (blue), the NER model and word search (orange), or the word search only (green). NER: named entity recognition.
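
The three-way split shown in Figure 2 follows from simple set operations. This sketch assumes each method's output for one disease category is available as a set of campaign identifiers (`ner_ids` and `word_ids` are hypothetical names); percentages are taken over the union of the two sets, that is, all campaigns in which the algorithm detected the category.

```python
def detection_breakdown(ner_ids: set[str], word_ids: set[str]) -> dict[str, float]:
    """Percent of detected campaigns found by NER only, by both methods, or by word search only."""
    detected = ner_ids | word_ids  # every campaign where the category was detected
    if not detected:
        return {"ner_only": 0.0, "both": 0.0, "word_only": 0.0}
    n = len(detected)
    return {
        "ner_only": 100 * len(ner_ids - word_ids) / n,
        "both": 100 * len(ner_ids & word_ids) / n,
        "word_only": 100 * len(word_ids - ner_ids) / n,
    }
```

For hypothetical inputs `{"a", "b", "c"}` (NER) and `{"c", "d"}` (word search), four campaigns are detected, giving 50% NER only, 25% both, and 25% word search only; the three shares always sum to 100% because they partition the union.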

**Figure 3**
The co-occurrence of disease categories identified by the NER model and word search. The heat map values represent the percentage of campaigns containing the disease category in each row (identified by the NER model) that also contain the disease category in each column (identified via word search). NER: named entity recognition.
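
The heat map values described in the Figure 3 caption are conditional percentages: for row category r and column category c, the share of campaigns flagged with r by the NER model that are also flagged with c by the word search. The sketch below assumes per-category sets of campaign identifiers for each method as input; the function name and any example categories are hypothetical.

```python
def cooccurrence_matrix(
    ner: dict[str, set[str]], word: dict[str, set[str]]
) -> dict[str, dict[str, float]]:
    """Row r, column c: percent of campaigns with category r (NER) that also have c (word search)."""
    matrix = {}
    for row, row_ids in ner.items():
        matrix[row] = {
            # Normalize by the row's NER count; empty rows get 0.0 to avoid dividing by zero.
            col: (100 * len(row_ids & col_ids) / len(row_ids) if row_ids else 0.0)
            for col, col_ids in word.items()
        }
    return matrix
```

Because each row is normalized by its own NER count, the matrix is generally not symmetric, and the diagonal entries measure how often the word search confirms the NER model on the same category.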

Cited by

Impact of Medical Conditions and Area Deprivation on Fundraising Success in Online Crowdfunding: Cross-Sectional Study.
Doerstling SS, Engelhard MM, Akrobetu D, Sloan CE, Campagna A, Nguyen TV, Madanay F, Chen F, Ubel PA. J Med Internet Res. 2025 Jul 29;27:e72475. doi: 10.2196/72475. PMID: 40729621. Free PMC article.
The role of race and ethnicity in health care crowdfunding: an exploratory analysis.
Machado S, Perez B, Papanicolas I. Health Aff Sch. 2024 Feb 28;2(3):qxae027. doi: 10.1093/haschl/qxae027. eCollection 2024 Mar. PMID: 38756917. Free PMC article.
Automated Extraction of Mortality Information From Publicly Available Sources Using Large Language Models: Development and Evaluation Study.
Al-Garadi M, LeNoue-Newton M, Matheny ME, McPheeters M, Whitaker JM, Deere JA, McLemore MF, Westerman D, Khan MS, Hernández-Muñoz JJ, Wang X, Kuzucan A, Desai RJ, Reeves R. J Med Internet Res. 2025 Aug 18;27:e71113. doi: 10.2196/71113. PMID: 40824124. Free PMC article.
