A systematic literature review of machine learning in online personal health data

Zhijun Yin¹, Lina M Sulieman¹, Bradley A Malin¹,²,³

Affiliations

¹ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
² Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
³ Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA.

PMID: 30908576
PMCID: PMC7647332
DOI: 10.1093/jamia/ocz009

Zhijun Yin et al. J Am Med Inform Assoc. 2019 Jun 1;26(6):561-576.

Abstract

Objective: User-generated content (UGC) in online environments provides opportunities to learn about an individual's health status outside of clinical settings. However, the nature of UGC brings challenges in both data collection and processing. The purpose of this study is to systematically review the effectiveness of applying machine learning (ML) methodologies to UGC for personal health investigations.

Materials and methods: We searched PubMed, Web of Science, IEEE Library, ACM Library, AAAI Library, and the ACL Anthology. We focused on research articles published in English in peer-reviewed journals or conference proceedings between 2010 and 2018. Publications that applied ML to UGC with a focus on personal health were identified for further systematic review.

Results: We identified 103 eligible studies, which we summarized with respect to 5 research categories, 3 data collection strategies, 3 gold standard dataset creation methods, and 4 types of features applied in ML models. Popular off-the-shelf ML models were logistic regression (n = 22), support vector machines (n = 18), naive Bayes (n = 17), ensemble learning (n = 12), and deep learning (n = 11). The most investigated problems were mental health (n = 39) and cancer (n = 15). Common health-related aspects extracted from UGC were treatment experience, sentiments and emotions, coping strategies, and social support.

Conclusions: The systematic review indicated that ML can be effectively applied to UGC to facilitate the description and inference of personal health. Future research needs to focus on mitigating bias introduced when building study cohorts, creating features from free text, improving the clinical credibility of UGC, and improving model interpretability.

Keywords: machine learning; online environment; online health community; patient portal; personal health; social media; systematic review.

Figures

**Figure 1.** Illustration of the steps used in the literature search.
