J Am Med Inform Assoc. 2019 Jun 1;26(6):561-576.
doi: 10.1093/jamia/ocz009.

A systematic literature review of machine learning in online personal health data

Zhijun Yin et al. J Am Med Inform Assoc.

Abstract

Objective: User-generated content (UGC) in online environments provides opportunities to learn about an individual's health status outside of clinical settings. However, the nature of UGC brings challenges in both data collection and processing. The purpose of this study is to systematically review the effectiveness of applying machine learning (ML) methodologies to UGC for personal health investigations.

Materials and methods: We searched PubMed, Web of Science, the IEEE library, the ACM library, the AAAI library, and the ACL Anthology. We focused on research articles that were published in English in peer-reviewed journals or conference proceedings between 2010 and 2018. Publications that applied ML to UGC with a focus on personal health were identified for further systematic review.
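The bibliographic part of these eligibility criteria (language, venue type, publication year) can be sketched as a simple filter. This is only an illustration, not the authors' screening procedure: the `Record` fields, the `passes_initial_screen` helper, and the placeholder entries are all hypothetical, and the 2010 to 2018 window is assumed to be inclusive. Judging whether a paper applies ML to UGC with a personal health focus needs manual review and is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical bibliographic record; field names are illustrative only."""
    title: str
    year: int
    language: str
    venue_type: str      # e.g. "journal", "conference", "preprint"
    peer_reviewed: bool


def passes_initial_screen(record: Record) -> bool:
    """Apply the language, venue, and date criteria stated in the abstract.

    Assumption: the 2010-2018 publication window is inclusive at both ends.
    """
    return (
        record.language.lower() == "english"
        and record.peer_reviewed
        and record.venue_type in {"journal", "conference"}
        and 2010 <= record.year <= 2018
    )


# Placeholder entries, not real studies from the review.
candidates = [
    Record("Placeholder study A", 2016, "English", "journal", True),
    Record("Placeholder study B", 2009, "English", "journal", True),
    Record("Placeholder study C", 2017, "English", "preprint", False),
    Record("Placeholder study D", 2018, "German", "conference", True),
]

screened = [r for r in candidates if passes_initial_screen(r)]
print([r.title for r in screened])
```

Records that pass a filter like this would still need to be read to decide whether they meet the topical criterion.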

Results: We identified 103 eligible studies, which we summarized with respect to 5 research categories, 3 data collection strategies, 3 gold standard dataset creation methods, and 4 types of features applied in ML models. Popular off-the-shelf ML models were logistic regression (n = 22), support vector machines (n = 18), naive Bayes (n = 17), ensemble learning (n = 12), and deep learning (n = 11). The most investigated problems were mental health (n = 39) and cancer (n = 15). Common health-related aspects extracted from UGC were treatment experience, sentiments and emotions, coping strategies, and social support.

Conclusions: The systematic review indicated that ML can be effectively applied to UGC to facilitate the description and inference of personal health. Future research needs to focus on mitigating bias introduced when building study cohorts, creating features from free text, and improving the clinical credibility of UGC and the interpretability of models.

Keywords: machine learning; online environment; online health community; patient portal; personal health; social media; systematic review.

Figures

Figure 1. Illustration of the steps used in the literature search.