J Med Internet Res. 2024 Jan 30;26:e50890. doi: 10.2196/50890.

Machine Learning and Health Science Research: Tutorial

Hunyong Cho^#¹, Jane She^#¹, Daniel De Marchi¹, Helal El-Zaatari¹, Edward L Barnes²,³, Anna R Kahkoska⁴,⁵,⁶, Michael R Kosorok¹, Arti V Virkud⁷

Affiliations

¹ Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.
² Division of Gastroenterology and Hepatology, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.
³ Center for Gastrointestinal Biology and Diseases, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.
⁴ Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.
⁵ Division of Endocrinology and Metabolism, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.
⁶ Center for Aging and Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.
⁷ Kidney Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.

^# Contributed equally.

PMID: 38289657
PMCID: PMC10865203
DOI: 10.2196/50890

Abstract

Machine learning (ML) has seen impressive growth in health science research due to its capacity for handling complex data to perform a range of tasks, including unsupervised learning, supervised learning, and reinforcement learning. To aid health science researchers in understanding the strengths and limitations of ML and to facilitate its integration into their studies, we present here a guideline for integrating ML into an analysis through a structured framework, covering steps from framing a research question to study design and analysis techniques for specialized data types.

Keywords: health science researcher; machine learning; machine learning pipeline; medical machine learning; precision medicine; reproducibility; unsupervised learning.

©Hunyong Cho, Jane She, Daniel De Marchi, Helal El-Zaatari, Edward L Barnes, Anna R Kahkoska, Michael R Kosorok, Arti V Virkud. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 30.01.2024.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

**Figure 1**
Machine learning workflow for a health science research question, from research question refinement to results reporting, with additional considerations. The cyclic nature of the process is reflected in the arrows, as several different iterations may be considered before narrowing down to a decisive pipeline, leading to result reporting.

**Figure 2**
Commonly used supervised learning algorithms, grouped by algorithm type and by whether they address classification or regression problems, together with methods used in unsupervised learning.
