J Am Med Inform Assoc. 2024 Sep 1;31(9):1856-1864.
doi: 10.1093/jamia/ocae030.

Clinical risk prediction using language models: benefits and considerations

Angeela Acharya et al. J Am Med Inform Assoc.

Abstract

Objective: The use of electronic health records (EHRs) for clinical risk prediction is on the rise. However, in many practical settings, the limited availability of task-specific EHR data can restrict the application of standard machine learning pipelines. In this study, we investigate the potential of leveraging language models (LMs) as a means to incorporate supplementary domain knowledge for improving the performance of various EHR-based risk prediction tasks.

Methods: We propose two novel LM-based methods, namely "LLaMA2-EHR" and "Sent-e-Med." Our focus is on utilizing the textual descriptions within structured EHRs to make risk predictions about future diagnoses. We conduct a comprehensive comparison with previous approaches across various data types and sizes.

Results: Experiments across 6 different methods and 3 separate risk prediction tasks reveal that employing LMs to represent structured EHRs, such as diagnostic histories, results in significant performance improvements when evaluated using standard metrics such as the area under the receiver operating characteristic (ROC) curve and the area under the precision-recall (PR) curve. LMs also offer benefits such as few-shot learning, the ability to handle previously unseen medical concepts, and adaptability to various medical vocabularies. However, outcomes may be sensitive to the specific prompt used.

Conclusion: LMs encompass extensive embedded knowledge, making them valuable for the analysis of EHRs in the context of risk prediction. Nevertheless, it is important to exercise caution in their application, as ongoing safety concerns related to LMs persist and require continuous consideration.

Keywords: electronic health records; large language models; opioid use disorder; risk prediction; substance use disorder.
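
The two evaluation metrics named in the Results can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `auroc` and `average_precision` are hypothetical, the area under the PR curve is approximated by the step-wise average precision, and tied scores in `average_precision` are simply taken in sorted order.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Fraction of (positive, negative) pairs where the positive case
    # receives the higher score; ties count as half a correct pair.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise approximation of the area under the PR curve."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            # Precision at the rank where this positive case is retrieved.
            precision_sum += hits / rank
    return precision_sum / total_pos
```

The pairwise form of the ROC area is quadratic in the number of patients, which is fine for illustration; a rank-based implementation would be preferable for large cohorts.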

Conflict of interest statement

The authors have no competing interests to declare.

Figures

Figure 1.
Representation of medical records for a single patient in a typical EHR: A visit may have a varying number of medical entities (eg, diagnoses, procedures, medications).
Figure 2.
Process of creating patient groups. Patients who had at least one OUD/SUD/Diabetes diagnosis are placed in the Case Group, while those who had no OUD/SUD/Diabetes diagnosis are placed in the Control Group.
Figure 3.
High-level overview of the Sent-e-Med architecture: for each medical code, sentence embeddings and visit embeddings are extracted and subsequently combined before being fed into the transformer encoder as input.
Figure 4.
Illustration of 2 distinct prompts employed in the fine-tuning of the LLaMA2-EHR model. Prompt 1 aggregates the frequency of diagnosis occurrences across multiple visits, while Prompt 2 evaluates diagnoses on a per-visit basis and incorporates information about the intervals between visits. Red highlights in the text indicate patient-specific variations in the information. Note: Inputs in the prompts and the responses are hypothetical examples.
Figure 5.
Examining the variations in LLaMA2-EHR responses when predicting the probability of a Diabetes diagnosis based on 2 distinct inputs: one representing a simple hypothetical patient’s medical history (Input 1) and another that adds diagnoses known to be risk factors for Diabetes (Input 2). The objective is to analyze how the likelihood of a “Yes” or “No” prediction changes between these scenarios.
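
The two prompt styles described for Figure 4 can be sketched as follows. This is only an illustration under stated assumptions: the exact prompt wording used to fine-tune LLaMA2-EHR is not given here, so the template text, the `Visit` type, and the function names `build_prompt_aggregated` and `build_prompt_per_visit` are all hypothetical.

```python
from collections import Counter
from datetime import date
from typing import List, Tuple

# Hypothetical representation: one visit = (visit date, diagnosis descriptions).
Visit = Tuple[date, List[str]]


def build_prompt_aggregated(visits: List[Visit], outcome: str) -> str:
    """Prompt 1 style: diagnosis frequencies summed over all visits."""
    counts = Counter(dx for _, diagnoses in visits for dx in diagnoses)
    history = "; ".join(f"{dx} (x{n})" for dx, n in counts.most_common())
    # The question wording below is an assumed template, not the paper's.
    return (
        f"Diagnosis history across {len(visits)} visits: {history}.\n"
        f"Will the patient be diagnosed with {outcome}? Answer Yes or No."
    )


def build_prompt_per_visit(visits: List[Visit], outcome: str) -> str:
    """Prompt 2 style: diagnoses listed per visit, with gaps between visits."""
    ordered = sorted(visits, key=lambda visit: visit[0])
    lines = []
    previous_day = None
    for index, (day, diagnoses) in enumerate(ordered, start=1):
        gap = ""
        if previous_day is not None:
            # Interval between consecutive visits, in days.
            gap = f" ({(day - previous_day).days} days after previous visit)"
        lines.append(f"Visit {index}{gap}: {', '.join(diagnoses)}")
        previous_day = day
    # The question wording below is an assumed template, not the paper's.
    question = f"Will the patient be diagnosed with {outcome}? Answer Yes or No."
    return "\n".join(lines + [question])
```

Both builders work on the textual descriptions of diagnoses rather than raw codes, which matches the focus stated in the Methods. Because the Results note that outcomes can be sensitive to the prompt, any such template would need to be compared empirically.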
