JMIR Form Res. 2024 Dec 23;8:e63866. doi: 10.2196/63866.

Building a Human Digital Twin (HDTwin) Using Large Language Models for Cognitive Diagnosis: Algorithm Development and Validation

Gina Sprint^#¹, Maureen Schmitter-Edgecombe², Diane Cook^#²

Affiliations

¹ Department of Computer Science, Gonzaga University, Spokane, WA, United States.
² School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States.

^# Contributed equally.

PMID: 39715540
PMCID: PMC11704625
DOI: 10.2196/63866

Abstract

Background: Human digital twins have the potential to change the practice of personalizing cognitive health diagnosis because these systems can integrate multiple sources of health information and influence into a unified model. Cognitive health is multifaceted, yet researchers and clinical professionals struggle to align diverse sources of information into a single model.

Objective: This study aims to introduce a method, called HDTwin, for unifying heterogeneous data using large language models. HDTwin is designed to predict cognitive diagnoses and offer explanations for its inferences.

Methods: HDTwin integrates cognitive health data from multiple sources, including demographic, behavioral, ecological momentary assessment, n-back test, speech, and baseline experimenter testing session markers. Data are converted into text prompts for a large language model. The system then combines these inputs with relevant external knowledge from scientific literature to construct a predictive model. The model's performance is validated using data from 3 studies involving 124 participants, comparing its diagnostic accuracy with baseline machine learning classifiers.
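The step of converting data into text prompts can be pictured with a minimal sketch. This is not the authors' implementation: the function name `markers_to_prompt`, the marker names, the example values, and the wording of the question are all hypothetical and chosen only to illustrate how heterogeneous markers might be flattened into text for a large language model.

```python
from typing import Mapping


def markers_to_prompt(participant_id: str,
                      markers: Mapping[str, Mapping[str, object]]) -> str:
    """Render markers, grouped by source, as plain-text lines for an LLM prompt.

    Hypothetical helper: the abstract does not specify the prompt format.
    """
    lines = [f"Participant {participant_id} markers:"]
    for source, values in markers.items():
        # One line per information source, e.g. "- n-back: mean_accuracy = 0.81"
        rendered = ", ".join(f"{name} = {value}" for name, value in values.items())
        lines.append(f"- {source}: {rendered}")
    lines.append(
        "Question: Is this profile more consistent with healthy cognition "
        "or mild cognitive impairment? Explain which markers support the answer."
    )
    return "\n".join(lines)


# Illustrative values only; these are not data from the study.
example_markers = {
    "demographic": {"age": 72, "education_years": 16},
    "n-back": {"mean_accuracy": 0.81},
}
prompt = markers_to_prompt("P001", example_markers)
print(prompt)
```

Grouping markers by source keeps each modality on its own line, which makes it straightforward to build prompts from a single source or from several fused sources.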

Results: HDTwin achieved a peak accuracy of 0.81 based on the automated selection of markers, significantly outperforming baseline classifiers. On average, HDTwin yielded accuracy=0.77, precision=0.88, recall=0.63, and Matthews correlation coefficient=0.57. In comparison, the baseline classifiers yielded average accuracy=0.65, precision=0.86, recall=0.35, and Matthews correlation coefficient=0.36. The experiments also showed that HDTwin yields higher predictive accuracy when information sources are fused than when a single source is used. HDTwin's chatbot interface provides interactive dialogues, aiding in diagnosis interpretation and allowing further exploration of patient data.
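For readers less familiar with the four reported metrics, the sketch below computes them from the counts of a binary confusion matrix using their standard definitions. The function name `binary_metrics` and the example counts are assumptions made for illustration; the counts are not taken from the study.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, and Matthews correlation
    coefficient (MCC) from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC is conventionally set to 0 when any marginal sum is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "mcc": mcc}


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=5, tn=8, fp=1, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A pattern of high precision with lower recall, as reported for both HDTwin and the baselines, corresponds to few false positives but a larger number of missed positive cases; MCC summarizes all four cells of the confusion matrix in a single value between -1 and 1.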

Conclusions: HDTwin integrates diverse cognitive health data, enhancing the accuracy and explainability of cognitive diagnoses. This approach outperforms traditional models and provides an interface for navigating patient information. The approach shows promise for improving early detection and intervention strategies in cognitive health.

Keywords: artificial intelligence; chatbot; cognitive diagnosis; cognitive health; digital behavior marker; digital twin; health information; human digital twin; interview marker; large language models; machine learning; smartwatch.

©Gina Sprint, Maureen Schmitter-Edgecombe, Diane Cook. Originally published in JMIR Formative Research (https://formative.jmir.org), 23.12.2024.


Conflict of interest statement

Conflicts of Interest: None declared.

Figures

**Figure 1**
HDTwin information processing pipeline. A user interacts with the LLM interface to request summary information about a person or a suggested diagnosis. Based on the query, HDTwin retrieves personalized markers together with paper abstracts and data from a knowledge base that informs a response. The query response is presented to the user, supporting an ongoing conversation about the person or explanation of the query response. LLM: large language model.

**Figure 2**
In addition to collecting sensor data, the smartwatch app queries the user for their current state, includes an n-back shape test, and collects daily audio data.

**Figure 3**
Distribution of healthy participants and those with MCI based on HDTwin markers that include (from upper left): demographics, behavior, EMA response, and n-back scores. The bottom graph shows a t-SNE plot of all quantifiable features. Text input from journals and testing sessions is not included in the plots. EMA: ecological momentary assessment; MCI: mild cognitive impairment.

**Figure 4**
The HDTwin chatbot interface with an example prompt and response for a query regarding one of a person’s n-back score statistics. Users can see the agent’s message memory using the “Chat History” dropdown and the agent’s planning and execution steps using the “See Intermediate Steps” dropdown. A video demonstration of the chatbot is available on the web [28].


Cited by

Digital Twins for Personalized Medicine Require Epidemiological Data and Mathematical Modeling: Viewpoint.
Vallée A. J Med Internet Res. 2025 Aug 5;27:e72411. doi: 10.2196/72411. PMID: 40762974.

References

1. O'Malley RPD, Mirheidari B, Harkness K, Reuber M, Venneri A, Walker T, Christensen H, Blackburn D. Fully automated cognitive screening tool based on assessment of speech and language. J Neurol Neurosurg Psychiatry. 2020;92(1):12–15. doi: 10.1136/jnnp-2019-322517. https://eprints.whiterose.ac.uk/169297/
1. Sand Aronsson FS, Kuhlmann M, Jelic V, Östberg P. Is cognitive impairment associated with reduced syntactic complexity in writing? Evidence from automated text analysis. Aphasiology. 2020;35(7):900–913. doi: 10.1080/02687038.2020.1742282.
1. Nicosia J, Aschenbrenner AJ, Balota DA, Sliwinski MJ, Tahan M, Adams S, Stout SS, Wilks H, Gordon BA, Benzinger TLS. Unsupervised high-frequency smartphone-based cognitive assessments are reliable, valid, and feasible in older adults at risk for Alzheimer's disease. J Int Neuropsychol Soc. 2023;29(5):459–471. doi: 10.31234/osf.io/wtsyn.
1. Schmitter-Edgecombe M, Luna C, Beech B, Dai S, Cook D. Capturing cognitive capacity in the everyday environment across a continuum of cognitive decline using a smartwatch n-back task and ecological momentary assessment. Neuropsychology. 2024. doi: 10.1037/neu0000984.
1. Cook D, Walker A, Minor B. A cross-study analysis of mobile EMA in monitoring behavior and well-being: insights to refine EMA methods. JMIR mHealth uHealth. 2024. doi: 10.2196/preprints.57018. https://www.researchgate.net/publication/378414238_A_Cross-Study_Analysi...
