Observational Study

Nat Med. 2025 Mar;31(3):829-839.

doi: 10.1038/s41591-024-03475-9. Epub 2025 Mar 4.

Smartwatch- and smartphone-based remote assessment of brain health and detection of mild cognitive impairment

Paul Monroe Butler¹ ² ³, Jenny Yang⁴, Roland Brown⁵, Matt Hobbs⁴ ⁵, Andrew Becker⁵, Joaquin Penalver-Andres⁵, Philippe Syz⁴, Sofia Muller⁴, Gautier Cosne⁵, Adrien Juraver⁵, Han Hee Song⁴, Paramita Saha-Chaudhuri⁵, Daniel Roggen⁵, Alf Scotland⁵, Natalia Silveira⁴, Gizem Demircioglu⁵, Audrey Gabelle⁵, Richard Hughes⁵, Michael G Erkkinen⁶ ⁷, Jessica B Langbaum⁷ ⁸, Jennifer H Lingler⁷ ⁹, Pamela Price⁷ ¹⁰, Yakeel T Quiroz⁷ ¹¹, Sharon J Sha⁷ ¹², Marty Sliwinski⁷ ¹³, Anton P Porsteinsson⁷ ¹⁴, Rhoda Au⁷ ¹⁵, Matt T Bianchi⁴, Hanson Lenyoun⁴, Hung Pham⁴, Mithun Patel⁴, Shibeshih Belachew⁵

Affiliations

¹ Apple Inc., Cupertino, CA, USA. pmbutler@bwh.harvard.edu.
² Biogen Inc., Cambridge, MA, USA. pmbutler@bwh.harvard.edu.
³ Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA. pmbutler@bwh.harvard.edu.
⁴ Apple Inc., Cupertino, CA, USA.
⁵ Biogen Inc., Cambridge, MA, USA.
⁶ Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
⁷ Intuition Study Scientific Committee, Boston, MA, USA.
⁸ Banner Alzheimer's Institute, Phoenix, AZ, USA.
⁹ University of Pittsburgh School of Nursing, Pittsburgh, PA, USA.
¹⁰ The Balm in Gilead Inc., Richmond, VA, USA.
¹¹ Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
¹² Stanford School of Medicine, Palo Alto, CA, USA.
¹³ Penn State University, University Park, PA, USA.
¹⁴ University of Rochester School of Medicine and Dentistry, Rochester, NY, USA.
¹⁵ School of Medicine, Boston University Chobanian and Avedisian, Boston, MA, USA.

PMID: 40038507
PMCID: PMC11922773
DOI: 10.1038/s41591-024-03475-9

Paul Monroe Butler et al. Nat Med. 2025 Mar.

Abstract

Consumer-grade mobile devices are used by billions worldwide. Their ubiquity provides opportunities to robustly capture everyday cognition. 'Intuition' was a remote observational study that enrolled 23,004 US adults, collecting 24 months of longitudinal multimodal data via their iPhones and Apple Watches using a custom research application that captured routine device use, self-reported health information and cognitive assessments. The study objectives were to classify mild cognitive impairment (MCI), characterize cognitive trajectories and develop tools to detect and track cognitive health at scale. The study addresses sources of bias in current cognitive health research, including limited representativeness (for example, racial/ethnic, geographic) and accuracy of cognitive measurement tools. We describe study design and provide baseline cohort characteristics. Next, we present foundational proof-of-concept MCI classification modeling results using interactive cognitive assessment data. Initial findings support the reliability and validity of remote MCI detection and the usefulness of such data in describing at-risk cognitive health trajectories in demographically diverse aging populations. ClinicalTrials.gov identifier: NCT05058950.

Conflict of interest statement

Competing interests: The following authors are employed by Apple, Inc.: P.M.B., J.Y., M.H., P.S., S.M., H.H.S., N.S., M.T.B., H.L., H.P. and M.P. The following authors are employed by Biogen, Inc.: P.M.B., R.B., M.H., A.B., J.P.-A., G.C., A.J., P.S.-C., D.R., A.S., G.D., A.G., R.H., S.B. J.B.L. is a full-time employee of Banner Health. Banner Alzheimer’s Institute receives funding from Eli Lilly for its collaborative partnership on TRAILBLAZER-ALZ 3; she reports receiving grants from the NIA unrelated to this project. A.P.P. has received personal fees from Acadia Pharmaceuticals, Athira, BMS, Cognitive Research Corp, IQVIA, Lundbeck, Novartis, ONO Pharmaceuticals, Otsuka, WCG, WebMD and Xenon, and grants to institution from Athira, Biogen, Cassava, Eisai, Eli Lilly, Genentech/Roche, Vaccinex, NIA, NIMH and DOD; he is a member of the Scientific Advisory Board of Alzheon, Athira, and Cognition Therapeutics. Y.T.Q. serves as consultant for Biogen on other unrelated projects. R.A. serves as scientific advisor to Signant Health and NovoNordisk. M.G.E., J.B.L., J.H.L., P.P., Y.T.Q., S.J.S., M.S., A.P.P. and R.A. all served as consultants to Biogen, Inc. on this project as members of the Intuition Study Scientific Committee.

Figures

**Fig. 1. Intuition study enrollment flow.**
The steps from Study App download to complete baseline enrollment are shown stepwise from left to right across two rows, including screening and enrollment totals by stage along the bottom of the diagram. Subjects were required to already own an iPhone to be used in the study and an Apple Watch was provisioned once participants completed baseline enrollment including a 30-min cognitive assessment battery.

**Fig. 2. Twelve-month study adherence for device use and cognitive assessments.**
a, Percentage of contributing participants in the study, defined as those sharing iPhone passive data. b–e, Percent adherence by cohort is plotted for Apple Watch (b), monthly CANTAB assessment (c), quarterly HFT assessment (d) and overall passive data adherence by group (e). MCI, pooled self-reported and clinically confirmed MCI cases. Contributing participants were defined as participants with screen unlock data from the iPhone over a 12-month period. Sample sizes were n = 17,583 (a), n = 5,552 Control-EM, n = 9,245 Control-L (low/high risk), n = 1,544 SCC, n = 935 MCI-EM and n = 307 MCI (b–e). Participants were considered adherent to the HFT if, over the course of the 2-week burst period, they had 7 days with at least one assessment completed. iPhone device use was defined by sharing of Sensor Kit data, as evidenced by App-Usage data sharing. Apple Watch adherence was defined as passive device data for at least 4 h per day. Numerical values for adherence by group are listed in Supplementary Table 7.
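The two adherence rules in this caption can be made concrete with a short sketch. The function names and input shapes are illustrative assumptions, not the study's code; only the thresholds come from the caption, and "7 days" is read here as "at least 7 distinct days".

```python
from datetime import date, timedelta

def hft_adherent(completion_dates, burst_start, burst_days=14, min_days=7):
    """True if at least `min_days` distinct days inside the burst window
    contain one or more completed assessments (assumed reading of the rule)."""
    burst_end = burst_start + timedelta(days=burst_days)
    active_days = {d for d in completion_dates if burst_start <= d < burst_end}
    return len(active_days) >= min_days

def watch_adherent_day(hours_of_passive_data, min_hours=4.0):
    """True if the watch contributed at least `min_hours` of passive data that day."""
    return hours_of_passive_data >= min_hours

# Hypothetical participant: one assessment on each of the first 7 burst days.
start = date(2024, 1, 1)
dates = [start + timedelta(days=i) for i in range(7)]
adherent = hft_adherent(dates, start)
```

Counting distinct days rather than total assessments means several completions on the same day do not inflate adherence.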

**Fig. 3. Construct validity of Cam-Cog assessments.**
Left, a 207-factor Pearson correlation matrix based on monthly CANTAB and quarterly high-frequency Cam-Cog variables in N = 21,574 baseline participants who completed assessments. Top right, exploratory factor analysis of the key cognitive outcomes for those factors with Kaiser criterion eigenvalues greater than 1.0. Bottom right, CANTAB/Cam-Cog correlations with tele-MoCA scores in participants who underwent tele-research assessment. NBX, N(2)-Back. Tele-MoCA (v.8.1, v.8.2 or v.8.3) was administered to N = 1,015 participants with concern for cognitive decline in the setting of a clinical research interview to evaluate cognitive health. Tele-MoCA scores are reported as the total global score out of 30 points and as MoCA-defined subscores, as listed. The memory impairment score (MIS) was calculated using established MoCA guidelines; the 15-point score was derived from spontaneous recall, cued recall and multiple-choice cued recall.

**Fig. 4. Baseline subjective and objective cognition by cohort.**
Top left, density plots of MCI versus Controls plus SCC when considering the two dimensions of subjective cognitive concerns and objective cognitive performance. Bottom left, table listing baseline cognition based on self-report and key CANTAB/Cam-Cog outcomes. Top right, group means are plotted with 99% CIs by outcome and with statistical comparisons of baseline cognition by cohort, including two-sided pairwise t-test comparisons of baseline subjective and objective cognition and associated P values. MTS1, MTS 1-box reaction time; MTSa, MTS 8-box search time. Bottom right, two-sided paired t-test comparisons by cohorts of interest. Analysis of variance test for each cognitive outcome listed by cohort was significant with P values < 1 × 10⁻²⁰. Left, all participants aged 50 years and above across cohorts (N = 17,042) and all study participants (N = 23,004). Right, all participants aged 50 years and above across cohorts (N = 17,042).

**Fig. 5. Initial MCI classification model results using baseline cognition.**
MCI* is the combined MCI group comprised of MCI-CC and self-reported MCI confirmed by tele-health versus Controls aged 50–86 years with and without subjective cognitive complaints. Left, baseline characteristics; right, logistic regression MCI classifier accuracy results with ROC curve. All controls aged 50 years and above were included alongside participants with CFI-defined SCC. MCI cases were clinically confirmed cases and self-reported MCI as confirmed by a tele-research visit, including a tele-MoCA to confirm impairment. The MCI classifier is a logistic regression model with ridge penalization (L2 regularization). The model incorporates all baseline CANTAB outcomes (objective cognitive performance measures), along with two subjective cognition surveys: CFI and E-Cog. The model also uses core demographic variables including age, sex and education level. The data were split into 80% for training and 20% for testing. To address class imbalance between the majority and minority classes, training data was resampled using a three-to-one majority-to-minority class ratio. The model was trained using 100× bootstrap resampling in the outer loop to enhance generalization and estimate model stability. Within each bootstrap iteration, a grid search was employed in the inner loop to systematically explore a range of hyperparameters, specifically the regularization strength for ridge penalization, and identify the best-performing hyperparameter configuration. To further ensure robust evaluation, the inner loop applied stratified fivefold crossvalidation, which maintained class balance within each fold while testing different hyperparameter sets. This nested crossvalidation setup ensured that hyperparameter tuning of the model was independent of the outer loop resampling, minimizing the risk of overfitting and optimizing performance on unseen data. Supplementary Table 8 rank orders the most important predictor variables by beta-coefficient values.
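One way to picture the resampling and fold construction described in this caption is the stdlib-only sketch below. It assumes the 3:1 ratio was reached by randomly downsampling the majority class (the caption does not say whether down- or upsampling was used). The names `downsample_majority` and `stratified_folds` and the synthetic labels are hypothetical, and model fitting and the hyperparameter grid are omitted.

```python
import random

def downsample_majority(labels, ratio=3, seed=0):
    """Return sorted indices keeping every minority (1) case and at most
    ratio * n_minority randomly chosen majority (0) cases."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(majority), ratio * len(minority))
    return sorted(minority + rng.sample(majority, keep))

def stratified_folds(labels, k=5, seed=0):
    """Assign each index to one of k folds, dealing each class out
    round-robin so every fold has near-equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

# Synthetic training labels: 20 cases (1) and 200 controls (0).
labels = [1] * 20 + [0] * 200
train_idx = downsample_majority(labels)
train_labels = [labels[i] for i in train_idx]
folds = stratified_folds(train_labels)
```

In a nested setup like the one the caption describes, folds of this kind would be used only for choosing the regularization strength, with performance estimated on data held out from that tuning.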

**Extended Data Fig. 1. Geographic Diversity in Enrollment across all U.S. States.**
Participant enrollment rate map compared to the reference population. The map shows the percentage of the participant population coming from each state, compared to the 2022 US Census population proportion for that state. States shown in orange with positive values are those where over-enrollment occurred relative to general population levels. In terms of absolute deviation, only a handful of states had noticeable over- or under-enrollment. California and North Carolina over-enrolled by 1.3% and 2.4%, respectively, compared to their general population. Texas and New York were the largest under-enrollers, each under-enrolling by 1.2%. The remaining states were all within 1% of the reference levels. For a complete list of state-based enrollment and reference population statistics, see Online Supplementary Table 2.
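The comparison behind this map amounts to subtracting each state's census share from its share of enrolled participants. A minimal sketch follows; the function name and the two-region toy numbers are hypothetical and are not study data.

```python
def enrollment_deviation(enrolled_counts, census_share_pct):
    """Percentage-point difference between each state's share of enrolled
    participants and its share of the reference population.
    Positive values indicate over-enrollment."""
    total = sum(enrolled_counts.values())
    return {
        state: 100.0 * n / total - census_share_pct[state]
        for state, n in enrolled_counts.items()
    }

# Toy example with two made-up regions.
deviation = enrollment_deviation({"A": 30, "B": 70}, {"A": 25.0, "B": 75.0})
```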

**Extended Data Fig. 2. Risk Factor Odds Ratio Connection Plots.**
Each connection plot displays the pairwise odds ratios between binary risk factors. The width of the connections is proportionate to the strength of the association and the colors display the directionality of the odds ratio, with blue and orange signifying lower (<1, negative associations) or higher (>1, positive associations) values, respectively. At each node along the edge, the prevalence for each risk factor is cited in reference to the total cohort. The connections between nodes identify risk factors that are highly associated with each other and likely to occur concomitantly. HTN = hypertension; HLD = hyperlipidemia; T2DM = Type 2 Diabetes Mellitus; CV Dx = cardiovascular disease; FHx = family history of dementia, first-degree relative; MHx = Mental health history; TBI = traumatic brain injury. See the Methods section for a complete description of the risk factor definitions. Nicotine refers to any active use of nicotine products. Alcohol refers to a history of heavy consumption, 20 units per week for 10 years or longer. Obesity is defined by BMI ≥ 30. Hearing impairment is any history of hearing issues on the medical review of systems. TBI refers to any history of TBI without regard to severity or frequency. Mental health history is any remote or active mental illness. Cardiovascular disease is endorsement of any of the following: heart attack, atrial fibrillation, angioplasty, stent placement, or endarterectomy, cardiac bypass or other blood vessel bypass grafting procedure, pacemaker and/or defibrillator placement, congestive heart failure, angina, heart valve replacement or repair, or peripheral vascular disease.
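For readers unfamiliar with the quantity plotted, a pairwise odds ratio between two binary risk factors can be computed from the 2×2 table of co-occurrence counts. The sketch below is illustrative only: the function name and the toy flags are assumptions, and returning `None` for an undefined ratio is a choice made here, not something the caption specifies.

```python
def odds_ratio(a_flags, b_flags):
    """Odds ratio between two binary risk factors from paired 0/1 flags.
    Returns None when a zero cell makes the ratio undefined."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(a_flags, b_flags):
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    if n10 == 0 or n01 == 0:
        return None
    return (n11 * n00) / (n10 * n01)

# Toy data: 3 participants with both factors, 3 with neither, 1 with each alone.
factor_a = [1, 1, 1, 0, 0, 0, 1, 0]
factor_b = [1, 1, 0, 0, 0, 1, 1, 0]
ratio = odds_ratio(factor_a, factor_b)
```

A value above 1 corresponds to the orange (positive association) connections and a value below 1 to the blue ones.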

**Extended Data Fig. 3. Cognitive Baseline Performance in Controls by Age.**
Quantile curves of baseline cognitive performance on 8 representative measures plotted as smooth functions of age for all study control participants. CANTAB = Cambridge Neuropsychological Test Automated Battery from Cambridge Cognition; PAL = paired associates learning; SWM = spatial working memory (SWMBE46812; SWM Between Errors = errors by selecting boxes already chosen with tokens); PRM = pattern recognition memory; MTS = match-to-sample. Panels a-h show control subjects across the aging lifespan, with sample outcomes from CANTAB at baseline (a-f) and burst 1 means (g-h). Density plots are depicted with dashed lines indicating the 10th and 90th percentiles, solid lines representing the 25th and 75th percentiles, and the age-associated interquartile range shown with light gray shading. The median is denoted by the thick solid line. N = 18,845 control participants.

**Extended Data Fig. 4. Cognitive Baseline Performance by Cohort and Age.**
Box and violin plots of baseline cognitive performance for 8 representative measures from participants grouped by age and cohort status. CANTAB = Cambridge Neuropsychological Test Automated Battery from Cambridge Cognition; PAL = Paired Associates Learning; SWM = Spatial Working Memory; SWM Between Errors= errors by selecting boxes already chosen with tokens; PRM = Pattern Recognition Memory; MTS = Match-To-Sample; DSST = Digit Symbol Substitution Test; 2-Back = N-Back. Cohorts displayed include Control-EM = Early and Middle Adulthood; MCI-EM = Mild Cognitive Impairment Early and Middle Adulthood; Control-L = Control Late Adulthood at Low/High Risk for cognitive decline; SCC = Subjective Cognitive Complaint; MCI = mild cognitive impairment self-report; MCI-CC = MCI clinically confirmed. Violin plots show CANTAB baseline outcomes in Panel A-F and high frequency burst outcomes averaged from burst 1 for Panels G-H. The box denotes the interquartile range (IQR), bold line the median, and bold dot the mean. Whiskers extend to the largest/smallest observations no further than 1.5 times the IQR from each side of the box.
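The whisker rule stated at the end of this caption can be sketched as follows. The quartile method (`inclusive`) is an assumption, since plotting libraries differ on how quartiles are interpolated, and the sample values are made up.

```python
import statistics

def box_whiskers(values, k=1.5):
    """Return (lower, upper) whisker ends: the most extreme observations
    lying within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = min(v for v in values if v >= q1 - k * iqr)
    upper = max(v for v in values if v <= q3 + k * iqr)
    return lower, upper

# Made-up sample with one extreme value that falls outside the upper whisker.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
whiskers = box_whiskers(sample)
```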

**Extended Data Fig. 5. The Intersection of Passive Sensing and Active Assessment of Cognition from Sample MCI and Control Participants.**
Temporal alignment over 12 months from illustrative examples of demographically matched individuals with and without cognitive impairment. Active assessment outcomes from monthly CANTAB demonstrate longitudinal cognitive performance in learning, consolidation and delayed recall, and processing speed. Quarterly burst high frequency assessment outcomes reflect global cognition. Passive sensing of cognition through iPhone typing dynamics reflects cognition deployed in real world settings. Note: for all cognitive outcomes, the y-axis indicates the direction of higher and lower cognitive performance. Panel a depicts 12 months of multimodal longitudinal cognitive data in a demographically matched late adulthood control and patient with clinically confirmed amnestic multidomain MCI. Notable trends include baseline deficits in learning and global cognition which worsen over time and then the emergence of new deficits in attention and recall. In the Study App, the participant with MCI reported concern for steadily worsening cognition, suggesting an age-related neurodegenerative process. Panel b depicts a young/middle adulthood participant with cognitive impairment reported in the Study App as an abrupt onset deficit in learning/memory, which was reported as stable/unchanging over time. Baseline deficits in learning are apparent compared to a demographically matched control and remain stable across longitudinal assessments. In the setting of amnestic deficits, one can still appreciate familiarization, learning effects, and improvements in performance on recall, attention, and global cognitive function. Outcomes include Paired Associates Learning (PAL) Total Adjusted Errors, Pattern Recognition Memory Delayed (PRM-d) Recall = percent correct on PRM delayed, Match-To-Sample (MTS) Median Correct Response Time in seconds, Digit Symbol Substitution Test (DSST) Total Number Correct, Passive Cognition = Taps per Minute of Keyboard Characters when Typing Messages on the iPhone.

**Extended Data Fig. 6. Examples of Multimodal Passive Data Collected via iPhone and Apple Watch in a Sample Study Participant.**
Time-aligned physiologic, autonomic, motor, and behavioral data are depicted across 3 panels. *Panel A* shows data related to circadian rhythms, including heart rate (row 1), sleep stages (row 2), step count (row 3) and App-use behavior (row 4). *Panel B* depicts motor speech behavior, including acoustic properties of voice like pitch, jitter, and shimmer (rows 1-3, respectively). *Panel C* plots temporal dynamics of typing behaviors from iPhone keyboard use, including press-and-release intervals called hold and flight times. In Panel A, row 1, heart rate data collected from Apple Watch PPG (photoplethysmography) sensors are tagged for periods of (1) activity, (2) quiescence, and (3) poor skin contact or Watch removal. The 30-minute time-aligned rows have light grey backgrounds when the sensors indicate the participant is in bed. Additional insights can be observed when interpreting multiple time-aligned data streams, such as heart rate peaking as the number of steps increases and slowing during suspected sleep. Panel B illustrates acoustic features of voice extracted from microphone-collected data, such as a frequency-adjusted measure of periodicity related to vocal pitch, and moment-by-moment variations in the fundamental frequency (that is, jitter) and in the amplitude of vocal intensity (that is, shimmer). Panel C shows millisecond-scale typing dynamics when a user is using the virtual iPhone keyboard. Row 1 shows the time course of pushing down on the screen to type (‘D’ for depressing a button) and of releasing between touches (‘U’ for up). Rows 2 and 3 show millisecond-scale dynamics for a sample release time (‘U’ or flight time) and button depress (‘D’ or hold time).
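The hold and flight times described for the typing panel can be derived from a stream of press (‘D’) and release (‘U’) timestamps. The sketch below assumes a chronologically ordered, alternating event list; the event format, function name and timings are hypothetical and do not reflect the study's data schema.

```python
def hold_and_flight_times(events):
    """Split ordered (timestamp_ms, kind) events, kind 'D' (down) or 'U' (up),
    into hold times (down -> up) and flight times (up -> next down)."""
    holds, flights = [], []
    last_down = last_up = None
    for t, kind in events:
        if kind == "D":
            if last_up is not None:
                flights.append(t - last_up)  # gap between release and next press
                last_up = None
            last_down = t
        elif kind == "U" and last_down is not None:
            holds.append(t - last_down)  # how long the key was held
            last_down = None
            last_up = t
    return holds, flights

# Made-up sequence of three keystrokes (times in milliseconds).
events = [(0, "D"), (80, "U"), (200, "D"), (290, "U"), (400, "D"), (470, "U")]
holds, flights = hold_and_flight_times(events)
```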

**Extended Data Fig. 7. Intuition Study App and Participant Experience.**
The bespoke Intuition Study App was downloaded from the App Store by a prospective enrollee, who then underwent eligibility screening and e-consent. For consented participants, onboarding and orientation to the study included familiarization with the App structure and study activities. This multi-panel figure provides sample snapshots of the participant experience.

**Extended Data Fig. 8. Study Data Sampling Cycles and Schedule of Activities.**
Cadence of key onboarding and longitudinal interactive cognitive and self-report assessments over 12 months of study activities.

**Extended Data Fig. 9. Primitive MCI Classification Models Using Demographics, CANTAB Cognitive Performance, Subjective Cognitive Concerns, and a Full Baseline Model.**
ROC = Receiver Operating Characteristic, AUROC = Area Under the Receiver Operating Characteristic Curve, CI = Confidence Interval, SCC = Subjective Cognitive Complaints, MCI = Mild Cognitive Impairment. N = 16,790 total participants in the analysis; all controls aged 50 years and above were included alongside participants with CFI-defined SCC. MCI cases were clinically confirmed cases and self-reported MCI cases confirmed by a tele-research visit, including a tele-MoCA to confirm impairment. The MCI classifier was a logistic regression model with ridge penalization (L2) using all baseline CANTAB outcomes (objective) plus 2 subjective cognition surveys (CFI + E-Cog) and core demographics (age, sex, education). Majority-to-minority class sampling was 3:1, with 100x bootstrapping and nested cross-validation.
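As a reminder of what the AUROC summarizes, it equals the probability that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counted as one half. The pairwise sketch below illustrates that definition on made-up scores; it is not the study's evaluation code and is quadratic in sample size.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive (1)
    and negative (0) scores; ties contribute 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and classifier scores.
area = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```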

Grants and funding

P30 AG072980/AG/NIA NIH HHS/United States
