bioRxiv [Preprint]. 2025 Jun 3:2025.05.31.657183. doi: 10.1101/2025.05.31.657183.

Assessing Attentiveness and Cognitive Engagement across Tasks using Video-based Action Understanding in Non-Human Primates

Sin-Man Cheung et al. bioRxiv.

Abstract

Background: Distractibility and attentiveness are cognitive states that are expressed through observable behavior. How behavior observed in video can be used effectively to diagnose periods of distractibility and attentiveness remains poorly understood. Video-based tools that classify cognitive states from behavior therefore have high potential to serve as versatile diagnostic indicators of maladaptive cognition.

New method: We describe an analysis pipeline that classifies cognitive states from video, using a 2-camera set-up to estimate attentiveness and screen engagement in nonhuman primates performing cognitive tasks. The procedure reconstructs 3D poses from 2D DeepLabCut-labeled videos, estimates head/yaw orientation relative to a task screen and arm/hand/wrist engagement with task objects, and segments behavior into time-resolved attentiveness and screen-engagement scores.
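To make the 3D reconstruction step concrete, the sketch below triangulates matched 2D keypoints from the left and right cameras into 3D points with OpenCV. This is an illustrative stand-in for the MATLAB triangulation step described above, not the authors' code; the projection matrices and keypoint arrays are placeholders.

```python
# Minimal sketch (not the authors' code): triangulate paired 2D keypoints
# from a calibrated left/right camera pair into 3D points.
import numpy as np
import cv2

def triangulate_keypoints(pts_left, pts_right, P_left, P_right):
    """pts_*: (N, 2) arrays of 2D keypoints for one frame (same body-part order);
    P_*: (3, 4) camera projection matrices from stereo calibration."""
    pts_l = np.asarray(pts_left, dtype=np.float64).T   # -> (2, N)
    pts_r = np.asarray(pts_right, dtype=np.float64).T  # -> (2, N)
    X_h = cv2.triangulatePoints(P_left, P_right, pts_l, pts_r)  # (4, N) homogeneous
    return (X_h[:3] / X_h[3]).T                         # (N, 3) 3D coordinates

# Example with placeholder projection matrices (from a stereo calibration):
# P_left = np.hstack([np.eye(3), np.zeros((3, 1))])
# P_right = K_right @ np.hstack([R, t])
# xyz = triangulate_keypoints(kp_left, kp_right, P_left, P_right)
```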

Results: Performance of different cognitive tasks was robustly classified from video within a few frames, reaching >90% decoding accuracy with time segments of ≤3 min. The analysis procedure allows setting subject-specific thresholds for segmenting movements, providing time-resolved scoring of attentiveness and screen engagement.

Comparison with existing methods: Existing methods also extract poses and segment action units; however, they have not been combined into a framework that enables subject-adjusted thresholding for specific task contexts. This integration is needed to infer cognitive state variables and to differentiate performance across tasks.

Conclusion: The proposed method integrates video segmentation, scoring of attentiveness and screen engagement, and classification of task performance at high temporal resolution. This integrated framework provides a tool for assessing attention functions from video.

Figures

Figure 1. Procedural Pipelines.
(A) Workflow for 3D pose estimation and classification using DeepLabCut and MATLAB/Python. Pose estimation from left and right cameras is processed via DeepLabCut, followed by 2D data extraction, triangulation, and 3D data post-processing in MATLAB. Attentiveness and screen engagement are classified, with results visualized and analyzed using Python/MATLAB. (B) Pipeline for pose estimation of rhesus macaques using DeepLabCut. Video frames (n=4 subjects) are extracted using k-means clustering (20 frames/video, 5 videos/camera, 360 frames/side). Eleven body parts are labeled, and a training dataset is created. The ResNet50 network is trained (30,000 iterations, learning rate 0.005/0.002) until loss plateaus. Outlier frames are extracted, relabeled, and used to export pose estimation for video analysis.
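For orientation, the commands below sketch the standard DeepLabCut workflow summarized in panel B, using the parameters quoted in the caption (k-means frame extraction, ResNet-50, 30,000 iterations). The project name and video paths are placeholders, not the authors' files.

```python
# Minimal sketch of the standard DeepLabCut workflow described in Figure 1B
# (project name and paths are placeholders, not the authors' files).
import deeplabcut

config = deeplabcut.create_new_project(
    "nhp-attentiveness", "lab", ["videos/left_cam.mp4", "videos/right_cam.mp4"]
)
deeplabcut.extract_frames(config, mode="automatic", algo="kmeans")  # k-means frame selection
deeplabcut.label_frames(config)                   # label the 11 body parts in the GUI
deeplabcut.create_training_dataset(config, net_type="resnet_50")
deeplabcut.train_network(config, maxiters=30000)  # train until the loss plateaus
deeplabcut.evaluate_network(config)
deeplabcut.analyze_videos(config, ["videos/left_cam.mp4", "videos/right_cam.mp4"])

# Refinement loop: pull outlier frames, relabel, merge, and retrain.
deeplabcut.extract_outlier_frames(config, ["videos/left_cam.mp4"])
deeplabcut.refine_labels(config)
deeplabcut.merge_datasets(config)
```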
Figure 2. Example Classification of Attentiveness and Screen Engagement.
(A) Analysis of NHP attentiveness over a 2-minute video segment using yaw angle as the primary feature. The black line represents the yaw angle, with red horizontal lines indicating left and right thresholds for attentiveness. Green bars denote attentive periods (yaw within thresholds), while grey bars indicate inattentive periods (yaw beyond thresholds). Image sequences at frames 640–660 and 2680–2700 show transitions from attentive to inattentive states, while frames 740–760 capture the NHP in an inattentive state. (B) Screen engagement analysis over a 30-second segment, based on the right wrist’s proximity to the touchscreen. The red horizontal line denotes the engagement threshold. Green bars highlight active engagement (distance below threshold), while distances above the threshold indicate disengagement. Frames 35–40 show the NHP initiating engagement, frames 60–69 depict disengagement, and frames 248–252 capture active screen engagement.
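The per-frame threshold rule illustrated here can be written compactly. The snippet below is a minimal sketch that converts a yaw-angle trace and a wrist-to-screen distance trace into binary attentiveness and engagement labels; the threshold values are made-up placeholders standing in for the subject-specific thresholds described in the text.

```python
# Sketch of the per-frame threshold rule (illustrative thresholds, not the authors' values).
import numpy as np

def score_attentiveness(yaw_deg, left_thresh=-25.0, right_thresh=25.0):
    """1 = attentive (yaw within the left/right thresholds), 0 = inattentive."""
    yaw = np.asarray(yaw_deg, dtype=float)
    return ((yaw >= left_thresh) & (yaw <= right_thresh)).astype(int)

def score_engagement(wrist_screen_dist, dist_thresh=0.10):
    """1 = engaged (wrist closer to the touchscreen than the threshold), 0 = disengaged."""
    dist = np.asarray(wrist_screen_dist, dtype=float)
    return (dist < dist_thresh).astype(int)

# attentive = score_attentiveness(yaw_per_frame)        # per-frame 0/1 trace
# engaged = score_engagement(right_wrist_distance_m)    # per-frame 0/1 trace
```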
Figure 3. Visualization of Attentiveness and Screen Engagement Scores.
(A) Schematic of the data processing pipeline for cross-trial analysis, summarizing attentiveness and screen engagement scores (0s and 1s) across frames, either by window size or by task (WM1, M1, EC, M2, WM2). (B) Example attentiveness score over a 5000-frame video segment, showing binary classification (attentive vs. inattentive). (C) Screen engagement score over a 90-minute session, averaged over 300-second windows, with red lines denoting transitions between tasks. (D) Mean attentiveness scores per task (n = 31 sessions), with standard error bars representing variability across trials. (E) Mean screen engagement scores per task (n = 31 sessions), with standard error bars representing variability across trials.
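As a minimal sketch of the summarization in panels B–E, the snippet below averages the binary per-frame scores over fixed, non-overlapping windows and per task block; the window size and task labels are illustrative assumptions.

```python
# Sketch of summarizing binary per-frame scores by window and by task block
# (window size and task boundaries are illustrative).
import numpy as np

def windowed_mean(binary_scores, window):
    """Average a 0/1 per-frame score in non-overlapping windows of `window` frames."""
    scores = np.asarray(binary_scores, dtype=float)
    n = (len(scores) // window) * window
    return scores[:n].reshape(-1, window).mean(axis=1)

def per_task_mean(binary_scores, task_labels):
    """Mean score for each per-frame task label (e.g. 'WM1', 'M1', 'EC', 'M2', 'WM2')."""
    scores = np.asarray(binary_scores, dtype=float)
    labels = np.asarray(task_labels)
    return {task: scores[labels == task].mean() for task in np.unique(labels)}
```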
Figure 4. Task Classification Using Attentiveness and Screen Engagement Metrics.
(A) Pipeline for task classification (Working Memory [WM], Maze [M], Effort Control [EC]) using attentiveness and screen engagement scores via supervised and unsupervised classifiers. (B) Classification accuracy versus window size, with Random Forest (black), K-Means (green), and chance (red dashed line); red circles mark peak accuracies. (C) Confusion matrix for K-Means at 450-frame window size, with accuracy of 0.575. (D) Feature importance for Random Forest at 540-frame window size, with attentiveness (0.222 ±0.007) and screen engagement (0.778 ±0.007). (E) Confusion matrix for Random Forest at 540-frame window size, with accuracy of 0.910.
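The supervised/unsupervised comparison in this figure can be sketched with scikit-learn as below. The two-column feature matrix (windowed attentiveness and screen engagement scores), the synthetic placeholder data, and the hyperparameters are assumptions for illustration, not the authors' exact settings.

```python
# Sketch of task classification from windowed attentiveness/engagement features
# (synthetic placeholder data; feature layout and hyperparameters are assumptions).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# X: one row per window, columns = [mean attentiveness, mean screen engagement]
X = rng.random((300, 2))
# y: task label per window, e.g. 0 = WM, 1 = M, 2 = EC
y = rng.integers(0, 3, size=300)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
print("RF cross-validated accuracy:", cross_val_score(rf, X, y, cv=5).mean())
rf.fit(X, y)
print("Feature importances:", rf.feature_importances_)  # cf. Fig. 4D

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
# Cluster labels are arbitrary; map them to tasks (e.g. by majority vote within
# each cluster) before computing an accuracy comparable to the supervised classifier.
```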
