Nat Commun. 2020 Sep 11;11(1):4560.
doi: 10.1038/s41467-020-18441-5.

Automated markerless pose estimation in freely moving macaques with OpenMonkeyStudio

Praneet C Bala et al. Nat Commun. 2020.

Abstract

The rhesus macaque is an important model species in several branches of science, including neuroscience, psychology, ethology, and medicine. The utility of the macaque model would be greatly enhanced by the ability to precisely measure behavior in freely moving conditions. Existing approaches do not provide sufficient tracking. Here, we describe OpenMonkeyStudio, a deep learning-based markerless motion capture system for estimating 3D pose in freely moving macaques in large unconstrained environments. Our system makes use of 62 machine vision cameras that encircle an open 2.45 m × 2.45 m × 2.75 m enclosure. The resulting multiview image streams allow for data augmentation via 3D-reconstruction of annotated images to train a robust view-invariant deep neural network. This view invariance represents an important advance over previous markerless 2D tracking approaches, and allows fully automatic pose inference on unconstrained natural motion. We show that OpenMonkeyStudio can be used to accurately recognize actions and track social interactions.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Pose detection architecture.
A multi-stage convolutional pose machine detects the body landmarks of a macaque in an image. It takes as input a 368 × 368 × 3 image (368 × 368 resolution with three color channels) and outputs 46 × 46 × 14 response maps (13 landmarks and one background), where the location of the maximum response in each map corresponds to the landmark location. The landmarks detected in the multiview images are triangulated in 3D using the camera calibration. To train a generalizable pose detector, multiview geometry is used to substantially augment the training data via 3D reconstruction, which makes the learned detector view-invariant.
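The step from response maps to landmark locations described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `decode_landmarks`, the assumption that the background map is the last channel, and the cell-center convention for mapping the 46 × 46 grid back to 368 × 368 pixels are all assumptions made here.

```python
import numpy as np

def decode_landmarks(response_maps, input_size=368):
    """Return one (x, y, score) row per landmark from an H x W x C map stack.

    Assumption: the last channel is the background map and is ignored.
    """
    h, w, c = response_maps.shape
    n_landmarks = c - 1
    stride = input_size / h  # 368 / 46 = 8 input pixels per map cell
    # Flatten the spatial grid so argmax finds the peak cell of each map.
    flat = response_maps[:, :, :n_landmarks].reshape(h * w, n_landmarks)
    peak = flat.argmax(axis=0)
    rows, cols = np.unravel_index(peak, (h, w))
    scores = flat[peak, np.arange(n_landmarks)]
    # Assumption: report the center of the peak cell in input-image pixels.
    xs = (cols + 0.5) * stride
    ys = (rows + 0.5) * stride
    return np.stack([xs, ys, scores], axis=1)
```

Calling `decode_landmarks` on a 46 × 46 × 14 array returns a 13 × 3 array. The peak score can be used to discard weak detections before triangulation, although any threshold would be a further assumption.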
Fig. 2. Representative tracking results.
OpenMonkeyStudio is a markerless motion capture system designed to reconstruct 13 body landmarks in 3D. Its 62 cameras encircle a large open space and synchronously capture a macaque’s movement from diverse vantage points. Colored traces correspond to joint labels. The multiview images are shown together with four arbitrary cropped images superimposed with the projection of the reconstruction.
Fig. 3. Pose tracking validation.
The head location reconstructed by OpenMonkeyStudio is compared over time with that from a marker-based motion capture system (OptiTrack). The marker-based system produces noisy measurements due to marker confusion, which requires additional manual refinement. The median error is 6.76 cm. Images overlaid with the projection of the 3D reconstruction show visual validity.
Fig. 4. Augmentation and inference performance.
a View augmentation improves the accuracy of reconstruction. A subset of m cameras is used for view augmentation, and the relative accuracy is measured by comparison with the full model trained with 62-camera augmentation (m = 62). While relatively rigid landmarks such as the nose, head, neck, and hip can be reconstructed accurately with little augmentation, limb landmarks such as the hands, knees, and feet require greater augmentation. The overall accuracy improves from 34% (m = 1) to 76% (m = 48), which justifies the multiview augmentation. b Once the detection model is trained with the full view augmentation (m = 62), a subset of n cameras can be used at inference to achieve comparable performance. The relative accuracy is measured by comparison with n = 62. For instance, eight cameras achieve 80% overall performance. However, limbs with high degrees of freedom, such as the hands, knees, and feet, require more cameras to reach comparable levels.
Fig. 5. View-dependent accuracy.
The inference precision of landmark detection is view dependent. For the head (a), which is visible from most views, the precision is nearly uniform across views under the valid inference range (two-pixel error, n = 344 images). In contrast, the right hand (b) is often occluded by the torso when seen from the cameras on the left-hand side of the macaque. This results in non-uniform precision, i.e., inference from views on the right-hand side is more reliable than from the other side. The tail (c) behaves more like the head, with a slight degradation when the macaque is seen head on. Boxplots show the median and the 25th and 75th percentiles; whiskers extend to the extrema.
Fig. 6. Semantic action detection.
We use the 3D pose representation to recognize semantic actions (standing, walking, climbing, climbing supine, sitting, and jumping). a The poses are clustered using UMAP. Each cluster, represented by 3D poses (side and top views), is highly correlated with a semantic action. b With these clusters, we recognize actions in a new test sequence using k-nearest-neighbor search and visualize the transitions among the semantic actions. c In contrast, the 2D representation yields clusters that are driven by both pose and viewpoint. For instance, while the 3D representation of walking is one continuous cluster, the 2D representation is broken apart into discrete groupings of repeated poses at different spatial locations.
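The recognition step in panel b can be illustrated with a small k-nearest-neighbor sketch. It assumes that each training pose has already been embedded (for example by UMAP) into a low-dimensional point and paired with an action label. The function names, the majority-vote rule, and the default k = 3 are illustrative choices made here, not details taken from the paper.

```python
from collections import Counter
import math

def knn_action(query, labeled_points, k=3):
    """Label an embedded pose by majority vote among its k nearest neighbors.

    labeled_points is a list of (embedding, action_label) pairs.
    """
    nearest = sorted(labeled_points, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def action_transitions(actions):
    """Count changes between consecutive, differing actions in a sequence."""
    return Counter((a, b) for a, b in zip(actions, actions[1:]) if a != b)
```

Applying `knn_action` frame by frame to a new sequence yields a list of actions, and `action_transitions` then summarizes how often one action follows another.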
Fig. 7. Social interaction tracking.
a OpenMonkeyStudio extends to tracking social interactions in non-human primates. Here we demonstrate the feasibility of tracking two rhesus macaques while they move individually inside the enclosure, fully crossing paths. Colors indicate the two individuals. The top frames depict the two individuals in the cage at different timepoints. b Co-occurrence of actions in social macaques. We used the 3D poses to classify the actions of the two macaques and show the co-occurrence of their actions on a log scale. c Proxemics characterizes the social space, e.g., how the location of one macaque is distributed with respect to the other. We transformed the 3D coordinates of the second macaque into the first macaque’s body-centric coordinate system, in which an angle of 0 corresponds to the first macaque’s facing direction. We use a polar histogram of the transformed coordinates to visualize the proxemics of the macaques.
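The body-centric transform and polar histogram in panel c can be sketched as below. The sketch works on ground-plane (x, y) positions and takes the first macaque's heading angle as given; how the heading is derived from the 3D pose, the function names, and the bin count are assumptions made for illustration.

```python
import math

def body_centric_angle(pos_a, heading_a, pos_b):
    """Angle and distance of macaque B in macaque A's frame.

    An angle of 0 means B is directly in A's facing direction.
    heading_a is A's facing direction in radians in the world frame.
    """
    dx, dy = pos_b[0] - pos_a[0], pos_b[1] - pos_a[1]
    angle = math.atan2(dy, dx) - heading_a
    # Wrap the relative angle into [-pi, pi).
    angle = (angle + math.pi) % (2 * math.pi) - math.pi
    return angle, math.hypot(dx, dy)

def polar_histogram(angles, n_bins=8):
    """Count angles in n_bins equal sectors covering [-pi, pi)."""
    counts = [0] * n_bins
    width = 2 * math.pi / n_bins
    for a in angles:
        counts[int((a + math.pi) // width) % n_bins] += 1
    return counts
```

Collecting the angle for every frame and passing the list to `polar_histogram` gives the sector counts that a polar plot would display.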
Fig. 8. Dataset.
We will make the OpenMonkeyPose dataset, the trained detection model, and the training code publicly available. The dataset includes 195,228 annotated pose instances associated with diverse activities.
Fig. 9. Multiview augmentation.
OpenMonkeyStudio leverages multiview geometry to augment the annotated data across views. The three images in the leftmost column are manually annotated; the 2D poses in the remaining images are generated automatically by 3D reconstruction and its projection.
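The idea of reconstructing a landmark from a few annotated views and projecting it into the remaining views can be sketched with standard linear (DLT) triangulation. This is a generic textbook formulation under the assumption that each camera is described by a known 3 × 4 projection matrix. It is not necessarily the reconstruction method used by the authors, and the function names are hypothetical.

```python
import numpy as np

def triangulate(points_2d, projections):
    """Linear (DLT) triangulation of one landmark from two or more views."""
    rows = []
    for (x, y), P in zip(points_2d, projections):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]  # right singular vector with the smallest singular value
    return X[:3] / X[3]

def project(point_3d, P):
    """Project a 3D point into a view with 3 x 4 projection matrix P."""
    x = P @ np.append(point_3d, 1.0)
    return x[:2] / x[2]

def augment(points_2d, annotated_P, other_P):
    """Transfer one annotated landmark to the unannotated views."""
    X = triangulate(points_2d, annotated_P)
    return [project(X, P) for P in other_P]
```

With manual annotations in a handful of views, `augment` yields a 2D label for the same landmark in every other calibrated view, which is the sense in which the annotated data is multiplied across cameras.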
Fig. 10. System architecture.
a OpenMonkeyStudio integrates 62 cameras into a large space (2.45 × 2.45 × 2.75 m) that allows unconstrained movement of macaques. The cameras face the center of the space, which is ideal for view augmentation and reconstruction. b System configuration of distributed image acquisition. Sixty-two cameras are connected to local servers through 10 Gb network switches, and the six local servers are controlled by a global server. The cameras are triggered by an external clock, which allows synchronization.
