bioRxiv [Preprint]. 2023 May 30:2023.05.30.540066. doi: 10.1101/2023.05.30.540066.

An Open-Source Tool for Automated Human-Level Circling Behavior Detection

O R Stanley et al. bioRxiv.

Abstract

Quantifying behavior and relating it to underlying biological states is of paramount importance in many life science fields. Although barriers to recording postural data have been reduced by progress in deep-learning-based computer vision tools for keypoint tracking, extracting specific behaviors from these data remains challenging. Manual behavior coding, the present gold standard, is labor-intensive and subject to intra- and inter-observer variability. Automatic methods are stymied by the difficulty of explicitly defining complex behaviors, even ones that appear obvious to the human eye. Here, we demonstrate an effective technique for detecting one such behavior, a form of locomotion characterized by stereotyped spinning, termed 'circling'. Though circling has an extensive history as a behavioral marker, at present there exists no standard automated detection method. Accordingly, we developed a technique to identify instances of the behavior by applying simple postprocessing to markerless keypoint data from videos of freely exploring (Cib2-/-;Cib3-/-) mutant mice, a strain we previously found to exhibit circling. Our technique agrees with human consensus at the same level as do individual observers, and it achieves >90% accuracy in discriminating videos of wild-type mice from videos of mutants. As using this technique requires no experience writing or modifying code, it also provides a convenient, noninvasive, quantitative tool for analyzing circling mouse models. Additionally, as our approach was agnostic to the underlying behavior, these results support the feasibility of algorithmically detecting specific, research-relevant behaviors using readily interpretable parameters tuned on the basis of human consensus.


Figures

Figure 1. Data Collection Conditions and Analysis Pipeline
We collected videos of five wild-type and five (Cib2−/−;Cib3−/−) dual knockout mice exploring a 30cm-diameter cylindrical arena. Each of 6 combinations of light and distance conditions was repeated 4 times for each mouse, resulting in a total of 240 videos. After behavior videos were recorded, 6 videos (one mutant mouse video from each recording condition) were set aside for human behavioral labeling. For each of these held-out videos, three experimenters independently marked occurrences of circling behavior. These behavioral labels were compared and discussed to produce a set of consensus labels. Positions of the snout and tailbase were manually labeled in 20 random frames from each of the remaining 234 videos (4680 total labeled frames). Manually labeled bodypart locations were used to train a computer vision model using DeepLabCut. This trained model was then used to track animals in the 6 held-out validation videos, and the resulting paths were analyzed by three candidate circling detection algorithms. The behavioral occurrence labels produced by these algorithms were compared to human consensus labels to assess performance.
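The keypoint-tracking stage of this pipeline corresponds closely to the standard DeepLabCut workflow. The sketch below shows how such a project might be assembled with the DeepLabCut Python API; the file paths, project name, and experimenter name are placeholders rather than the authors' actual configuration, and the snout/tailbase bodypart list is set by editing the generated config.yaml.

```python
# Minimal sketch of a DeepLabCut tracking workflow like the one described above.
# Paths and project names are placeholders; the tracked bodyparts (snout, tailbase)
# are configured in the project's config.yaml after creation.
import deeplabcut

config_path = deeplabcut.create_new_project(
    "circling-detection", "experimenter",
    ["/data/videos/mouse01_condition1.mp4"],  # placeholder list of training videos
    copy_videos=False,
)

# Extract frames for manual labeling (the paper labels 20 random frames per video),
# then open the labeling GUI to mark snout and tailbase positions.
deeplabcut.extract_frames(config_path, mode="automatic", algo="uniform")
deeplabcut.label_frames(config_path)

# Build the training dataset, train the network, and run it on held-out videos.
deeplabcut.create_training_dataset(config_path)
deeplabcut.train_network(config_path)
deeplabcut.analyze_videos(config_path, ["/data/videos/validation01.mp4"])
```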
Figure 2. Human F1 Scores
A) Using independent observers’ labels as ground-truth for one another demonstrates that human coders vary widely in levels of agreement. For each paired set of human-labeled occurrences, one was treated as ground truth, and the other was scored against it. One pair of labelers showed strong agreement while the other two possible pairings did not. This resulted in F1 scores of 0.79 (95% confidence interval 0.76 – 0.81), 0.5 (95% CI 0.4 – 0.56), and 0.5 (95% CI 0.42 – 0.52). F1 score distributions were generated by summing counts of true-positives, false-positives, and false-negatives for each possible combination (with replacement) of the six validation videos. B) Distribution of independent labeler F1 scores versus consensus labels (0.785, 95% CI 0.696 – 0.853). F1 score distribution was generated by bootstrapping 50,000 times from counts of true-positives, false-positives, and false-negatives among the eighteen possible experimenter-video combinations.
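The bootstrap in panel B can be reproduced with a few lines of code. The sketch below is a generic illustration (not the authors' analysis script): per-unit true-positive/false-positive/false-negative counts are resampled with replacement, pooled, and converted to an F1 score on each iteration.

```python
# Generic sketch of bootstrapping an F1 confidence interval from per-unit
# (e.g., per experimenter-video combination) TP/FP/FN counts.
import numpy as np

rng = np.random.default_rng(0)

def f1_from_counts(tp, fp, fn):
    # F1 = 2*TP / (2*TP + FP + FN)
    return 2 * tp / (2 * tp + fp + fn)

def bootstrap_f1(per_unit_counts, n_boot=50_000):
    counts = np.asarray(per_unit_counts, dtype=float)
    n = len(counts)
    scores = np.empty(n_boot)
    for i in range(n_boot):
        resampled = counts[rng.integers(0, n, size=n)]  # resample units with replacement
        tp, fp, fn = resampled.sum(axis=0)               # pool counts across the resample
        scores[i] = f1_from_counts(tp, fp, fn)
    return np.percentile(scores, [2.5, 97.5])

# Example with made-up counts for three units:
ci_low, ci_high = bootstrap_f1([(40, 10, 12), (55, 8, 20), (30, 15, 9)], n_boot=5_000)
```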
Figure 3. Method Parameters & Performance Levels
A) Example DLC-labeled frames, showing trailing labeled points. B) Illustration of circle detection using each of the described methods. Duration-Only considers only the time taken to complete the circle; Swept-Angle additionally calculates the angle of the tail-to-snout vector for each frame and considers both its total movement and its 'consistency' (the fraction of time it moves in the predominant direction); Aspect-Angle constrains the geometry of the circle based on the aspect ratio and minor-axis length of the circle's minimum bounding rectangle. C) Heatmaps of F1 scores when varying two of each method's parameters. Maximum F1 scores are indicated by stars: 0.66 (min duration 0.4 sec, max duration 1.45 sec), 0.7 (0.35–1.45 sec duration, 56% min consistency, min rotation of 36 degrees), and 0.74 (same best parameters as Swept-Angle, plus min axis size of 0.5 body lengths and max aspect ratio of 2.5), respectively. D) Examples of false-positive detections by each method in one validation video.
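To make the Swept-Angle idea concrete, the following sketch implements one plausible version of the check, assuming per-frame snout and tailbase coordinates for a candidate bout. The duration and consistency defaults are the best-performing values quoted above; the rotation threshold and the exact formulation are illustrative assumptions, not the authors' implementation, and in practice these parameters would be tuned as in panel C.

```python
# Illustrative Swept-Angle check (an assumption-based sketch, not the authors' code):
# a candidate bout counts as a circle if its duration, the total swept angle of the
# tail-to-snout vector, and its directional consistency all pass thresholds.
import numpy as np

def swept_angle_is_circle(snout, tailbase, fps,
                          min_dur_s=0.35, max_dur_s=1.45,
                          min_rotation_deg=360.0,   # placeholder threshold
                          min_consistency=0.56):
    """snout, tailbase: (n_frames, 2) arrays of x, y positions for one candidate bout."""
    snout, tailbase = np.asarray(snout, float), np.asarray(tailbase, float)
    duration = len(snout) / fps
    if not (min_dur_s <= duration <= max_dur_s):
        return False

    # Heading of the tail-to-snout vector per frame, unwrapped to avoid 2*pi jumps.
    vec = snout - tailbase
    heading = np.unwrap(np.arctan2(vec[:, 1], vec[:, 0]))
    step = np.diff(heading)

    total_rotation_deg = np.degrees(abs(heading[-1] - heading[0]))
    predominant = np.sign(step.sum())
    consistency = np.mean(np.sign(step) == predominant)  # fraction of frames turning the predominant way

    return total_rotation_deg >= min_rotation_deg and consistency >= min_consistency
```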
Figure 4. Cross-Method Performance Comparison
Each column contains weighted-average performance metrics (precision: blue triangles, left; F1 score: teal circles, center; recall: yellow squares, right) across the validation video set using the Full Dataset model. Large markers represent overall scores while small dots represent individual videos. Notably, in contrast to the results from independent human labels, each automatic detection method produces higher recall than precision.
Figure 5. Comparison of Performance Metrics Across Networks
A) Performance metrics for each trained network, using individually optimized Box-Angle method parameters, alongside performance metrics for independent human screening. Each column includes precision (left, blue triangles), F1 score (center, teal circles), and recall (right, yellow squares). B) Bootstrapped F1 scores for independent human labeling (cyan) versus automatic labeling via the Box-Angle method by each model, all scored against human consensus labels: Single Video model (green), Multi-Animal model (purple), Multi-Condition model (orange), and Full Dataset model (magenta). Dotted vertical lines represent bounds of 95% confidence intervals.
Figure 6. Mouse classification performance metrics.
A) Videos of wild-type mice contained on average fewer than one circling instance per ten minutes of footage (green), while (Cib2−/−;Cib3−/−) mutant videos contained on average more than fifty instances of circling per minute (orange). B) F1 score for our final algorithm when discriminating between dual-knockout and wild-type mouse videos, as a function of circling frequency. Using a simple threshold classifying any video with at least 0.95 circling instances per minute as a mutant (dotted line) produced an F1 score of ~0.9. Notably, setting any requirement for minimum circling per minute substantially outperforms chance (chance level = 0.66, floor of y-axis).
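The classification rule in panel B reduces to a single rate threshold. Below is a minimal sketch, assuming circling instances have already been counted by the detection algorithm.

```python
# Threshold classifier from panel B: a video is called mutant if its detected
# circling rate is at least 0.95 instances per minute.
def classify_video(n_circles_detected: int, video_minutes: float,
                   threshold_per_min: float = 0.95) -> str:
    rate = n_circles_detected / video_minutes
    return "mutant" if rate >= threshold_per_min else "wild-type"

# Example: 12 detected circles in a 10-minute video -> 1.2 per minute -> "mutant".
print(classify_video(12, 10.0))
```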

References

    1. Segalin C. et al. The Mouse Action Recognition System (MARS) software pipeline for automated analysis of social behaviors in mice. Elife 10, (2021). - PMC - PubMed
    1. van den Boom B. J. G., Pavlidi P., Wolf C. J. H., Mooij A. H. & Willuhn I. Automated classification of self-grooming in mice using open-source software. J. Neurosci. Methods 289, 48–56 (2017). - PubMed
    1. Ziegler L. von, von Ziegler, L., Sturman, O. & Bohacek, J. Big behavior: challenges and opportunities in a new era of deep behavior profiling. Neuropsychopharmacology vol. 46 33–44 Preprint at 10.1038/s41386-020-0751-7 (2021). - DOI - PMC - PubMed
    1. Mathis A. et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21, 1281–1289 (2018). - PubMed
    1. Ono K. et al. Retinoic acid degradation shapes zonal development of vestibular organs and sensitivity to transient linear accelerations. Nat. Commun. 11, 63 (2020). - PMC - PubMed
