Sci Adv. 2019 Sep 4;5(9):eaaw0736.
doi: 10.1126/sciadv.aaw0736. eCollection 2019 Sep.

Chimpanzee face recognition from videos in the wild using deep learning

Daniel Schofield et al. Sci Adv.

Abstract

Video recording is now ubiquitous in the study of animal behavior, but its analysis on a large scale is prohibited by the time and resources needed to manually process large volumes of data. We present a deep convolutional neural network (CNN) approach that provides a fully automated pipeline for face detection, tracking, and recognition of wild chimpanzees from long-term video records. In a 14-year dataset yielding 10 million face images from 23 individuals over 50 hours of footage, we obtained an overall accuracy of 92.5% for identity recognition and 96.2% for sex recognition. Using the identified faces, we generated co-occurrence matrices to trace changes in the social network structure of an aging population. The tools we developed enable easy processing and annotation of video datasets, including those from other species. Such automated analysis unveils the future potential of large-scale longitudinal video archives to address fundamental questions in behavior and conservation.

Figures

Fig. 1. Fully unified pipeline for wild chimpanzee face tracking and recognition from raw video footage.
The pipeline consists of the following stages: (A) Frames are extracted from raw video. (B) Detection of faces is performed using a deep CNN single-shot detector (SSD) model. (C) Face tracking, which is implemented using a Kanade-Lucas-Tomasi (KLT) tracker (25) to group detections into face tracks. (D) Facial identity and sex recognition, which are achieved through the training of deep CNN models. (E) The system only requires the raw video as input and produces labeled face tracks and metadata as temporal and spatial information. (F) This output from the pipeline can then be used to support, for example, social network analysis. (Photo credit: Kyoto University, Primate Research Institute)
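Stage (C), which groups per-frame detections into face tracks, can be illustrated with a minimal sketch. The caption states that the authors use a KLT tracker; the code below does not implement KLT. It swaps in a simpler greedy overlap rule (intersection over union between consecutive frames) purely to show what "grouping detections into tracks" means. The function names `iou` and `link_tracks` and the `min_iou` threshold are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tracks(frames: List[List[Box]],
                min_iou: float = 0.5) -> List[List[Tuple[int, Box]]]:
    """Greedily chain detections across consecutive frames.

    NOT the KLT tracker named in the caption: a simplified stand-in.
    Each returned track is a list of (frame_index, box) pairs.
    """
    tracks: List[List[Tuple[int, Box]]] = []
    active: List[int] = []  # tracks that were extended in the previous frame
    for t, boxes in enumerate(frames):
        unused = set(active)
        next_active: List[int] = []
        for box in boxes:
            best, best_score = None, min_iou
            for k in sorted(unused):
                score = iou(tracks[k][-1][1], box)
                if score >= best_score:
                    best, best_score = k, score
            if best is None:
                # No sufficiently overlapping track: start a new one.
                tracks.append([(t, box)])
                best = len(tracks) - 1
            else:
                tracks[best].append((t, box))
                unused.discard(best)
            next_active.append(best)
        active = next_active
    return tracks


# Hypothetical detections: two faces in frames 0 and 1, one face in frame 2.
example_frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 1, 11, 11), (51, 50, 61, 60)],
    [(52, 50, 62, 60)],
]
example_tracks = link_tracks(example_frames)
```

A track here ends as soon as a face is missed in one frame; a feature-point tracker such as KLT is more tolerant of short detection gaps and of fast motion, which is presumably one reason to prefer it on field footage.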
Fig. 2. Face recognition results demonstrating the CNN model’s robustness to variations in pose, lighting, scale, and age over time.
(A) Example of a correctly labeled face track. The first two faces (nonfrontal) were initially labeled incorrectly by the model but were corrected automatically by recognition of the other faces in the track, demonstrating the benefit of our face track aggregation approach. (B) Examples of chimpanzee face detections and recognition results in frames extracted from raw video. Note how the system has achieved invariance to scale and is able to perform identification despite extreme poses and occlusions from vegetation and other individuals. (C) Examples of correctly identified faces for two individuals. The individuals age 12 years from left to right (top row: from 41 to 53 years; bottom row: from 2 to 14 years). Note how the model can recognize extreme profiles, as well as faces with motion blur and lighting variations. (Photo credit: Kyoto University, Primate Research Institute)
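The correction described in panel (A) can be sketched as follows. This is a minimal illustration that assumes per-frame class probabilities are summed over the track and the highest total wins; the text here does not state the exact aggregation rule the authors use. The function names and the identities "A" and "B" are hypothetical.

```python
from typing import Dict, List


def per_frame_labels(frame_probs: List[Dict[str, float]]) -> List[str]:
    """Most probable identity for each frame, considered in isolation."""
    return [max(probs, key=probs.get) for probs in frame_probs]


def aggregate_track(frame_probs: List[Dict[str, float]]) -> str:
    """One identity for the whole track, from summed per-frame probabilities.

    Assumed rule (not confirmed by the source): sum, then take the argmax.
    """
    if not frame_probs:
        raise ValueError("a track needs at least one frame")
    totals: Dict[str, float] = {}
    for probs in frame_probs:
        for identity, p in probs.items():
            totals[identity] = totals.get(identity, 0.0) + p
    return max(totals, key=totals.get)


# Hypothetical track: two uncertain nonfrontal frames, then three clear ones.
track = [
    {"A": 0.40, "B": 0.60},
    {"A": 0.45, "B": 0.55},
    {"A": 0.90, "B": 0.10},
    {"A": 0.95, "B": 0.05},
    {"A": 0.85, "B": 0.15},
]
frame_level = per_frame_labels(track)   # ['B', 'B', 'A', 'A', 'A']
track_level = aggregate_track(track)    # 'A'
```

The first two frames favour "B" on their own, but the confident later frames dominate the sum, so every frame in the track inherits the label "A".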
Fig. 3. Face detection and recognition results.
(A) Histograms of detection numbers for individuals in the training and test years of the dataset (2000, 2004, 2006, 2008, 2012, and 2013). (B) Output of model for number of individuals detected in each year and proportion of individuals in different age categories based on existing estimates of individual ages.
Fig. 4. Social networks of the Bossou community generated from co-occurrence matrices constructed using detections of the face recognition model.
Each node represents an individual chimpanzee. Node size corresponds to the individual’s degree centrality—the total number of “edges” (connections) they have (the higher the degree centrality, the larger the node). Node colors correspond to subclusters of the community as identified independently in each year using the Louvain community detection algorithm (23). Individuals whose ID codes begin with the same letter belong to the same matriline; IDs in capital letters correspond to males, while IDs with only the first letter capitalized correspond to females (see table S1). Within these clusters, as predicted, mothers and young infants have the strongest co-occurrences, and kin cluster into the same subgroups.
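A co-occurrence matrix and the degree centrality defined above can be computed as in the sketch below. It assumes that two individuals co-occur when both are recognized in the same frame, which may differ from the time window the authors actually use, and it omits the Louvain clustering step. The function names, the `min_count` threshold, and the ID codes are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def cooccurrence(frames: Iterable[Set[str]]) -> Dict[Pair, int]:
    """Count, for each unordered pair of IDs, the frames in which both appear."""
    counts: Dict[Pair, int] = defaultdict(int)
    for ids in frames:
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(ids), 2):
            counts[(a, b)] += 1
    return dict(counts)


def degree(counts: Dict[Pair, int], min_count: int = 1) -> Dict[str, int]:
    """Degree centrality: number of distinct partners with >= min_count co-occurrences."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_count:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


# Hypothetical recognition output: the set of IDs seen in each frame.
frames = [{"X1", "X2"}, {"X1", "X2", "Y1"}, {"Y1"}, {"X1", "Y1"}]
counts = cooccurrence(frames)
all_edges = degree(counts)                  # every pair that co-occurs at least once
strong_edges = degree(counts, min_count=2)  # only pairs seen together twice or more
```

Raising `min_count` removes weak edges before degrees are counted, so an individual seen only once near another no longer contributes to either node's size.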
Fig. 5. Preliminary results from the face detector model tested on other primate species.
Top row: P. troglodytes schweinfurthii, Pan paniscus, Gorilla beringei, Pongo pygmaeus, Hylobates muelleri, and Cebus imitator. Bottom row: Papio ursinus (x2), Chlorocebus pygerythrus (x2), Eulemur macaco, and Nycticebus coucang. Image sources: Chimpanzee: www.youtube.com/watch?v=c2u3NKXbGeo; Bonobo: www.youtube.com/watch?v=JF8v_HWvfLc&t=9s; Gorilla: www.youtube.com/watch?v=wDECqJsiGqw&t=28s; Orangutan: www.youtube.com/watch?v=Gj2W5BHu-SI; Gibbon: www.youtube.com/watch?v=C6HucIWKsVc; Capuchin: Lynn Lewis-Bevan (personal data); Baboon: Lucy Baehren (personal data); Vervet monkey: Lucy Baehren (personal data); Loris: www.youtube.com/watch?v=2Syd_BUbl5A&t=2s.

References

    1. A. Caravaggi, P. B. Banks, A. C. Burton, C. M. V. Finlay, P. M. Haswell, M. W. Hayward, M. J. Rowcliffe, M. D. Wood, A review of camera trapping for conservation behaviour research. Remote Sens. Ecol. Conserv. 3, 109–122 (2017).
    2. T. Nishida, K. Zamma, T. Matsusaka, A. Inaba, W. C. McGrew, Chimpanzee Behavior in the Wild: An Audio-Visual Encyclopedia (Springer Science & Business Media, 2010).
    3. T. Clutton-Brock, B. C. Sheldon, Individuals and populations: The role of long-term, individual-based studies of animals in ecology and evolutionary biology. Trends Ecol. Evol. 25, 562–573 (2010).
    4. A. Swanson, M. Kosmala, C. Lintott, R. Simpson, A. Smith, C. Packer, Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna. Sci. Data 2, 150026 (2015).
    5. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, 2016), vol. 1.