Sensors (Basel). 2021 Nov 16;21(22):7604. doi: 10.3390/s21227604.

Brain Strategy Algorithm for Multiple Object Tracking Based on Merging Semantic Attributes and Appearance Features

Mai S Diab et al.

Abstract

The human brain can effortlessly perform visual processes using its visual system, which helps solve multi-object tracking (MOT) problems. However, few algorithms simulate human strategies for solving MOT. Devising a method that imitates human visual activity is therefore a promising way to improve MOT results, especially under occlusion. Eight brain strategies were studied from a cognitive perspective and imitated to build a novel algorithm. Two of these strategies, rescue saccades and stimulus attributes, gave our algorithm its novel and outstanding results. First, rescue saccades were imitated by detecting the occlusion state in each frame, since occlusion represents the critical situation towards which the human brain saccades. Then, stimulus attributes were mimicked by using semantic attributes to re-identify the person in these occlusion states. Our algorithm performs favourably on the MOT17 dataset compared to state-of-the-art trackers. In addition, we created a new dataset of 40,000 images, 190,000 annotations and 4 classes to train the detection model to detect occlusion and semantic attributes. The experimental results demonstrate that our new dataset achieves outstanding performance on the Scaled YOLOv4 detection model, reaching 0.89 mAP 0.5.
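To make the two-step pipeline concrete, here is a minimal sketch of the occlusion-triggered association the abstract describes. All names, data shapes, and thresholds are illustrative assumptions, not the authors' published implementation: tracks are matched by appearance/IoU first, and only tracks flagged as occluded fall back to semantic-attribute re-identification (the "rescue saccade" step).

    def iou(a, b):
        # IoU of two boxes given as (x_centre, y_centre, h, w).
        ax1, ay1 = a[0] - a[3] / 2, a[1] - a[2] / 2
        ax2, ay2 = a[0] + a[3] / 2, a[1] + a[2] / 2
        bx1, by1 = b[0] - b[3] / 2, b[1] - b[2] / 2
        bx2, by2 = b[0] + b[3] / 2, b[1] + b[2] / 2
        iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        ih = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = iw * ih
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union else 0.0

    def attr_match(track, det):
        # Fraction of the track's semantic attributes the detection shares.
        attrs = track["semantic"]
        if not attrs:
            return 0.0
        return sum(det["semantic"].get(k) == v for k, v in attrs.items()) / len(attrs)

    def associate(tracks, dets, iou_t=0.3, attr_t=0.5):
        # First association: appearance (IoU). Final association, used only
        # for occluded tracks: semantic attributes.
        matched = {}
        for trk in tracks:
            best = max(dets, key=lambda d: iou(trk["bbox"], d["bbox"]), default=None)
            if best is not None and iou(trk["bbox"], best["bbox"]) >= iou_t:
                matched[trk["id"]] = best
            elif trk["occluded"]:
                best = max(dets, key=lambda d: attr_match(trk, d), default=None)
                if best is not None and attr_match(trk, best) >= attr_t:
                    matched[trk["id"]] = best
        return matched

A full tracker would also resolve detections claimed by more than one track (e.g., with the Hungarian algorithm) and update each track's occlusion flag every frame; the sketch omits this for brevity.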

Keywords: data association; dataset; deep learning; multiple object tracking; semantic attribute.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure A1
(a) How to annotate trousers without including part of the background; (b) the whole shirt should be included inside the Bbox.
Figure A2
(a) Do not annotate the trousers if only a small part is visible; (b) annotate the shirt and trousers even if they appear in the background.
Figure A3
(a) How to annotate a shirt when part of it is occluded behind another person; (b) a more crowded scene and how to annotate occluded objects.
Figure A4
In (a) and (b), there is a false-positive Bbox in the right corner of the shirt.
Figure A5
(a) Shorts should be annotated as trousers; (b) the shirt class covers any clothing on the upper body.
Figure A6
In (a) and (b), part of the shirt may be cut if it would otherwise include another person's face.
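These guidelines describe where to draw each Bbox; on disk, each image would then pair with a plain-text label file. Assuming the standard YOLO label convention used to train Scaled YOLOv4 and YOLOv5 (an assumption; the paper does not show its storage format), each line holds a class ID followed by the box's normalised centre coordinates and size. The file name and values below are hypothetical; class IDs 0-3 correspond to man, woman, shirt, and trousers (see Figure 1):

    # labels/example_frame.txt (hypothetical; real YOLO label files
    # contain only the five numbers per line, no comments)
    0 0.512 0.430 0.180 0.620
    2 0.512 0.310 0.160 0.220
    3 0.515 0.610 0.150 0.300

Here one person (class 0) is annotated together with their shirt (class 2) and trousers (class 3), each line reading "class x_centre y_centre width height" in image-relative coordinates.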
Figure 1
Overview of MSA-AF, starting from the detection step, which feeds into the first association step; the final association step is used in cases of occlusion. Track 1 to Track n represent the models of the objects tracked in previous frames. Class can be 0, 1, 2, or 3, representing man, woman, shirt, and trousers, respectively. Bbox holds (x, y, h, w) of the object's bounding box, where (x, y) is the centre point, h is the height, and w is the width. Occlusion state is a binary flag: '1' if the object is in an occlusion state and '0' if not. Occluded with holds the ID of the object it is occluded with. Age is the number of frames in which the object has been successfully tracked. Finally, Semantic information holds the attribute data described in detail in Section 3. The original image is taken from the MOTChallenge dataset [28].
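The per-track record in Figure 1 is easiest to read as a small data structure. A minimal sketch, with field names of our own choosing that mirror the caption (the authors' actual structure is not shown on this page):

    from dataclasses import dataclass, field

    # Class IDs as given in the Figure 1 caption.
    CLASSES = {0: "man", 1: "woman", 2: "shirt", 3: "trousers"}

    @dataclass
    class TrackRecord:
        track_id: int
        cls: int                                  # 0-3, see CLASSES
        bbox: tuple[float, float, float, float]   # (x, y, h, w); (x, y) is the centre
        occluded: bool = False                    # occlusion state: True maps to '1'
        occluded_with: int | None = None          # ID of the object it is occluded with
        age: int = 0                              # frames successfully tracked so far
        semantic: dict[str, str] = field(default_factory=dict)  # Section 3 attributes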
Figure 2
Steps used to arrive at a high-performance detection model. Each phase lists all the options we compared in order to obtain the best detection results.
Figure 3
Sample images drawn at random from our dataset by Scaled YOLOv4 during the training step; we did not hand-pick any of them. These images demonstrate the diversity of the PGC dataset: it contains women, men, Arab women, Asian women, Black men, crowded scenes, single-person scenes, blurred scenes, indoor scenes, outdoor scenes, back views, and front views.
Figure 4
PGC class annotation balance.
Figure 5
Timeline and performance comparison between members of the YOLO family.
Figure 6
A combination of the distributed-only and semantic hub models for controlling the relationships between attributes.
Figure 7
Four images taken from our detection outputs to demonstrate the occlusion phases: before the occlusion state (a), in the occlusion state (b,c), and after the occlusion state (d). The original image is taken from the MOTChallenge dataset [28].
Figure 8
Loss plots for the training and validation sets of our dataset.
Figure 9
Comparison of class representation between the PGC dataset and Open Images v6.
Figure 10
Four evaluation metrics, mAP 0.5–0.95 (a), mAP 0.5 (b), Recall (c), and Precision (d), used to measure the performance of Scaled YOLOv4 when trained on the PGC dataset and on Open Images v6, respectively. TensorBoard was used to create these graphics.
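For context on the metric names: mAP 0.5 counts a detection as a true positive when its IoU with a ground-truth box is at least 0.5, while mAP 0.5–0.95 averages AP over ten IoU thresholds from 0.5 to 0.95 in steps of 0.05. Precision and Recall follow the standard definitions; a minimal sketch (standard formulas, not the paper's evaluation code):

    def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
        # Precision: fraction of predicted boxes that are correct.
        # Recall: fraction of ground-truth boxes that are found.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        return precision, recall

    # The ten IoU thresholds averaged by mAP 0.5-0.95:
    THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]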
Figure 11
The performance metrics Precision, Recall, F1, and Precision–Recall for the four classes when the PGC dataset was used to train YOLOv5. The YOLOv5 framework was used to create these graphics.
Figure 12
Performance of the four models, Scaled YOLOv4, YOLOv5l, YOLOv5s, and YOLOv5m, on two metrics, mAP 0.5 (a) and mAP 0.5–0.95 (b). TensorBoard was used to create these graphics.
Figure 13
A sample from the MOT20-04 video [28].

References

    1. Leonardelli E., Fait E., Fairhall S.L. Temporal dynamics of access to amodal representations of category-level conceptual information. Sci. Rep. 2019;9:239. doi: 10.1038/s41598-018-37429-2.
    2. Lyu C., Hu S., Wei L., Zhang X., Talhelm T. Brain Activation of Identity Switching in Multiple Identity Tracking Task. PLoS ONE. 2015;10:e0145489. doi: 10.1371/journal.pone.0145489.
    3. Rupp K., Roos M., Milsap G., Caceres C., Ratto C., Chevillet M., Crone N.E., Wolmetz M. Semantic attributes are encoded in human electrocorticographic signals during visual object recognition. NeuroImage. 2017;148:318–329. doi: 10.1016/j.neuroimage.2016.12.074.
    4. Ardila A. People recognition: A historical/anthropological perspective. Behav. Neurol. 1993;6:99–105. doi: 10.1155/1993/169342.
    5. Hong Z., Chen Z., Wang C., Mei X., Prokhorov D., Tao D. MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking; Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Boston, MA, USA. 7–12 June 2015; pp. 749–758.
