Sensors (Basel). 2021 Nov 16;21(22):7604. doi: 10.3390/s21227604.

Brain Strategy Algorithm for Multiple Object Tracking Based on Merging Semantic Attributes and Appearance Features

Mai S Diab et al.

Abstract

The human brain can effortlessly perform visual processes using its visual system, which helps solve multi-object tracking (MOT) problems. However, few algorithms simulate human strategies for solving MOT. Devising a method that imitates human visual activity is therefore a promising way to improve MOT results, especially under occlusion. Eight brain strategies were studied from a cognitive perspective and imitated to build a novel algorithm. Two of these strategies, rescue saccades and stimulus attributes, gave our algorithm its novel and outstanding results. First, rescue saccades were imitated by detecting the occlusion state in each frame, since occlusion represents the critical situation towards which the human brain saccades. Then, stimulus attributes were mimicked by using semantic attributes to re-identify the person in these occlusion states. Our algorithm performs favourably on the MOT17 dataset compared to state-of-the-art trackers. In addition, we created a new dataset of 40,000 images, 190,000 annotations and 4 classes to train the detection model to detect occlusion and semantic attributes. The experimental results demonstrate that our new dataset achieves outstanding performance on the Scaled YOLOv4 detection model, reaching 0.89 mAP 0.5.
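To make the two-step pipeline concrete, here is a minimal sketch of the occlusion-triggered association the abstract describes. All names, data shapes, and thresholds are illustrative assumptions, not the authors' published implementation: tracks are matched by appearance/IoU first, and only tracks flagged as occluded fall back to semantic-attribute re-identification (the "rescue saccade" step).

    def iou(a, b):
        # IoU of two boxes given as (x_centre, y_centre, h, w).
        ax1, ay1 = a[0] - a[3] / 2, a[1] - a[2] / 2
        ax2, ay2 = a[0] + a[3] / 2, a[1] + a[2] / 2
        bx1, by1 = b[0] - b[3] / 2, b[1] - b[2] / 2
        bx2, by2 = b[0] + b[3] / 2, b[1] + b[2] / 2
        iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        ih = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = iw * ih
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union else 0.0

    def attr_match(track, det):
        # Fraction of the track's semantic attributes the detection shares.
        attrs = track["semantic"]
        if not attrs:
            return 0.0
        return sum(det["semantic"].get(k) == v for k, v in attrs.items()) / len(attrs)

    def associate(tracks, dets, iou_t=0.3, attr_t=0.5):
        # First association: appearance (IoU). Final association, used only
        # for occluded tracks: semantic attributes.
        matched = {}
        for trk in tracks:
            best = max(dets, key=lambda d: iou(trk["bbox"], d["bbox"]), default=None)
            if best is not None and iou(trk["bbox"], best["bbox"]) >= iou_t:
                matched[trk["id"]] = best
            elif trk["occluded"]:
                best = max(dets, key=lambda d: attr_match(trk, d), default=None)
                if best is not None and attr_match(trk, best) >= attr_t:
                    matched[trk["id"]] = best
        return matched

A full tracker would also resolve detections claimed by more than one track (e.g., with the Hungarian algorithm) and update each track's occlusion flag every frame; the sketch omits this for brevity.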

Keywords: data association; dataset; deep learning; multiple object tracking; semantic attribute.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure A1
(a) How to annotate trousers without including part of the background; (b) the whole shirt should be included inside the Bbox.
Figure A2
(a) Do not annotate the trousers if only a small part is visible; (b) annotate the shirt and trousers even if they appear in the background.
Figure A3
(a) How to annotate a shirt when part of it is occluded behind another person; (b) a more crowded scene and how to annotate occluded objects.
Figure A4
In (a) and (b), there is a false-positive Bbox in the right corner of the shirt.
Figure A5
(a) Shorts should be annotated as trousers; (b) the shirt class covers any clothing on the upper body.
Figure A6
In (a) and (b), part of the shirt may be cut if it would otherwise include another person's face.
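These guidelines describe where to draw each Bbox; on disk, each image would then pair with a plain-text label file. Assuming the standard YOLO label convention used to train Scaled YOLOv4 and YOLOv5 (an assumption; the paper does not show its storage format), each line holds a class ID followed by the box's normalised centre coordinates and size. The file name and values below are hypothetical; class IDs 0-3 correspond to man, woman, shirt, and trousers (see Figure 1):

    # labels/example_frame.txt (hypothetical; real YOLO label files
    # contain only the five numbers per line, no comments)
    0 0.512 0.430 0.180 0.620
    2 0.512 0.310 0.160 0.220
    3 0.515 0.610 0.150 0.300

Here one person (class 0) is annotated together with their shirt (class 2) and trousers (class 3), each line reading "class x_centre y_centre width height" in image-relative coordinates.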
Figure 1
Overview of MSA-AF, starting from the detection step, which feeds into the first association step; the final association step is used in cases of occlusion. Track 1 to Track n represent the models of the objects tracked in previous frames. Class can be 0, 1, 2, or 3, representing man, woman, shirt, and trousers, respectively. Bbox holds (x, y, h, w) of the object's bounding box, where (x, y) is the centre point, h is the height, and w is the width. Occlusion state is a binary flag: '1' if the object is in an occlusion state and '0' if not. Occluded with holds the ID of the object it is occluded with. Age is the number of frames in which the object has been successfully tracked. Finally, Semantic information holds the attribute data described in detail in Section 3. The original image is taken from the MOTChallenge dataset [28].
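The per-track record in Figure 1 is easiest to read as a small data structure. A minimal sketch, with field names of our own choosing that mirror the caption (the authors' actual structure is not shown on this page):

    from dataclasses import dataclass, field

    # Class IDs as given in the Figure 1 caption.
    CLASSES = {0: "man", 1: "woman", 2: "shirt", 3: "trousers"}

    @dataclass
    class TrackRecord:
        track_id: int
        cls: int                                  # 0-3, see CLASSES
        bbox: tuple[float, float, float, float]   # (x, y, h, w); (x, y) is the centre
        occluded: bool = False                    # occlusion state: True maps to '1'
        occluded_with: int | None = None          # ID of the object it is occluded with
        age: int = 0                              # frames successfully tracked so far
        semantic: dict[str, str] = field(default_factory=dict)  # Section 3 attributes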
Figure 2
Steps used to arrive at a high-performance detection model. Each phase lists all the options we compared in order to obtain the best detection results.
Figure 3
Sample images drawn at random from our dataset by Scaled YOLOv4 during the training step; we did not hand-pick any of them. These images demonstrate the diversity of the PGC dataset: it contains women, men, Arab women, Asian women, Black men, crowded scenes, single-person scenes, blurred scenes, indoor scenes, outdoor scenes, back views, and front views.
Figure 4
PGC class annotation balance.
Figure 5
Timeline and performance comparison between members of the YOLO family.
Figure 6
A combination of the distributed-only and semantic hub models for controlling the relationships between attributes.
Figure 7
Four images taken from our detection outputs to demonstrate the occlusion phases: before the occlusion state (a), in the occlusion state (b,c), and after the occlusion state (d). The original image is taken from the MOTChallenge dataset [28].
Figure 8
Loss plots for the training and validation sets of our dataset.
Figure 9
Comparison of class representation between the PGC dataset and Open Images v6.
Figure 10
Four evaluation metrics, mAP 0.5–0.95 (a), mAP 0.5 (b), Recall (c), and Precision (d), used to measure the performance of Scaled YOLOv4 when trained on the PGC dataset and on Open Images v6, respectively. TensorBoard was used to create these graphics.
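For context on the metric names: mAP 0.5 counts a detection as a true positive when its IoU with a ground-truth box is at least 0.5, while mAP 0.5–0.95 averages AP over ten IoU thresholds from 0.5 to 0.95 in steps of 0.05. Precision and Recall follow the standard definitions; a minimal sketch (standard formulas, not the paper's evaluation code):

    def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
        # Precision: fraction of predicted boxes that are correct.
        # Recall: fraction of ground-truth boxes that are found.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        return precision, recall

    # The ten IoU thresholds averaged by mAP 0.5-0.95:
    THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]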
Figure 11
The performance metrics Precision, Recall, F1, and Precision–Recall for the four classes when the PGC dataset was used to train YOLOv5. The YOLOv5 framework was used to create these graphics.
Figure 12
Performance of the four models, Scaled YOLOv4, YOLOv5l, YOLOv5s, and YOLOv5m, on two metrics, mAP 0.5 (a) and mAP 0.5–0.95 (b). TensorBoard was used to create these graphics.
Figure 13
A sample from the MOT20-04 video [28].

References

    1. Leonardelli E., Fait E., Fairhall S.L. Temporal dynamics of access to amodal representations of category-level conceptual information. Sci. Rep. 2019;9:239. doi: 10.1038/s41598-018-37429-2.
    2. Lyu C., Hu S., Wei L., Zhang X., Talhelm T. Brain Activation of Identity Switching in Multiple Identity Tracking Task. PLoS ONE. 2015;10:e0145489. doi: 10.1371/journal.pone.0145489.
    3. Rupp K., Roos M., Milsap G., Caceres C., Ratto C., Chevillet M., Crone N.E., Wolmetz M. Semantic attributes are encoded in human electrocorticographic signals during visual object recognition. NeuroImage. 2017;148:318–329. doi: 10.1016/j.neuroimage.2016.12.074.
    4. Ardila A. People recognition: A historical/anthropological perspective. Behav. Neurol. 1993;6:99–105. doi: 10.1155/1993/169342.
    5. Hong Z., Chen Z., Wang C., Mei X., Prokhorov D., Tao D. MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking; Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Boston, MA, USA. 7–12 June 2015; pp. 749–758.
