A Driver's Visual Attention Prediction Using Optical Flow

Byeongkeun Kang et al. Sensors (Basel). 2021 May 27;21(11):3722. doi: 10.3390/s21113722.

Abstract

Motion in videos refers to the pattern of apparent movement of objects, surfaces, and edges across image sequences caused by the relative movement between a camera and a scene. Motion, like scene appearance, is an essential feature for estimating a driver's visual attention allocation in computer vision. However, while attention prediction models based on scene appearance have been studied extensively, the role of motion as a crucial factor in estimating a driver's attention has not been thoroughly examined in the literature. In this work, we therefore investigate the usefulness of motion information for estimating a driver's visual attention. To analyze the effectiveness of motion information, we develop a deep neural network framework that predicts attention locations and attention levels from optical flow maps, which represent the movement of content in videos. We validate the proposed motion-based prediction model by comparing its performance to that of current state-of-the-art prediction models that use RGB frames. Experimental results on a real-world dataset confirm our hypothesis that motion contributes to prediction accuracy and that there is a margin for further accuracy improvement from using motion features.

Keywords: convolutional neural networks; driver’s perception modeling; intelligent vehicle system; optical flow; visual attention estimation.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Can motion predict a driver’s attention locations? This work is motivated by the fact that attention allocations are greatly influenced by motion in dynamic scenes, not only by the visual appearance of scenes. Given an optical flow map, this work aims to predict a driver’s attention locations/levels, verifying the effectiveness of motion features in the driving context. The details of motion estimation are explained in Section 3.1, and the architecture of the proposed framework is described in Section 3.2.
Figure 2
Examples of estimated optical flow using the algorithm described in Section 3.1. The estimated dense optical flow is sampled every 16 pixels in each dimension.
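To make this step concrete, the sketch below estimates a dense flow map between two consecutive frames and samples it every 16 pixels per dimension, as in Figure 2. The abstract does not name the Section 3.1 algorithm, so OpenCV's Farneback method is used here purely as a stand-in, and estimate_flow/sample_flow are illustrative names.

```python
# Minimal sketch: dense optical flow estimation plus the every-16-pixels
# sampling used for visualization. Farneback is an assumption, not
# necessarily the paper's Section 3.1 algorithm.
import cv2
import numpy as np

def estimate_flow(prev_bgr: np.ndarray, next_bgr: np.ndarray) -> np.ndarray:
    """Return a dense (H, W, 2) optical flow map between two BGR frames."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

def sample_flow(flow: np.ndarray, step: int = 16):
    """Sample the dense flow every `step` pixels in each dimension (cf. Figure 2)."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
    return xs, ys, flow[ys, xs, 0], flow[ys, xs, 1]  # grid positions and (dx, dy) vectors
```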
Figure 3
Overview of the network. The network takes optical flow maps as input and outputs the corresponding predicted attention maps. Four sequential sub-networks represent features at different resolutions. The multi-scale features from these sub-networks are fused across layers, and high-resolution features at the original input size are maintained rather than being reduced.
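The described design, parallel sub-networks at multiple resolutions with cross-layer fusion and a preserved full-resolution stream, can be sketched as follows. This is a minimal PyTorch illustration under stated assumptions (two branches instead of four, arbitrary channel widths, a sigmoid head), not the authors' implementation.

```python
# Minimal PyTorch sketch of the multi-scale idea in Figure 3: parallel
# branches at different resolutions exchange features, and a full-resolution
# stream is kept throughout. Branch count and channel widths are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchFusion(nn.Module):
    """One fusion stage: parallel convolutions, then cross-resolution exchange."""
    def __init__(self, ch_hi: int = 32, ch_lo: int = 64):
        super().__init__()
        self.hi = nn.Conv2d(ch_hi, ch_hi, 3, padding=1)          # full-resolution branch
        self.lo = nn.Conv2d(ch_lo, ch_lo, 3, padding=1)          # half-resolution branch
        self.hi_to_lo = nn.Conv2d(ch_hi, ch_lo, 3, stride=2, padding=1)
        self.lo_to_hi = nn.Conv2d(ch_lo, ch_hi, 1)

    def forward(self, x_hi, x_lo):
        h = F.relu(self.hi(x_hi))
        l = F.relu(self.lo(x_lo))
        # Fuse across resolutions: downsample hi into lo, upsample lo into hi,
        # so the full-resolution stream is maintained instead of being reduced.
        l = l + self.hi_to_lo(h)
        h = h + F.interpolate(self.lo_to_hi(l), size=h.shape[-2:],
                              mode="bilinear", align_corners=False)
        return h, l

class FlowAttentionNet(nn.Module):
    """Maps a 2-channel optical flow tensor to a 1-channel attention map."""
    def __init__(self):
        super().__init__()
        self.stem_hi = nn.Conv2d(2, 32, 3, padding=1)            # keeps input resolution
        self.stem_lo = nn.Conv2d(2, 64, 3, stride=2, padding=1)  # half resolution
        self.stages = nn.ModuleList([TwoBranchFusion() for _ in range(4)])
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, flow):                                     # flow: (N, 2, H, W), H and W even
        h = F.relu(self.stem_hi(flow))
        l = F.relu(self.stem_lo(flow))
        for stage in self.stages:
            h, l = stage(h, l)
        return torch.sigmoid(self.head(h))                       # (N, 1, H, W) attention map
```

As a quick shape check, FlowAttentionNet()(torch.randn(1, 2, 64, 64)) returns a (1, 1, 64, 64) map, i.e., an attention map at the original input size.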
Figure 4
Analysis of the proportion of frames for which each model performs better under different environmental conditions. (a) Locations (countryside, highway, and downtown). The average velocities of the countryside, highway, and downtown scenes are 50.57 km/h, 78.95 km/h, and 26.90 km/h, respectively. (b) Weather conditions (rainy, cloudy, and sunny). The average velocities of the rainy, cloudy, and sunny scenes are 47.51 km/h, 50.35 km/h, and 51.09 km/h, respectively.
Figure 5
Examples of predicted attention maps. The output attention maps are overlaid on the input images and scaled to RGB components. (a) Input image. (b) Optical flow map. (c) Ground-truth attention map. (d) Attention map of the state-of-the-art model using RGB frames. (e) Attention map of the proposed model using optical flow. (f) Input image. (g) Optical flow map. (h) Ground-truth attention map. (i) Attention map of the state-of-the-art model using RGB frames. (j) Attention map of the proposed model using optical flow. The proposed model using optical flow detects contextual and pixel-wise labels consistently and accurately, comparably to the model using RGB frames.
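For the overlay itself, the following sketch blends a normalized attention map onto the input frame as a heatmap. overlay_attention is a hypothetical helper assuming a [0, 1] attention map, not the authors' code.

```python
# Sketch of a Figure 5-style overlay: map attention values to color
# components and blend them onto the input frame. Illustrative only.
import cv2
import numpy as np

def overlay_attention(frame_bgr: np.ndarray, attn: np.ndarray,
                      alpha: float = 0.5) -> np.ndarray:
    """Blend a [0, 1] attention map onto a BGR frame as a heatmap."""
    # Resize the attention map to the frame size if needed.
    attn = cv2.resize(attn, (frame_bgr.shape[1], frame_bgr.shape[0]))
    attn_u8 = np.uint8(255 * np.clip(attn, 0.0, 1.0))
    heat = cv2.applyColorMap(attn_u8, cv2.COLORMAP_JET)  # scale values to RGB components
    return cv2.addWeighted(frame_bgr, 1.0 - alpha, heat, alpha, 0.0)
```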
Figure 6
Visual analysis of failure cases using RGB frames. (a) Input image. (b) Optical flow map. (c) Ground-truth attention map. (d) Attention map of the state-of-the-art model using RGB frames. (e) Attention map of the proposed model using optical flow. These failure cases imply that networks for a driver's attention prediction can easily learn attention regions near vanishing points.
Figure 7
Visual analysis of failure cases using optical flow maps. (a) Input image. (b) Optical flow map. (c) Ground-truth attention map. (d) Attention map of the state-of-the-art model using RGB frames. (e) Attention map of the proposed model using optical flow. These failure cases imply that the proposed network can also learn unsteady small variations in the surrounding environment.
