Front Neurorobot. 2019 Jun 28;13:40. doi: 10.3389/fnbot.2019.00040. eCollection 2019.

Toward a Brain-Inspired System: Deep Recurrent Reinforcement Learning for a Simulated Self-Driving Agent

Jieneng Chen et al. Front Neurorobot. 2019.

Abstract

An effective way to achieve intelligence is to simulate the intelligent behaviors of the human brain. In recent years, bio-inspired learning methods have emerged that differ from classical mathematical-programming approaches. From the perspective of brain inspiration, reinforcement learning has gained additional interest for decision-making tasks, as a growing body of neuroscientific research demonstrates significant links between reinforcement learning and specific neural substrates. Building on this research into human brains and reinforcement learning, scientists have investigated how robots can autonomously tackle complex tasks, such as controlling a self-driving agent in a human-like way. In this study, we propose an end-to-end architecture that combines a novel deep Q-network with recurrence to address the problem of simulated self-driving. The main contribution of this study is that we trained the driving agent with a brain-inspired trial-and-error technique, consistent with how driving is learned in the real world. In addition, the proposed learning network introduces three innovations: raw screen output is the only information available to the driving agent; a weighted layer enhances the differences across a lengthy episode; and a modified replay mechanism overcomes reward sparsity and accelerates learning. The proposed network was trained and tested in a third-party OpenAI Gym environment. After training for several episodes, the resulting driving agent performed advanced behaviors in the given scene. We hope that, in the future, the proposed brain-inspired learning system will inspire practical self-driving control solutions.

Keywords: brain-inspired learning; end-to-end architecture; recurrence; reinforcement learning; self-driving agent.
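To make the trial-and-error setup concrete, the loop below shows how such an agent would interact with an OpenAI Gym environment. This is a minimal sketch, not the paper's implementation: the environment id, episode count, and placeholder policy are illustrative assumptions, and the DRQN action selection is only indicated in a comment.

    # Minimal trial-and-error loop against an OpenAI Gym environment.
    # The environment id and episode count are illustrative assumptions.
    import gym

    env = gym.make("CarRacing-v0")   # stand-in for the paper's driving maps

    def act(obs):
        # Placeholder policy: random action. In the paper's setup this
        # would be an epsilon-greedy argmax over the DRQN's Q-values.
        return env.action_space.sample()

    for episode in range(10):
        obs = env.reset()            # raw screen pixels: the agent's only input
        done, total_reward = False, 0.0
        while not done:
            action = act(obs)
            obs, reward, done, info = env.step(action)
            total_reward += reward   # scalar reward drives trial-and-error learning
        print(f"episode {episode}: return {total_reward:.1f}")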


Figures

Figure 1
The modified DRQN model. The value function is divided into two categories: the current value function Q and the target value function Q′. The parameters of Q are copied to Q′ every N episodes. The state contains two elements: o_t, obtained from the current environment, and h_(t−1), carried over from past information. The agent performs action a under a specific policy, and the sequence (o_t, a, r, o_t′) is stored in the replay memory unit; here we use a prioritized experience replay memory unit. During training, sequences are sampled at random from the replay memory unit, and the network is trained by gradient descent so that the current value function Q approaches Q′ for a given sequence. The loss function is given in Equation (4).
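As a rough sketch of the update this caption describes, the code below assumes Equation (4) is the standard temporal-difference loss (r + γ max_a′ Q′(o′, a′) − Q(o, a))². The discount factor, learning rate, sync interval N, and the replay interface are assumptions, and recurrent hidden-state handling is omitted for brevity.

    # Sketch of the Q / Q' update loop from Figure 1 (PyTorch).
    # Assumes Equation (4) is the standard TD loss; gamma, lr, N,
    # and the replay interface are assumed.
    import copy
    import torch
    import torch.nn.functional as F

    def make_trainer(q_net, replay, gamma=0.99, lr=1e-4, sync_every_n=10):
        q_target = copy.deepcopy(q_net)          # Q': frozen copy of Q
        optim = torch.optim.Adam(q_net.parameters(), lr=lr)

        def train_episode(episode_idx):
            o, a, r, o_next = replay.sample()    # prioritized sampling (see Figure 1)
            with torch.no_grad():
                target = r + gamma * q_target(o_next).max(dim=1).values
            q_sa = q_net(o).gather(1, a.unsqueeze(1)).squeeze(1)
            loss = F.mse_loss(q_sa, target)      # Equation (4), assumed TD form
            optim.zero_grad()
            loss.backward()                      # gradient descent pulls Q toward Q'
            optim.step()
            if episode_idx % sync_every_n == 0:  # copy parameters Q -> Q' every N episodes
                q_target.load_state_dict(q_net.state_dict())

        return train_episode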
Figure 2
Sequence updates in the recurrent network. Only the scores of the actions taken in states 5, 6, and 7 will be updated. The first four states provide a more accurate hidden state to the LSTM, while the last state provides a target for state 7.
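One way to realize this update pattern is to run the LSTM over the full eight-state sequence but mask the per-step loss so that gradients flow only through states 5-7. The sequence length and burn-in split come from the caption; everything else in this sketch is assumed.

    # Masked sequence update from Figure 2 (PyTorch). States 1-4 only warm
    # up the LSTM hidden state ("burn-in"); states 5-7 receive Q-value
    # updates; state 8 serves only as the bootstrap target for state 7.
    import torch

    seq_len, burn_in = 8, 4
    mask = torch.zeros(seq_len)
    mask[burn_in:seq_len - 1] = 1.0      # update states 5, 6, 7 only

    def masked_td_loss(q_seq, target_seq):
        # q_seq, target_seq: (batch, seq_len) per-step Q-values and TD targets
        per_step = (q_seq - target_seq) ** 2
        return (per_step * mask).sum() / mask.sum()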
Figure 3
An illustration of the architecture of our model. The input image passes through four convolutional layers, and the convolutional output is split into two streams. The first stream (bottom) flattens the output and feeds it to an LSTM; the second (top) flattens the output and feeds it to a fully connected layer. We then obtain an importance stream and a value stream individually and multiply them to produce the output. The resulting transitions are stored in a prioritized experience replay memory unit. As shown in Figure 1, the network is trained with the DRQN loss function.
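A minimal sketch of this two-stream architecture follows. The filter counts, kernel sizes, hidden sizes, 84×84 input, and the assignment of "importance" to the LSTM branch and "value" to the fully connected branch are all assumptions; the caption only states that the two streams are obtained individually and multiplied.

    # Two-stream DRQN sketch from Figure 3 (PyTorch). Filter counts, kernel
    # sizes, hidden sizes, and the 84x84 input are illustrative assumptions.
    import torch
    import torch.nn as nn

    class TwoStreamDRQN(nn.Module):
        def __init__(self, n_actions=4, hidden=256):
            super().__init__()
            self.conv = nn.Sequential(                 # four convolution layers
                nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            )
            flat = 64 * 5 * 5                          # conv output for an 84x84 input
            self.lstm = nn.LSTM(flat, hidden, batch_first=True)  # stream 1 (recurrent)
            self.fc = nn.Linear(flat, hidden)                    # stream 2 (feedforward)
            self.out = nn.Linear(hidden, n_actions)

        def forward(self, frames, state=None):
            # frames: (batch, time, 3, 84, 84) raw screen sequence
            b, t = frames.shape[:2]
            z = self.conv(frames.reshape(b * t, *frames.shape[2:])).reshape(b, t, -1)
            importance, state = self.lstm(z, state)    # importance stream (assumed mapping)
            value = torch.relu(self.fc(z))             # value stream (assumed mapping)
            q = self.out(importance * value)           # streams multiplied elementwise
            return q, state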
Figure 4
Reward trends for two different maps: (A) corresponds to Farm and (B) to Raceway. After training for 1,750 episodes, we obtained the reward trends shown. At about 400 episodes, the stability of the driving agent began to increase; after 1,400 episodes, the reward stabilized at a high level.
Figure 5
Visualizing the first layer gives a direct view (left), but many grids contain unreadable information, such as the grid marked with a green frame. Because the first layer's output is quite abstract, we also visualized the last layer using deconvolution (right); it appears to represent the wall element of the original scene, which means the agent can attend to the wall and then take suitable actions. Panel (A) shows the direct-view visualization of the first CNN layer. Panel (B) shows the visualization of the last CNN layer using deconvolution.
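For panel (A), a first-layer activation grid can be produced with a simple forward hook, as sketched below; `model` is assumed to be a network with a conv stack like the one sketched after Figure 3, and the 4×8 grid assumes 32 first-layer filters. The Zeiler-Fergus-style deconvolution used for panel (B) is more involved and is not reproduced here.

    # Hedged sketch of the first-layer activation grid in panel (A)
    # (PyTorch + matplotlib). `model` and the filter count are assumptions.
    import torch
    import matplotlib.pyplot as plt

    def show_first_layer(model, frame):
        # frame: (1, 3, 84, 84) single raw screen observation
        acts = {}
        handle = model.conv[0].register_forward_hook(
            lambda mod, inp, out: acts.update(maps=out.detach()))
        with torch.no_grad():
            model.conv(frame)             # run the encoder to trigger the hook
        handle.remove()
        maps = acts["maps"][0]            # (channels, H, W) feature maps
        fig, axes = plt.subplots(4, 8, figsize=(12, 6))
        for ax, fmap in zip(axes.flat, maps):
            ax.imshow(fmap.numpy(), cmap="gray")  # one filter's response per cell
            ax.axis("off")
        plt.show()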

