Front Neurorobot. 2019 Jun 28;13:40. doi: 10.3389/fnbot.2019.00040. eCollection 2019.

Toward a Brain-Inspired System: Deep Recurrent Reinforcement Learning for a Simulated Self-Driving Agent

Jieneng Chen et al. Front Neurorobot. 2019.

Abstract

An effective way to achieve intelligence is to simulate the intelligent behaviors of the human brain. In recent years, bio-inspired learning methods have emerged that differ from classical mathematical-programming approaches. From the perspective of brain inspiration, reinforcement learning has gained additional interest for decision-making tasks, as a growing body of neuroscientific research demonstrates significant links between reinforcement learning and specific neural substrates. Building on this research into human brains and reinforcement learning, scientists have investigated how robots can autonomously tackle complex tasks, such as controlling a self-driving agent in a human-like way. In this study, we propose an end-to-end architecture that combines a novel deep Q-network with recurrence to address the problem of simulated self-driving. The main contribution of this study is that we trained the driving agent with a brain-inspired trial-and-error technique, consistent with how driving is learned in the real world. In addition, the proposed learning network introduces three innovations: raw screen output is the only information available to the driving agent; a weighted layer enhances the differences across a lengthy episode; and a modified replay mechanism overcomes reward sparsity and accelerates learning. The proposed network was trained and tested in a third-party OpenAI Gym environment. After training for several episodes, the resulting driving agent performed advanced behaviors in the given scene. We hope that, in the future, the proposed brain-inspired learning system will inspire practical self-driving control solutions.

Keywords: brain-inspired learning; end-to-end architecture; recurrence; reinforcement learning; self-driving agent.
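To make the trial-and-error setup concrete, the loop below shows how such an agent would interact with an OpenAI Gym environment. This is a minimal sketch, not the paper's implementation: the environment id, episode count, and placeholder policy are illustrative assumptions, and the DRQN action selection is only indicated in a comment.

    # Minimal trial-and-error loop against an OpenAI Gym environment.
    # The environment id and episode count are illustrative assumptions.
    import gym

    env = gym.make("CarRacing-v0")   # stand-in for the paper's driving maps

    def act(obs):
        # Placeholder policy: random action. In the paper's setup this
        # would be an epsilon-greedy argmax over the DRQN's Q-values.
        return env.action_space.sample()

    for episode in range(10):
        obs = env.reset()            # raw screen pixels: the agent's only input
        done, total_reward = False, 0.0
        while not done:
            action = act(obs)
            obs, reward, done, info = env.step(action)
            total_reward += reward   # scalar reward drives trial-and-error learning
        print(f"episode {episode}: return {total_reward:.1f}")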


Figures

Figure 1
The modified DRQN model. The value function is divided into two categories: the current value function Q and the target value function Q′. The parameters of Q are copied to Q′ every N episodes. The state contains two elements: o_t, obtained from the current environment, and h_(t−1), carried over from past information. The agent performs action a under a specific policy, and the sequence (o_t, a, r, o_t′) is stored in the replay memory unit; here we use a prioritized experience replay memory unit. During training, sequences are sampled at random from the replay memory unit, and the network is trained by gradient descent so that the current value function Q approaches Q′ for a given sequence. The loss function is given in Equation (4).
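As a rough sketch of the update this caption describes, the code below assumes Equation (4) is the standard temporal-difference loss (r + γ max_a′ Q′(o′, a′) − Q(o, a))². The discount factor, learning rate, sync interval N, and the replay interface are assumptions, and recurrent hidden-state handling is omitted for brevity.

    # Sketch of the Q / Q' update loop from Figure 1 (PyTorch).
    # Assumes Equation (4) is the standard TD loss; gamma, lr, N,
    # and the replay interface are assumed.
    import copy
    import torch
    import torch.nn.functional as F

    def make_trainer(q_net, replay, gamma=0.99, lr=1e-4, sync_every_n=10):
        q_target = copy.deepcopy(q_net)          # Q': frozen copy of Q
        optim = torch.optim.Adam(q_net.parameters(), lr=lr)

        def train_episode(episode_idx):
            o, a, r, o_next = replay.sample()    # prioritized sampling (see Figure 1)
            with torch.no_grad():
                target = r + gamma * q_target(o_next).max(dim=1).values
            q_sa = q_net(o).gather(1, a.unsqueeze(1)).squeeze(1)
            loss = F.mse_loss(q_sa, target)      # Equation (4), assumed TD form
            optim.zero_grad()
            loss.backward()                      # gradient descent pulls Q toward Q'
            optim.step()
            if episode_idx % sync_every_n == 0:  # copy parameters Q -> Q' every N episodes
                q_target.load_state_dict(q_net.state_dict())

        return train_episode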
Figure 2
Sequence updates in the recurrent network. Only the scores of the actions taken in states 5, 6, and 7 will be updated. The first four states provide a more accurate hidden state to the LSTM, while the last state provides a target for state 7.
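One way to realize this update pattern is to run the LSTM over the full eight-state sequence but mask the per-step loss so that gradients flow only through states 5-7. The sequence length and burn-in split come from the caption; everything else in this sketch is assumed.

    # Masked sequence update from Figure 2 (PyTorch). States 1-4 only warm
    # up the LSTM hidden state ("burn-in"); states 5-7 receive Q-value
    # updates; state 8 serves only as the bootstrap target for state 7.
    import torch

    seq_len, burn_in = 8, 4
    mask = torch.zeros(seq_len)
    mask[burn_in:seq_len - 1] = 1.0      # update states 5, 6, 7 only

    def masked_td_loss(q_seq, target_seq):
        # q_seq, target_seq: (batch, seq_len) per-step Q-values and TD targets
        per_step = (q_seq - target_seq) ** 2
        return (per_step * mask).sum() / mask.sum()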
Figure 3
An illustration of the architecture of our model. The input image passes through four convolutional layers, and the convolutional output is split into two streams. The first stream (bottom) flattens the output and feeds it to an LSTM; the second (top) flattens the output and feeds it to a fully connected layer. We then obtain an importance stream and a value stream individually and multiply them to produce the output. The resulting transitions are stored in a prioritized experience replay memory unit. As shown in Figure 1, the network is trained with the DRQN loss function.
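A minimal sketch of this two-stream architecture follows. The filter counts, kernel sizes, hidden sizes, 84×84 input, and the assignment of "importance" to the LSTM branch and "value" to the fully connected branch are all assumptions; the caption only states that the two streams are obtained individually and multiplied.

    # Two-stream DRQN sketch from Figure 3 (PyTorch). Filter counts, kernel
    # sizes, hidden sizes, and the 84x84 input are illustrative assumptions.
    import torch
    import torch.nn as nn

    class TwoStreamDRQN(nn.Module):
        def __init__(self, n_actions=4, hidden=256):
            super().__init__()
            self.conv = nn.Sequential(                 # four convolution layers
                nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            )
            flat = 64 * 5 * 5                          # conv output for an 84x84 input
            self.lstm = nn.LSTM(flat, hidden, batch_first=True)  # stream 1 (recurrent)
            self.fc = nn.Linear(flat, hidden)                    # stream 2 (feedforward)
            self.out = nn.Linear(hidden, n_actions)

        def forward(self, frames, state=None):
            # frames: (batch, time, 3, 84, 84) raw screen sequence
            b, t = frames.shape[:2]
            z = self.conv(frames.reshape(b * t, *frames.shape[2:])).reshape(b, t, -1)
            importance, state = self.lstm(z, state)    # importance stream (assumed mapping)
            value = torch.relu(self.fc(z))             # value stream (assumed mapping)
            q = self.out(importance * value)           # streams multiplied elementwise
            return q, state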
Figure 4
Reward trends for two different maps: (A) corresponds to Farm and (B) to Raceway. After training for 1,750 episodes, we obtained the reward trends shown. At about 400 episodes, the stability of the driving agent began to increase; after 1,400 episodes, the reward stabilized at a high level.
Figure 5
Visualizing the first layer gives a direct view (left), but many grids contain unreadable information, such as the grid marked with a green frame. Because the first layer's output is quite abstract, we also visualized the last layer using deconvolution (right); it appears to represent the wall element of the original scene, which means the agent can attend to the wall and then take suitable actions. Panel (A) shows the direct-view visualization of the first CNN layer. Panel (B) shows the visualization of the last CNN layer using deconvolution.
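For panel (A), a first-layer activation grid can be produced with a simple forward hook, as sketched below; `model` is assumed to be a network with a conv stack like the one sketched after Figure 3, and the 4×8 grid assumes 32 first-layer filters. The Zeiler-Fergus-style deconvolution used for panel (B) is more involved and is not reproduced here.

    # Hedged sketch of the first-layer activation grid in panel (A)
    # (PyTorch + matplotlib). `model` and the filter count are assumptions.
    import torch
    import matplotlib.pyplot as plt

    def show_first_layer(model, frame):
        # frame: (1, 3, 84, 84) single raw screen observation
        acts = {}
        handle = model.conv[0].register_forward_hook(
            lambda mod, inp, out: acts.update(maps=out.detach()))
        with torch.no_grad():
            model.conv(frame)             # run the encoder to trigger the hook
        handle.remove()
        maps = acts["maps"][0]            # (channels, H, W) feature maps
        fig, axes = plt.subplots(4, 8, figsize=(12, 6))
        for ax, fmap in zip(axes.flat, maps):
            ax.imshow(fmap.numpy(), cmap="gray")  # one filter's response per cell
            ax.axis("off")
        plt.show()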

