This article is an excerpt from the book Deep Learning Essentials, co-authored by Wei Di, Anurag Bhardwaj, and Jianing Wei. This book will help you get to grips with the essentials of deep learning by leveraging the power of Python.
In today’s tutorial, we will implement reinforcement learning using a TensorFlow-based Q-learning algorithm.
We will look at a popular game, FrozenLake, which has a built-in environment in the OpenAI gym package. The idea behind the FrozenLake game is quite simple. It consists of a 4 x 4 grid of blocks, where each block is one of the following four types: S (the starting point), F (a frozen surface, safe to step on), H (a hole, which ends the episode), and G (the goal).
In each of the 16 cells, you can take one of four actions (up, down, left, or right) to move to a neighboring state. The goal of the game is to start from state S and end at state G. We will show how to use a neural network-based Q-learning system to learn a safe path from state S to state G. First, we import the necessary packages and define the game environment:
import gym
import numpy as np
import random
import tensorflow as tf
env = gym.make('FrozenLake-v0')
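To make the state numbering concrete, here is a minimal sketch that does not need gym at all. It assumes the default 4 x 4 map that ships with the FrozenLake environment and row-major numbering of states (state = row * 4 + column); the helper names state_to_cell and tile_at are hypothetical and only used for illustration.

```python
# Assumed default 4x4 FrozenLake layout:
# S = start, F = frozen surface, H = hole, G = goal.
LAKE_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_COLS = 4


def state_to_cell(state):
    """Convert a state index (0-15) to its (row, col) position on the grid."""
    return divmod(state, N_COLS)


def tile_at(state):
    """Return the tile letter at the given state index."""
    row, col = state_to_cell(state)
    return LAKE_MAP[row][col]


holes = [s for s in range(16) if tile_at(s) == "H"]

print(tile_at(0))   # S
print(tile_at(15))  # G
print(holes)        # [5, 7, 11, 12]
```

Under these assumptions the agent starts in state 0 and the goal is state 15, so a safe path has to avoid states 5, 7, 11, and 12.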
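Once the environment is defined, we can define the network structure that learns the Q-values. We will use a one-layer neural network with 16 input neurons (a one-hot encoding of the current state) and 4 output neurons (one Q-value per action). The code uses the TensorFlow 1.x graph API (placeholders and sessions):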
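# One-hot encoding of the current state (1 x 16)
input_matrix = tf.placeholder(shape=[1,16],dtype=tf.float32)
# Weights mapping each of the 16 states to 4 action values
weight_matrix = tf.Variable(tf.random_uniform([16,4],0,0.01))
# Estimated Q-values for the four actions in the current state
Q_matrix = tf.matmul(input_matrix,weight_matrix)
# Index of the action with the highest Q-value
prediction_matrix = tf.argmax(Q_matrix,1)
# Target Q-values, fed in during training
nextQ = tf.placeholder(shape=[1,4],dtype=tf.float32)
# Sum of squared differences between target and predicted Q-values
loss = tf.reduce_sum(tf.square(nextQ - Q_matrix))
train = tf.train.GradientDescentOptimizer(learning_rate=0.05)
model = train.minimize(loss)
init_op = tf.global_variables_initializer()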
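Because the input is a one-hot vector, the matrix product in this network simply picks out one row of the weight matrix, so the weights behave like a 16 x 4 table of Q-values. The following NumPy sketch illustrates that under stated assumptions: the names weights, one_hot, and the example state 6 are hypothetical, and the random values only mimic the initialization range used above.

```python
import numpy as np

num_states, num_actions = 16, 4
rng = np.random.default_rng(0)
# Same shape and range as the weight matrix above; the values are illustrative.
weights = rng.uniform(0.0, 0.01, size=(num_states, num_actions))


def one_hot(state, size=num_states):
    """Return a 1 x size row vector with a 1 at the given state index."""
    vec = np.zeros((1, size))
    vec[0, state] = 1.0
    return vec


state = 6
# Shape (1, 4): one Q-value per action, analogous to Q_matrix.
q_values = one_hot(state) @ weights
# Index of the largest Q-value, analogous to prediction_matrix.
greedy_action = int(np.argmax(q_values, axis=1)[0])

# The product selects row `state` of the weight matrix.
print(np.allclose(q_values[0], weights[state]))  # True
```

Seen this way, training the network amounts to adjusting one table row at a time, namely the row of the state the agent is currently in.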
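Now we can choose the action epsilon-greedily, step the environment, and update the network toward the Q-learning target. The following snippet is the body of one training step. It assumes an open session sess in which init_op has been run, num_states set to 16, current_state holding the agent's current state, sample_epsilon holding the exploration probability, and y holding the discount factor: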
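# One-hot encode the current state
ip_q = np.zeros(num_states)
ip_q[current_state] = 1
# Pick the greedy action and fetch the Q-values for the current state
a,allQ = sess.run([prediction_matrix,Q_matrix],feed_dict={input_matrix:[ip_q]})
# With probability sample_epsilon, explore with a random action instead
if np.random.rand(1) < sample_epsilon:
    a[0] = env.action_space.sample()
# Take the action and observe the next state and reward
next_state, reward, done, info = env.step(a[0])
# One-hot encode the next state and compute its Q-values
ip_q1 = np.zeros(num_states)
ip_q1[next_state] = 1
Q1 = sess.run(Q_matrix,feed_dict={input_matrix:[ip_q1]})
maxQ1 = np.max(Q1)
# Build the target: only the chosen action's Q-value changes
targetQ = allQ
targetQ[0,a[0]] = reward + y*maxQ1
# Run one gradient descent step toward the target
_,W1 = sess.run([model,weight_matrix],feed_dict={input_matrix:[ip_q],nextQ:targetQ})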
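The arithmetic of a single update can be worked through in plain Python. This is a sketch rather than the book's code: the function name q_learning_step is hypothetical, and the discount factor of 0.99 is an assumed value, since the excerpt does not state what y is set to. The learning rate of 0.05 matches the optimizer above. Because the target equals the current prediction for every action except the chosen one, only that action's Q-value receives a non-zero gradient.

```python
def q_learning_step(q_row, action, reward, max_next_q,
                    discount=0.99, learning_rate=0.05):
    """Apply one gradient-descent step to the Q-values of the current state.

    q_row holds the four Q-values of the current state. The discount of 0.99
    is an assumed value; the learning rate matches the optimizer in the text.
    """
    # Q-learning target for the chosen action.
    target = reward + discount * max_next_q
    error = target - q_row[action]
    updated = list(q_row)
    # The derivative of (target - q)^2 with respect to q is -2 * (target - q),
    # so gradient descent adds 2 * learning_rate * error to the Q-value.
    updated[action] = q_row[action] + 2.0 * learning_rate * error
    return target, updated


# Example: the agent takes action 2, receives reward 1, and the next state
# has no value yet (max Q of 0).
target, new_row = q_learning_step([0.0, 0.0, 0.0, 0.0], action=2,
                                  reward=1.0, max_next_q=0.0)
print(target)      # 1.0
print(new_row[2])  # 0.1
```

With the squared-error loss, the effective step size is twice the learning rate, so each update moves the chosen Q-value 10% of the way toward its target.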
The figure RL with Q-learning example shows sample output of the program. You can see the values of the Q matrix change as the agent moves from one state to another. You can also see a reward of 1 when the agent reaches state 15, the goal state G.
To summarize, we saw how reinforcement learning can be practically implemented using TensorFlow.
If you found this post useful, do check out the book Deep Learning Essentials which will help you fine-tune and optimize your deep learning models for better performance.