
This article is an excerpt from the book Deep Learning Essentials, co-authored by Wei Di, Anurag Bhardwaj, and Jianing Wei. This book will help you get to grips with the essentials of deep learning by leveraging the power of Python.

In today’s tutorial, we will implement reinforcement learning with a TensorFlow-based Q-learning algorithm.

We will look at a popular game, FrozenLake, which has a built-in environment in the OpenAI Gym package. The idea behind the FrozenLake game is quite simple. It consists of a 4 x 4 grid of blocks, where each block can be in one of the following four states:

  • S: Starting point/Safe state
  • F: Frozen surface/Safe state
  • H: Hole/Unsafe state
  • G: Goal/Safe or Terminal state

In each of the 16 cells, you can take one of four actions, namely up/down/left/right, to move to a neighboring state. The goal of the game is to start from state S and reach state G without falling into a hole H. We will show how we can use a neural network-based Q-learning system to learn a safe path from state S to state G. First, we import the necessary packages and define the game environment:

```python
import gym
import numpy as np
import random
import tensorflow as tf

# FrozenLake: a 4 x 4 grid with 16 discrete states and 4 discrete actions
env = gym.make('FrozenLake-v0')
```
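To make the layout concrete, here is a minimal, dependency-free sketch of the grid. It assumes Gym's default 4 x 4 map and its action encoding for FrozenLake (0 = left, 1 = down, 2 = right, 3 = up). States are numbered row by row, so state = row * 4 + col and the goal G is state 15. For simplicity it models a non-slippery grid: in Gym's default FrozenLake the ice is slippery, so the agent does not always move in the direction it chose. `GRID`, `cell`, and `move` are hypothetical helpers for illustration, not part of the Gym API.

```python
# Hypothetical stand-alone model of FrozenLake's default 4 x 4 map (not the Gym API).
# States are numbered row by row: state = row * 4 + col.
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
SIZE = 4
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3  # action encoding used by Gym's FrozenLake
DELTAS = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


def cell(state):
    """Return the tile letter (S, F, H or G) at a state index."""
    row, col = divmod(state, SIZE)
    return GRID[row][col]


def move(state, action):
    """Deterministic (non-slippery) move; bumping into an edge leaves the agent in place.

    Returns (next_state, reward, done): the reward is 1 only on reaching G,
    and the episode ends on G or on a hole H.
    """
    row, col = divmod(state, SIZE)
    d_row, d_col = DELTAS[action]
    row = min(max(row + d_row, 0), SIZE - 1)
    col = min(max(col + d_col, 0), SIZE - 1)
    next_state = row * SIZE + col
    tile = GRID[row][col]
    return next_state, (1.0 if tile == "G" else 0.0), tile in "HG"


if __name__ == "__main__":
    # A safe path from S (state 0) to G (state 15) that avoids every hole
    state = 0
    for action in [DOWN, DOWN, RIGHT, RIGHT, DOWN, RIGHT]:
        state, reward, done = move(state, action)
        print(state, cell(state), reward, done)
```

Running it visits states 4, 8, 9, 10 and 14 with reward 0.0, and finally state 15 (G) with reward 1.0 and done set to True.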

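Once the environment is defined, we can define the network structure that learns the Q-values. We will use a single-layer neural network with 16 input neurons, one per state (the current state is one-hot encoded), fully connected to 4 output neurons, one Q-value per action. There is no hidden layer, so the weights form a 16 x 4 matrix: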

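```python
# Input: the current state, one-hot encoded as a 1 x 16 row vector
input_matrix = tf.placeholder(shape=[1,16],dtype=tf.float32)
# 16 x 4 weights, initialized to small random values in [0, 0.01)
weight_matrix = tf.Variable(tf.random_uniform([16,4],0,0.01))
# Predicted Q-values of the four actions (1 x 4)
Q_matrix = tf.matmul(input_matrix,weight_matrix)
# Greedy action: index of the largest predicted Q-value
prediction_matrix = tf.argmax(Q_matrix,1)

# Target Q-values, fed in at every training step
nextQ = tf.placeholder(shape=[1,4],dtype=tf.float32)
# Sum of squared differences between target and predicted Q-values
loss = tf.reduce_sum(tf.square(nextQ - Q_matrix))
train = tf.train.GradientDescentOptimizer(learning_rate=0.05)
model = train.minimize(loss)

init_op = tf.global_variables_initializer()
```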
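Because the input is one-hot, this network behaves like a Q-table, and one gradient-descent step on the squared loss reduces to the familiar Q-learning update. A short derivation, writing e_s for the one-hot input row of state s, W for `weight_matrix`, eta = 0.05 for the learning rate, a_t for the action taken, r for the reward, gamma for the discount factor (`y` in the training code), and treating the target fed to `nextQ` as a constant, since it arrives through a placeholder:

```latex
% e_s: one-hot input row for state s (input_matrix); W: weight_matrix (16 x 4)
% \hat{Q}: target fed to nextQ, treated as a constant during differentiation
% \eta = 0.05: learning rate; \gamma: discount factor (y in the code)
\begin{align}
  Q(s,\cdot) &= e_s W = W_{s,\cdot} \\
  \hat{Q}_a &=
  \begin{cases}
    r + \gamma \max_{a'} Q(s',a') & \text{if } a = a_t \\
    W_{s,a} & \text{otherwise}
  \end{cases} \\
  L &= \sum_{a} \bigl(\hat{Q}_a - W_{s,a}\bigr)^2 \\
  \frac{\partial L}{\partial W_{i,a}} &= -2\,[i = s]\,\bigl(\hat{Q}_a - W_{s,a}\bigr) \\
  W_{s,a_t} &\leftarrow W_{s,a_t} + 2\eta\,\bigl(r + \gamma \max_{a'} Q(s',a') - W_{s,a_t}\bigr)
\end{align}
```

The gradient is zero outside row s, and within that row it is zero for every action except a_t, so each training step changes a single weight, exactly as tabular Q-learning would with learning rate 2 x 0.05 = 0.1.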

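Now we can train the network. At each step, the agent picks the action with the highest predicted Q-value (the greedy choice), except that with probability sample_epsilon it takes a random action instead, so that it keeps exploring. After the move, the Q-value of the chosen action is pushed towards the target reward + y * maxQ1, where maxQ1 is the largest predicted Q-value of the next state:

```python
# Illustrative settings (assumed values; this excerpt does not specify them)
y = 0.99                              # discount factor
sample_epsilon = 0.1                  # probability of taking a random action
num_episodes = 2000
num_states = env.observation_space.n  # 16 states on the 4 x 4 map

with tf.Session() as sess:
    sess.run(init_op)
    for episode in range(num_episodes):
        current_state = env.reset()
        for step in range(99):  # cap on episode length (assumed)
            # One-hot encode the current state; get the greedy action and all Q-values
            ip_q = np.zeros(num_states)
            ip_q[current_state] = 1
            a, allQ = sess.run([prediction_matrix, Q_matrix], feed_dict={input_matrix: [ip_q]})
            # Explore: with probability sample_epsilon, take a random action instead
            if np.random.rand(1) < sample_epsilon:
                a[0] = env.action_space.sample()
            next_state, reward, done, info = env.step(a[0])
            # Predicted Q-values of the next state, used to build the target
            ip_q1 = np.zeros(num_states)
            ip_q1[next_state] = 1
            Q1 = sess.run(Q_matrix, feed_dict={input_matrix: [ip_q1]})
            maxQ1 = np.max(Q1)
            # Target: keep the other actions' Q-values, change only the action taken
            targetQ = allQ
            targetQ[0, a[0]] = reward + y * maxQ1
            # One gradient-descent step towards the target
            _, W1 = sess.run([model, weight_matrix], feed_dict={input_matrix: [ip_q], nextQ: targetQ})
            current_state = next_state
            if done:
                break
```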
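Since the one-hot network behaves like a lookup table, the same training loop can be sketched without TensorFlow as plain tabular Q-learning. This is a swapped-in illustration, not the book's code: it assumes Gym's default 4 x 4 map, deterministic (non-slippery) moves, and illustrative hyperparameters (alpha, gamma, epsilon, episode count) chosen only for this sketch. The update line mirrors the network's training step: only the Q-value of the visited state and the chosen action moves towards reward + gamma * max Q(next state). `move`, `greedy`, `train`, and `greedy_path` are hypothetical helpers.

```python
import random

# Hypothetical deterministic model of the default 4 x 4 FrozenLake map
GRID = "SFFF" "FHFH" "FFFH" "HFFG"   # 16 tiles, row by row
SIZE, N_STATES, N_ACTIONS = 4, 16, 4
DELTAS = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # left, down, right, up


def move(state, action):
    """Non-slippery move; returns (next_state, reward, done)."""
    row, col = divmod(state, SIZE)
    row = min(max(row + DELTAS[action][0], 0), SIZE - 1)
    col = min(max(col + DELTAS[action][1], 0), SIZE - 1)
    nxt = row * SIZE + col
    return nxt, (1.0 if GRID[nxt] == "G" else 0.0), GRID[nxt] in "HG"


def greedy(q_row):
    """Index of the largest Q-value, breaking ties at random."""
    best = max(q_row)
    return random.choice([a for a, v in enumerate(q_row) if v == best])


def train(episodes=2000, alpha=0.1, gamma=0.99, epsilon=0.1, max_steps=99, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy (illustrative hyperparameters)."""
    random.seed(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy: usually exploit, occasionally explore
            if random.random() < epsilon:
                action = random.randrange(N_ACTIONS)
            else:
                action = greedy(q[state])
            nxt, reward, done = move(state, action)
            # Move Q(s, a) towards the target reward + gamma * max_a' Q(s', a')
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the learned greedy policy from S and return the visited states."""
    state, path = 0, [0]
    for _ in range(max_steps):
        row = q[state]
        state, _, done = move(state, row.index(max(row)))
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    q_table = train()
    print(greedy_path(q_table))
```

After training, the printed path should start at state 0 and end at state 15 without touching a hole, for example [0, 4, 8, 9, 10, 14, 15]; several shortest paths exist, so the exact route may differ.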


The figure RL with Q-learning example shows sample output of the program when executed. You can see the values of the Q matrix change as the agent moves from one state to another. Notice also the reward of 1 when the agent reaches state 15, the goal G:

[Figure: RL with Q-learning example]
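After training, each row of the learned weight matrix (`W1` in the training loop) holds the four Q-values of one state, so the greedy policy can be read off row by row. A minimal sketch, assuming `W1` has been converted to a 16 x 4 nested list (for example with `W1.tolist()`) and using Gym's action order (left, down, right, up). `policy_grid` is a hypothetical helper, and the matrix in the example is made up for illustration, not output from the program.

```python
ARROWS = "<v>^"  # Gym FrozenLake action order: 0 = left, 1 = down, 2 = right, 3 = up


def policy_grid(weights, layout="SFFFFHFHFFFHHFFG", size=4):
    """Render the greedy action of each state; holes and the goal keep their letter.

    weights: 16 rows (one per state) of 4 Q-values, e.g. W1.tolist().
    """
    lines = []
    for row in range(size):
        chars = []
        for col in range(size):
            state = row * size + col
            if layout[state] in "HG":
                chars.append(layout[state])
            else:
                q_values = weights[state]
                chars.append(ARROWS[q_values.index(max(q_values))])
        lines.append("".join(chars))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up Q-values for illustration: every state prefers action 2 (right)
    example = [[0.0, 0.0, 1.0, 0.0] for _ in range(16)]
    print(policy_grid(example))
```

With the made-up matrix above, the printout is the four rows `>>>>`, `>H>H`, `>>>H`, and `H>>G`.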

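To summarize, we saw how reinforcement learning can be implemented in practice with TensorFlow: a single-layer network learns the Q-values of the FrozenLake states and, through them, a safe path from S to G.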

If you found this post useful, do check out the book Deep Learning Essentials, which will help you fine-tune and optimize your deep learning models for better performance.
