Extending OpenAI Gym environments with Wrappers and Monitors [Tutorial]

In this article we are going to discuss two OpenAI Gym functionalities: Wrappers and Monitors. These functionalities are present in OpenAI Gym to make your life easier and your code cleaner. They provide convenient frameworks to extend the functionality of your existing environment in a modular way and to get familiar with an agent's activity. So, let's take a quick overview of these classes.

This article is an extract taken from the book Deep Reinforcement Learning Hands-On, Second Edition, written by Maxim Lapan.

What are Wrappers?

Very frequently, you will want to extend the environment's functionality in some generic way. For example, an environment gives you some observations, but you want to accumulate them in a buffer and provide the agent with the last N observations. This is a common scenario for dynamic computer games, where a single frame is just not enough to get full information about the game state.

Another example is when you want to crop or preprocess an image's pixels to make it more convenient for the agent to digest, or when you want to normalize reward scores somehow. There are many such situations with the same structure: you'd like to “wrap” the existing environment and add some extra logic to it. Gym provides a convenient framework for these situations, called the Wrapper class.
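The “last N observations” case from the first example is, at its core, a fixed-size buffer. Here is a minimal sketch in plain Python, deliberately not tied to Gym's API; the class name LastNObservations and the choice to pad the buffer with the first observation after a reset are assumptions made for illustration.

```python
from collections import deque


class LastNObservations:
    """Keeps the N most recent observations (hypothetical helper, not part of Gym)."""

    def __init__(self, n):
        self.n = n
        # A deque with maxlen drops the oldest entry automatically on append
        self.frames = deque(maxlen=n)

    def reset(self, first_obs):
        # Pad with the first observation so the agent always receives N entries
        self.frames.clear()
        for _ in range(self.n):
            self.frames.append(first_obs)
        return tuple(self.frames)

    def push(self, obs):
        self.frames.append(obs)
        return tuple(self.frames)


buf = LastNObservations(3)
print(buf.reset(0))  # (0, 0, 0)
print(buf.push(1))   # (0, 0, 1)
print(buf.push(2))   # (0, 1, 2)
print(buf.push(3))   # (1, 2, 3)
```

In a Gym setup, this logic would typically live inside a wrapper's reset() and step() methods, so the agent never has to know about the buffer.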

How does a wrapper work?

The class structure is shown in the following diagram.

The Wrapper class inherits the Env class. Its constructor accepts a single argument: the instance of the Env class to be “wrapped”. To add extra functionality, you need to redefine the methods you want to extend, like step() or reset(). The only requirement is to call the original method of the superclass.


Figure 1: The hierarchy of Wrapper classes in Gym.
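To make the delegation idea concrete without depending on a particular Gym version, here is a sketch built on hypothetical stand-ins: Env and Wrapper below only mimic the structure described above (the wrapper stores the wrapped env, and overridden methods call the superclass versions); they are not Gym's classes. The toy CountdownEnv and the StepCounter wrapper are likewise invented for this example, and step() follows the classic four-value (obs, reward, done, info) convention used in this article.

```python
class Env:
    """Minimal stand-in for gym.Env (hypothetical, for illustration only)."""

    def reset(self):
        raise NotImplementedError

    def step(self, action):
        raise NotImplementedError


class Wrapper(Env):
    """Stand-in for gym.Wrapper: holds the wrapped env and forwards calls to it."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)


class CountdownEnv(Env):
    """Toy environment: reward 1.0 per step, episode ends after `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}


class StepCounter(Wrapper):
    """Adds extra logic around reset()/step() and delegates to the originals."""

    def reset(self):
        self.steps = 0
        return super().reset()

    def step(self, action):
        self.steps += 1
        obs, reward, done, info = super().step(action)
        info["steps"] = self.steps  # extra information added by the wrapper
        return obs, reward, done, info


env = StepCounter(CountdownEnv(length=3))
env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(0)
print(info)  # {'steps': 3}
```

Because StepCounter is itself an Env, the rest of the code cannot tell it apart from the environment it wraps.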

To handle more specific requirements, like a Wrapper which wants to process only observations from the environment, or only actions, there are subclasses of Wrapper which allow filtering of only a specific portion of information.

They are:

ObservationWrapper: You need to redefine its observation(obs) method. The argument obs is an observation from the wrapped environment, and this method should return the observation that will be given to the agent.

RewardWrapper: Exposes the method reward(rew), which can modify the reward value given to the agent.

ActionWrapper: You need to override the method action(act), which can tweak the action passed by the agent to the wrapped environment.
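The three subclasses differ only in which part of the data flow they intercept. The sketch below uses simplified, hypothetical stand-ins written from scratch to show where each hook sits in step() and reset(); they roughly mirror how classic Gym routes data through these classes, but they are not the library's code. EchoEnv and the three example wrappers (Negate, ClipReward, Square) are invented for illustration.

```python
class _BaseWrapper:
    """Hypothetical stand-in for gym.Wrapper: stores the env and forwards calls."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)


class ObservationWrapper(_BaseWrapper):
    # Every observation, from reset() and from step(), goes through observation()
    def reset(self):
        return self.observation(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self.observation(obs), reward, done, info


class RewardWrapper(_BaseWrapper):
    # Only the reward returned by step() goes through reward()
    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, self.reward(reward), done, info


class ActionWrapper(_BaseWrapper):
    # The agent's action goes through action() before reaching the wrapped env
    def step(self, action):
        return self.env.step(self.action(action))


class EchoEnv:
    """Toy env: the observation and the reward both equal the action received."""

    def reset(self):
        return 0

    def step(self, action):
        return action, float(action), False, {}


class Negate(ObservationWrapper):
    def observation(self, obs):
        return -obs


class ClipReward(RewardWrapper):
    def reward(self, rew):
        return max(-1.0, min(1.0, rew))


class Square(ActionWrapper):
    def action(self, act):
        return act * act


print(Negate(EchoEnv()).step(5))      # (-5, 5.0, False, {})
print(ClipReward(EchoEnv()).step(5))  # (5, 1.0, False, {})
print(Square(EchoEnv()).step(5))      # (25, 25.0, False, {})
```

Each subclass only has to implement its single hook; the plumbing in step() and reset() is inherited.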

Now let’s implement some wrappers

To make it slightly more practical, let's imagine a situation where we want to intervene in the stream of actions sent by the agent and, with a probability of 10%, replace the current action with a random one. By issuing random actions, we make our agent explore the environment and, from time to time, drift away from the beaten track of its policy. This is an easy thing to do using the ActionWrapper class.

import gym
from typing import TypeVar
import random

Action = TypeVar('Action')
class RandomActionWrapper(gym.ActionWrapper):
    def __init__(self, env, epsilon=0.1):
        super(RandomActionWrapper, self).__init__(env)
        self.epsilon = epsilon

Here we initialize our wrapper by calling the parent's __init__ method and saving epsilon (the probability of a random action).

    def action(self, action: Action) -> Action:
        # With probability epsilon, replace the agent's action with a random one
        if random.random() < self.epsilon:
            print("Random!")
            return self.env.action_space.sample()
        return action

This is the method that we need to override from the parent class to tweak the agent's actions. Every time we roll the die, with a probability of epsilon, we sample a random action from the action space and return it instead of the action the agent has sent to us. Note that by using the action_space and wrapper abstractions, we were able to write abstract code that will work with any environment from Gym. Additionally, we print a message every time we replace the action, just to check that our wrapper is working. In production code, of course, this won't be necessary.

if __name__ == "__main__":
    env = RandomActionWrapper(gym.make("CartPole-v0"))

Now it's time to apply our wrapper. We create a normal CartPole environment and pass it to our wrapper constructor. From here on, we use our wrapper as a normal Env instance instead of the original CartPole. As the Wrapper class inherits the Env class and exposes the same interface, we can nest our wrappers in any combination we want. This is a powerful, elegant, and generic solution:

    obs = env.reset()
    total_reward = 0.0
    while True:
        # Always send action 0; the wrapper may replace it with a random one
        obs, reward, done, _ = env.step(0)
        total_reward += reward
        if done:
            break

    print("Reward got: %.2f" % total_reward)

This is almost the same code as a plain random CartPole agent, except that every time we issue the same action: 0. Our agent is dull and always does the same thing. By running the code, you should see that the wrapper is indeed working:

rl_book_samples/ch02$ python 03_random_actionwrapper.py
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
Random!
Random!
Random!
Random!
Reward got: 12.00
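The claim that wrappers can be nested in any combination follows from every wrapper exposing the same interface as the environment it wraps. A minimal sketch with hypothetical stand-in classes (not Gym's) also shows that nesting order matters: the outermost wrapper is the first to see the agent's action. EchoEnv, DoubleAction, and AddOneAction are invented for this example.

```python
class Wrapper:
    """Hypothetical stand-in for gym.Wrapper: forwards calls to the wrapped env."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)


class EchoEnv:
    """Toy environment: the observation and the reward both equal the action."""

    def reset(self):
        return 0

    def step(self, action):
        return action, float(action), False, {}


class DoubleAction(Wrapper):
    def step(self, action):
        return super().step(action * 2)


class AddOneAction(Wrapper):
    def step(self, action):
        return super().step(action + 1)


# The outermost wrapper transforms the agent's action first
env_a = AddOneAction(DoubleAction(EchoEnv()))  # (3 + 1) * 2
env_b = DoubleAction(AddOneAction(EchoEnv()))  # 3 * 2 + 1
print(env_a.step(3)[0], env_b.step(3)[0])  # 8 7
```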

If you want, you can play with the epsilon parameter when creating the wrapper and check that randomness improves the agent's score on average. Let's move on and look at another interesting gem hidden inside Gym: Monitor.

What is a Monitor?

Another class you should be aware of is Monitor. It is implemented as a Wrapper and can write information about your agent's performance to a file, with optional video recording of your agent in action. Some time ago, it was possible to upload the results of the Monitor class' recording to the https://gym.openai.com website and see your agent's position in comparison to other people's results (see the following screenshot), but, unfortunately, at the end of August 2017, OpenAI decided to shut down this upload functionality and froze all the results. There are several efforts to implement an alternative to the original website, but they are not ready yet. I hope this situation will be resolved soon, but at the time of writing it's not possible to check your result against those of others.

Just to give you an idea of how the Gym web interface looked, here is the CartPole environment leaderboard:


Figure 2: OpenAI Gym web interface with CartPole submissions

Every submission in the web interface had details about training dynamics. For example, below is the author's solution for one of Doom's mini-games:


Figure 3: Submission dynamics on the DoomDefendLine environment.

Despite this, Monitor is still useful, as you can take a look at your agent's life inside the environment.
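To see the bookkeeping half of what Monitor does, consider a much smaller wrapper that appends each finished episode's length and total reward to a JSON Lines file. This is a hypothetical sketch, not Gym's Monitor: it records no video, and the EpisodeLogger name, the file format, and the toy CountdownEnv are assumptions made for illustration.

```python
import json
import os
import tempfile


class EpisodeLogger:
    """Hypothetical Monitor-like wrapper: logs per-episode statistics, no video."""

    def __init__(self, env, path):
        self.env = env
        self.path = path

    def reset(self):
        self.steps, self.total_reward = 0, 0.0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.steps += 1
        self.total_reward += reward
        if done:
            # Append one JSON object per finished episode
            with open(self.path, "a") as f:
                record = {"steps": self.steps, "total_reward": self.total_reward}
                f.write(json.dumps(record) + "\n")
        return obs, reward, done, info


class CountdownEnv:
    """Toy environment: reward 1.0 per step, episode ends after `length` steps."""

    def __init__(self, length):
        self.length = length

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.length, {}


log_path = os.path.join(tempfile.mkdtemp(), "episodes.jsonl")
env = EpisodeLogger(CountdownEnv(length=4), log_path)
for _ in range(2):
    env.reset()
    done = False
    while not done:
        _, _, done, _ = env.step(0)

with open(log_path) as f:
    print(f.read())  # one line per episode
```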

How to add Monitor to your agent

Here is how we add Monitor to our random CartPole agent; this is the only difference from the original agent (the whole code is in Chapter02/04_cartpole_random_monitor.py).

import gym

if __name__ == "__main__":
    env = gym.make("CartPole-v0")
    # Write episode statistics and videos into the "recording" directory
    env = gym.wrappers.Monitor(env, "recording")

The second argument we're passing to Monitor is the name of the directory it will write the results to. This directory shouldn't exist, otherwise your program will fail with an exception (to overcome this, you could either remove the existing directory or pass the force=True argument to the Monitor class constructor).
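Instead of deleting old results or passing force=True, another option is to give every run its own directory. A small sketch; the unique_run_dir() helper and its timestamp-based naming scheme are hypothetical, not part of Gym.

```python
import os
import tempfile
import time


def unique_run_dir(base="recording"):
    """Return a not-yet-existing directory name like recording-20240101-120000."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = f"{base}-{stamp}"
    suffix = 1
    # Runs started within the same second get a numeric suffix
    while os.path.exists(path):
        path = f"{base}-{stamp}-{suffix}"
        suffix += 1
    return path


# Usage sketch inside a throwaway temporary directory
base = os.path.join(tempfile.mkdtemp(), "recording")
first = unique_run_dir(base)
os.makedirs(first)  # pretend a previous run wrote its results here
second = unique_run_dir(base)
print(first != second, os.path.exists(second))  # True False
```

The returned name can then be passed as the second argument, for example gym.wrappers.Monitor(env, unique_run_dir()).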

The Monitor class requires the FFmpeg utility to be present on the system; it is used to convert captured observations into an output video file. If FFmpeg is not available, Monitor will raise an exception. The easiest way to install FFmpeg is by using your system's package manager, which is OS distribution-specific.
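Rather than waiting for Monitor to raise an exception, you can check for FFmpeg up front. A small sketch using the standard library's shutil.which(); it assumes the executable is named ffmpeg and is on the PATH, which is typical for package-manager installs but not guaranteed. The helper name ffmpeg_available() is hypothetical.

```python
import shutil


def ffmpeg_available():
    """Return True if an ffmpeg executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if not ffmpeg_available():
        print("FFmpeg not found; install it with your system's package manager "
              "before recording videos with Monitor.")
```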

To start this example, one of three extra prerequisites should be met:

The code should be run in an X11 session with the OpenGL extension (GLX)

The code should be started in an Xvfb virtual display

The code should be run over an ssh connection with X11 forwarding

The reason for this is video recording, which is done by taking screenshots of the window drawn by the environment. Some environments use OpenGL to draw their picture, so a graphical mode with OpenGL needs to be present. This could be a problem for a virtual machine in the cloud, which physically doesn't have a monitor and graphical interface running. To overcome this, there is a special “virtual” graphical display, called Xvfb (X11 virtual framebuffer), which basically starts a virtual graphical display on the server and forces the program to draw inside it. That is enough to make Monitor happily create the desired videos.

To start your program in the Xvfb environment, you need to have it installed on your machine (it usually requires installing the package xvfb) and run the special script xvfb-run:

$ xvfb-run -s "-screen 0 640x480x24" python 04_cartpole_random_monitor.py
[2017-09-22 12:22:23,446] Making new env: CartPole-v0
[2017-09-22 12:22:23,451] Creating monitor directory recording
[2017-09-22 12:22:23,570] Starting new video recorder writing to recording/openaigym.video.0.31179.video000000.mp4
Episode done in 14 steps, total reward 14.00
[2017-09-22 12:22:26,290] Finished writing results. You can upload them to the scoreboard via gym.upload('recording')

As you can see from the log above, the video has been written successfully, so you can peek inside one of your agent's episodes by playing it.

Another way to record your agent's actions is to use ssh X11 forwarding, which relies on ssh's ability to tunnel X11 communications between the X11 client (the Python code that wants to display some graphical information) and the X11 server (software that knows how to display this information and has access to your physical display). In the X11 architecture, the client and the server are separated and can work on different machines. To use this approach, you need the following:

An X11 server running on your local machine. Linux comes with an X11 server as a standard component (most Linux desktop environments run one). On a Windows machine, you can set up a third-party X11 implementation like the open source VcXsrv (available at https://sourceforge.net/projects/vcxsrv/).

The ability to log into your remote machine via ssh, passing the -X command-line option: ssh -X servername. This enables X11 tunneling and allows all processes started in this session to use your local display for graphics output.

Then you can start a program which uses Monitor class and it will display the agent's actions, capturing the images into a video file.
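All three prerequisites listed earlier work by giving X11 clients a display to draw on, and X11 clients locate that display through the DISPLAY environment variable (xvfb-run and ssh -X both set it for the programs they start). A quick sanity check before creating Monitor is therefore to see whether DISPLAY is set at all. The helper name has_x_display() is hypothetical, and a set variable does not prove that OpenGL (GLX) is available.

```python
import os


def has_x_display(environ=None):
    """Return True if DISPLAY is set and non-empty (hypothetical helper).

    A set DISPLAY only means X11 clients know where to connect; it does not
    guarantee that the display supports OpenGL (GLX).
    """
    env = os.environ if environ is None else environ
    return bool(env.get("DISPLAY"))


print(has_x_display({"DISPLAY": ":99"}))  # True
print(has_x_display({}))                  # False
print(has_x_display())                    # depends on the current session
```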

To summarize, we discussed two extra functionalities of OpenAI Gym: Wrappers and Monitors. To solve complex real-world problems in Deep Learning, grab this practical guide, Deep Reinforcement Learning Hands-On, Second Edition, today.
