
Introduction to TF-Agents: A library for Reinforcement Learning in TensorFlow

source link: https://www.tuicool.com/articles/niQrEjz


Oct 4 · 6 min read

Train your own AI bot with a flexible and powerful reinforcement learning library for TensorFlow

[Animation: an untrained agent playing the CartPole game]

Today's topic is TensorFlow's reinforcement learning library, TF-Agents. This library is fairly new, open-sourced to the world only about a year ago. As a result, it seriously lacks proper documentation and tutorials compared to the rest of the popular reinforcement learning libraries. In this tutorial, we are going to learn the proper way to set up and run the tutorials provided by the official documentation. The content is categorized as follows:

  1. Installation
  2. Examples
  3. Conclusion

Without further ado, let’s get started!

1. Installation

I am using Ubuntu 18.04.2 LTS for this project, but the following steps should work on any other operating system. Modify the commands accordingly based on what you are using.

Virtual Environment

First and foremost, we need a new virtual environment for this project. It is good practice to keep each project in its own virtual environment. Open up a terminal, go to the directory of your choice, and run the following code:

python3 -m venv tfagent

A new folder called tfagent will be created in the same directory. Activate it by running the following:

source tfagent/bin/activate

Your shell prompt should now be prefixed with (tfagent), indicating that the virtual environment is active.

Pip Upgrade (optional)

It is always a good idea to update the pip module to the latest version. Run the following command if you are unsure whether you have it:

python3 -m pip install --upgrade pip

Jupyter Notebook

Next, we will install Jupyter Notebook, a web-based interactive development environment, for a smoother experience. It is strongly recommended for anyone working on data science tasks. Run the following command:

python3 -m pip install jupyter

TF-nightly

If you have been following the tutorial on the official site, you will notice that it contains the following code:

...
import tensorflow as tf

from tf_agents.agents.reinforce import reinforce_agent
from tf_agents.drivers import dynamic_step_driver
from tf_agents.environments import suite_gym
from tf_agents.environments import tf_py_environment
from tf_agents.eval import metric_utils
from tf_agents.metrics import tf_metrics
from tf_agents.networks import actor_distribution_network
from tf_agents.replay_buffers import tf_uniform_replay_buffer
from tf_agents.trajectories import trajectory
from tf_agents.utils import common
...

We are not going to install the stable tensorflow package but the CPU-only preview build called tf-nightly. This build is fairly unstable, but it is required by tf-agents. The official site recommends installing the latest version via the following command:

python3 -m pip install --upgrade tf-nightly

It will install both tf-nightly and tf-nightly-estimator. The two modules might have different version numbers.

TFP-nightly

Apart from that, we will also need to install tfp-nightly, a library for probabilistic reasoning and statistical analysis in TensorFlow. Run the following:

python3 -m pip install tfp-nightly

TF-agents-nightly

We are now ready for the tf-agents module. Make sure that you have installed both tf-nightly and tfp-nightly before you proceed. Type the following command in the terminal and run it:

python3 -m pip install tf-agents-nightly

The module versions that I am using are as follows:

[Screenshot: installed module versions]
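If you prefer to check programmatically, a small stdlib-only snippet (my own sketch, using importlib.metadata from Python 3.8+, not part of the official tutorial) can report what is installed:

```python
from importlib import metadata

def installed_versions(packages):
    """Return a {package: version} mapping, with None for missing packages."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None
    return versions

# Package names as used with pip above.
for pkg, ver in installed_versions(
    ["tf-nightly", "tfp-nightly", "tf-agents-nightly", "gast"]
).items():
    print(pkg, ver or "not installed")
```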

Downgrade gast

If you encounter an error like the following, you will need to adjust the version of the offending module:

AttributeError: module 'gast' has no attribute 'Ellipsis'

For gast, we need to set it to the following version:

python3 -m pip install gast==0.2.2

Other python modules

The official tutorials require the following modules to work properly. Execute the following commands one by one to install them:

python3 -m pip install gym==0.10.11
python3 -m pip install imageio==2.4.0
python3 -m pip install pyglet==1.3.2
python3 -m pip install pyvirtualdisplay
python3 -m pip install matplotlib
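Once everything is installed, a quick sanity check (a sketch of my own, not part of the official tutorial) is to confirm that each module imports cleanly; the import names for these particular packages match their pip names:

```python
import importlib

def can_import(module_name):
    """Return True if the module imports without error."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False

for mod in ("gym", "imageio", "pyglet", "pyvirtualdisplay", "matplotlib"):
    print(mod, "OK" if can_import(mod) else "missing")
```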

2. Examples

We will use the introductory DQN tutorial as an example. The code and steps are pretty straightforward. Start Jupyter Notebook via the following command:

jupyter notebook

It will open a tab in your default browser with a URL similar to the following (the port number may differ):

http://localhost:8892/tree

Create a new Python 3 notebook. Then, head over to the DQN tutorial and copy and paste the code into the notebook cells. You can run each cell independently. Let's go through what you will see if everything works properly.

Render Environment

The following code renders the environment and outputs an image showing what the game looks like.

env.reset()
PIL.Image.fromarray(env.render())

The result is as follows:

[Screenshot: the rendered CartPole environment]

Specification

The time_step_spec() method returns the specification for the TimeStep tuple.

print('Observation Spec:')
print(env.time_step_spec().observation)
print('Reward Spec:')
print(env.time_step_spec().reward)
print('Action Spec:')
print(env.action_spec())

You should get the following output

A. Observation is an array of 4 floats:

  • the position and velocity of the cart
  • the angular position and velocity of the pole

B. Reward is a scalar float value

C. Action is a scalar integer with only two possible values:

  • 0 — "move left"
  • 1 — "move right"
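To make these specs concrete, here is a toy hand-written policy (purely illustrative, my own sketch rather than code from the tutorial) that maps one 4-float observation to a valid action:

```python
def naive_policy(observation):
    """Toy CartPole policy: push the cart toward the side the pole leans.

    observation: [cart position, cart velocity, pole angle, pole angular velocity]
    returns: 0 ("move left") or 1 ("move right")
    """
    cart_position, cart_velocity, pole_angle, pole_velocity = observation
    return 1 if pole_angle > 0 else 0

print(naive_policy([0.0, 0.0, 0.05, 0.0]))   # pole leaning right -> 1
print(naive_policy([0.0, 0.0, -0.05, 0.0]))  # pole leaning left  -> 0
```

A DQN agent learns a far better mapping than this heuristic, but the input and output types it works with are exactly the ones the specs above describe.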

Training

Everything should work fine until you reach the “Training the Agent” part. Running that cell in Jupyter Notebook will produce an error:

"UsageError: Line magic function 
 
    %%time
 
     not found."

This happens because a cell magic such as %%time must appear on the very first line of the cell. Simply remove whatever comes before it and leave %%time as the first line in the cell. You should see the following output once you run it.

[Screenshot: training progress output]

It will take some time to run, around 5 minutes. The final output is as follows:

[Screenshot: the final training output]

Visualization

You can make use of the matplotlib module to visualize the training results. One episode lasts at most 200 time steps, and the player receives a reward of 1 for each step the pole remains upright. As a result, the maximum return for one episode is 200. Let's run the following code:

iterations = range(0, num_iterations + 1, eval_interval)
plt.plot(iterations, returns)
plt.ylabel('Average Return')
plt.xlabel('Iterations')
plt.ylim(top=250)

Your chart should look like this (there might be some small differences):

[Chart: average return over training iterations]
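The return arithmetic behind the chart can be sketched in plain Python (a toy illustration of the undiscounted return, not code from the tutorial):

```python
def episode_return(rewards):
    """Undiscounted return: the sum of the per-step rewards in one episode."""
    return sum(rewards)

# CartPole pays +1 per step the pole stays up; a full-length episode:
print(episode_return([1.0] * 200))  # 200.0

# An episode that ends early, after 23 steps:
print(episode_return([1.0] * 23))   # 23.0
```

The "Average Return" on the y-axis is this quantity averaged over the evaluation episodes, which is why a fully trained agent plateaus at 200.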

Video

The final part is to render a video showing the agent playing the game. The trained agent should look like this:

[Animation: the trained agent balancing the pole]

Notice that it plays extremely well, keeping the pole balanced and nearly still. The untrained agent looks like this:

[Animation: the untrained agent failing to balance the pole]

3. Conclusion

Congratulations! You have successfully trained an agent capable of playing CartPole. Let's recap what we have learned today. First and foremost, we set up a virtual environment. Then, we installed the necessary Python modules. Along the way, we also needed to downgrade the gast module to rectify some errors.

Next, we followed the tutorial provided to render the image and set the required parameters. After that, we trained an agent with the DQN learning algorithm. It took about 5 minutes for the training.

Finally, we visualized the results using matplotlib and rendered two videos showing the difference between a trained and an untrained agent.

Thanks a lot and have a great day ahead!

