Introduction to TF-Agents: A library for Reinforcement Learning in TensorFlow
Oct 4 · 6 min read
Train your own AI bot via a flexible and powerful reinforcement learning library in Tensorflow
Example of an untrained agent playing the CartPole game
Today's topic is TensorFlow's latest reinforcement learning library, TF-Agents. The library is fairly new, open-sourced about a year ago, and as a result it still lacks proper documentation and tutorials compared to the more established reinforcement learning libraries. In this tutorial, we will learn the proper way to set up and run the tutorials provided by the official documentation. The content is organized into the following:
- Installation
- Examples
- Conclusions
Without further ado, let’s get started!
1. Installation
I am using Ubuntu 18.04.2 LTS for this project, but the following steps should work on any other operating system. Modify the commands accordingly based on what you are using.
Virtual Environment
First and foremost, we will need a new virtual environment for this project. It is good practice to give each project its own virtual environment. Open up a terminal and go to the directory of your choice. Then, run the following code:
python3 -m venv tfagent
A new folder called tfagent will be created in the same directory. Activate it by running the following:
source tfagent/bin/activate
You should see the following output
Pip Upgrade (optional)
It is always a good idea to upgrade pip to the latest version. Run the following command if you are unsure whether you already have it:
python3 -m pip install --upgrade pip
Jupyter Notebook
Next, we will install Jupyter Notebook, a web-based interactive development environment, for a smoother experience. It is strongly recommended for anyone involved in data science tasks. Run the following command:
python3 -m pip install jupyter
TF-nightly
If you have been following the tutorial provided by the official site, you will notice that it will have the following code:
...
import tensorflow as tf

from tf_agents.agents.reinforce import reinforce_agent
from tf_agents.drivers import dynamic_step_driver
from tf_agents.environments import suite_gym
from tf_agents.environments import tf_py_environment
from tf_agents.eval import metric_utils
from tf_agents.metrics import tf_metrics
from tf_agents.networks import actor_distribution_network
from tf_agents.replay_buffers import tf_uniform_replay_buffer
from tf_agents.trajectories import trajectory
from tf_agents.utils import common
...
We are not going to install the stable tensorflow package but rather the CPU-only preview build called tf-nightly. This module is fairly unstable, but tf-agents depends on it. The official site recommends installing the latest version via the following:
python3 -m pip install --upgrade tf-nightly
This installs both tf-nightly and tf-nightly-estimator; the two modules may have different version numbers.
TFP-nightly
Apart from that, we will also need to install tfp-nightly, a library for probabilistic reasoning and statistical analysis in TensorFlow. Run the following:
python3 -m pip install tfp-nightly
TF-agents-nightly
We are now ready for the TF-Agents module itself. Make sure that you have installed both tf-nightly and tfp-nightly before you proceed. Type the following command in the terminal and run it:
python3 -m pip install tf-agents-nightly
The module versions I am using are as follows:
Downgrade gast
If you encounter an error from certain modules, such as the following, you need to adjust their versions:
AttributeError: module 'gast' has no attribute 'Ellipsis'
For gast, we need to pin it to the following version:
python3 -m pip install gast==0.2.2
Other python modules
The official tutorials require the following modules to work properly. Execute the following commands one by one to install them:
python3 -m pip install gym==0.10.11
python3 -m pip install imageio==2.4.0
python3 -m pip install pyglet==1.3.2
python3 -m pip install pyvirtualdisplay
python3 -m pip install matplotlib
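If you prefer a single install step, the pins above (together with the earlier gast fix and the nightly builds) can be collected into a requirements.txt. The versions are simply the ones used in this article, so treat this as a sketch rather than a canonical lockfile:

```
tf-nightly
tfp-nightly
tf-agents-nightly
gast==0.2.2
gym==0.10.11
imageio==2.4.0
pyglet==1.3.2
pyvirtualdisplay
matplotlib
```

Then install everything at once with python3 -m pip install -r requirements.txt.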
2. Examples
We will be using the introductory DQN tutorial as an example. The code and steps are pretty straightforward. Start Jupyter Notebook via the following command:
jupyter notebook
It will open up a tab in your default browser with a URL like the following:
http://localhost:8892/tree
Create a new Python 3 notebook. Then, head over to the DQN tutorial and copy and paste the code into the notebook. You can run each cell independently. Let's go through what you will see if everything works properly.
Render Environment
The following code renders the environment and outputs an image showing what the game looks like.
env.reset()
PIL.Image.fromarray(env.render())
The result is as follows:
Specification
The time_step_spec() method returns the specification for the TimeStep tuple.
print('Observation Spec:')
print(env.time_step_spec().observation)
print('Reward Spec:')
print(env.time_step_spec().reward)
print('Action Spec:')
print(env.action_spec())
You should get the following output
A. Observation is an array of 4 floats:
- the position and velocity of the cart
- the angular position and velocity of the pole
B. Reward is a scalar float value
C. Action is a scalar integer with only two possible values:
- 0 — "move left"
- 1 — "move right"
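To make this structure concrete, here is a plain-Python sketch (no TF-Agents required) of what one observation and the action space look like. The field names are illustrative stand-ins, not the library's actual spec objects:

```python
from collections import namedtuple

# Illustrative container mirroring the four observation floats described above.
Observation = namedtuple(
    'Observation',
    ['cart_position', 'cart_velocity', 'pole_angle', 'pole_angular_velocity'])

obs = Observation(0.02, -0.01, 0.03, 0.015)   # an array of 4 floats
ACTIONS = {0: 'move left', 1: 'move right'}   # scalar integer action

print(len(obs))     # 4
print(ACTIONS[1])   # move right
```

The reward is simply a scalar float returned alongside each observation, so it needs no structure of its own.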
Training
Everything should work fine until you reach the "Training the Agent" part. Running that cell in Jupyter Notebook will raise an error:
"UsageError: Line magic function
%%time
not found."
This is because a cell magic such as %%time must appear on the very first line of the cell. Simply remove anything that precedes it so that %%time is the first line in the cell. You should see the following output once you run it.
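If the magic still misbehaves in your setup, you can skip %%time entirely and time the training loop with the standard library instead. A minimal sketch, where train_one_step is a hypothetical stand-in for one iteration of the tutorial's training loop:

```python
import time

def train_one_step():
    # Stand-in for one iteration of the tutorial's training loop.
    pass

start = time.perf_counter()
for _ in range(1000):
    train_one_step()
elapsed = time.perf_counter() - start
print(f'Training took {elapsed:.2f} seconds')
```

This reports wall-clock time, which is what %%time shows as well.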
It will take some time to run; the estimated time is around 5 minutes. The final output is as follows:
Visualization
You can make use of the matplotlib module to visualize the training results. One episode consists of up to 200 time steps. The player receives a reward of 1 for each step the pole remains standing up. As a result, the maximum return for one episode is 200. Let's run the following code:
iterations = range(0, num_iterations + 1, eval_interval)
plt.plot(iterations, returns)
plt.ylabel('Average Return')
plt.xlabel('Iterations')
plt.ylim(top=250)
Your chart should look something like this (there may be some small differences):
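The 200-return ceiling mentioned above follows directly from the reward scheme; a quick sanity check in plain Python:

```python
MAX_STEPS = 200          # CartPole episodes are capped at 200 time steps
REWARD_PER_STEP = 1      # +1 for every step the pole stays upright

# The best possible episode return is one reward per step for the full episode.
max_return = MAX_STEPS * REWARD_PER_STEP
print(max_return)  # 200
```

This is why the chart's y-axis plateaus near 200 once the agent has learned the task.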
Video
The final part is to render a video of the agent playing the game. You should see the trained agent performing as follows:
Notice that it plays extremely well, keeping the pole balanced in a nearly still, upright position. The untrained agent looks like this:
3. Conclusion
Congratulations! You have successfully trained an agent that is capable of playing CartPole. Let's recap what we have learned today. First and foremost, we started off by setting up a virtual environment. Then, we installed the necessary Python modules. Along the way, we also needed to downgrade the gast module to rectify some errors.
Next, we followed the official tutorial to render the environment image and set the required parameters. After that, we trained an agent with the DQN learning algorithm, which took about 5 minutes.
Finally, we visualized the results using matplotlib and rendered two videos showing the difference between a trained agent and an untrained agent.
Thanks a lot and have a great day ahead!