

Ray and RLlib for Fast and Parallel Reinforcement Learning
source link: https://towardsdatascience.com/ray-and-rllib-for-fast-and-parallel-reinforcement-learning-6d31ee21c96c?gi=90f2b1ae232b

An intro tutorial to RL training with Ray
Apr 8 · 5 min read
Ray is more than just a library for multi-processing; Ray’s real power comes from the RLlib and Tune libraries that leverage this capability for reinforcement learning. It enables you to scale training to large, distributed servers, or just take advantage of the parallelization properties to train more efficiently on your own laptop. The choice is yours.
TL;DR
We show how to train a custom reinforcement learning environment that has been built on top of OpenAI Gym using Ray and RLlib.
A Gentle RLlib Tutorial
Once you’ve installed Ray and RLlib with pip install ray[rllib], you can train your first RL agent with a single command in the command line:
rllib train --run=A2C --env=CartPole-v0
This will tell your computer to train using the Advantage Actor Critic (A2C) algorithm on the CartPole environment. A2C and a host of other algorithms are already built into the library, meaning you don’t have to worry about the details of implementing them yourself.
This is really great, particularly if you’re looking to train using a standard environment and algorithm. If you want to do more, however, you’re going to have to dig a bit deeper.
RLlib Agents
The various algorithms you can access are available through ray.rllib.agents. Here, you can find a long list of different implementations in both PyTorch and TensorFlow to begin playing with.
These are all accessed through the algorithm’s Trainer object. For example, if you want to use A2C as shown above, you can run:
import ray
from ray.rllib import agents

ray.init()
trainer = agents.a3c.A2CTrainer(env='CartPole-v0')
If you want to try a DQN instead, you can call:
trainer = agents.dqn.DQNTrainer(env='CartPole-v0') # Deep Q Network
All the algorithms follow the same basic construction: the lowercase algorithm abbreviation names the module, and the uppercase abbreviation followed by “Trainer” names the class.
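For instance, the same pattern holds across the other built-in agents; here is a quick sketch (the environments are just illustrative picks, and DDPG needs a continuous action space, hence Pendulum):

# Each line stands alone; you would normally build just one trainer
trainer = agents.ppo.PPOTrainer(env='CartPole-v0')     # Proximal Policy Optimization
trainer = agents.a3c.A3CTrainer(env='CartPole-v0')     # Asynchronous Advantage Actor Critic
trainer = agents.ddpg.DDPGTrainer(env='Pendulum-v0')   # Deep Deterministic Policy Gradient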
Changing hyperparameters is as easy as passing a dictionary of configurations to the config argument. A quick way to see what’s available to you is to call trainer.config to print out the options for your chosen algorithm. A few examples include:
- fcnet_hiddens controls the number of hidden units and hidden layers (passed as a list inside a dictionary called model, which is nested in config; I’ll show an example below).
- vf_share_layers determines whether you have one neural network with multiple output heads or separate value and policy networks.
- num_workers sets the number of processors for parallelization.
- num_gpus sets the number of GPUs you will use.
There are lots of others to set and customize, from the network (typically located in the model dictionary) to various callbacks and multi-agent settings.
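As a quick illustration, here’s one way to inspect those defaults, assuming the A2C trainer created earlier is still in scope:

from pprint import pprint

pprint(trainer.config)           # full dictionary of options for the chosen algorithm
pprint(trainer.config['model'])  # just the network settings (fcnet_hiddens, etc.)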
Example: Training PPO for CartPole
I want to turn to a quick example to get you started and show how this works with a standard OpenAI Gym environment.
Open your IDE or text editor of choice and try the following:
import ray
from ray.rllib import agents

ray.init()  # Skip or set to ignore if already called
config = {'gamma': 0.9,
          'lr': 1e-2,
          'num_workers': 4,
          'train_batch_size': 1000,
          'model': {
              'fcnet_hiddens': [128, 128]
          }}
trainer = agents.ppo.PPOTrainer(env='CartPole-v0', config=config)
results = trainer.train()
The config dictionary changed the defaults for the values above. You can see how we can influence the number of layers and nodes in the network by nesting a dictionary called model in the config dictionary. Once we've specified our configuration, calling the train() method on our trainer object will send the environment to the workers and begin collecting data. Once enough data is collected (1,000 samples according to our settings above), the model will update and send the output to a new dictionary called results.
If you want to run multiple updates, you can set up a training loop to continuously call the train() method for a given number of iterations or until some other threshold has been reached.
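Here’s a minimal sketch of such a loop, continuing with the PPO trainer from above (the iteration count and reward cutoff are arbitrary choices for illustration):

for i in range(20):
    results = trainer.train()
    print('Iteration {}: mean reward = {:.1f}'.format(
        i, results['episode_reward_mean']))
    if results['episode_reward_mean'] >= 195:  # rough "solved" score for CartPole-v0
        break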
Customizing your RL Environment
OpenAI Gym and all of its extensions are great, but if you’re looking for novel applications of RL or to use it in your company, you’re going to need to work with a custom environment.
Unfortunately, the current version of Ray (0.9) explicitly states that it is not compatible with the gym registry. Thankfully, it isn’t too difficult to put together a helper function to get custom gym environments to work with Ray.
Let’s assume you have some environment called MyEnv-v0 that is properly registered so that you can invoke it with gym.make('MyEnv-v0') like you would with any other gym environment (if you haven't already, you can check out my step-by-step process on setting up environments here).
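For reference, registration along those lines typically looks something like this; the module path and class name below are just placeholders matching the helper function that follows:

from gym.envs.registration import register

# Make gym.make('MyEnv-v0') resolve to your custom class
register(
    id='MyEnv-v0',
    entry_point='custom_gym.envs.custom_env:CustomEnv0',
)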
To call that custom environment from Ray, you need to wrap it in a function that will return the environment class, not an instantiated object. The best way I’ve found to do this is with an env_creator() helper function:
def env_creator(env_name):
    if env_name == 'MyEnv-v0':
        from custom_gym.envs.custom_env import CustomEnv0 as env
    elif env_name == 'MyEnv-v1':
        from custom_gym.envs.custom_env import CustomEnv1 as env
    else:
        raise NotImplementedError
    return env
From here, you can set up your agent and train it on this new environment with only a slight modification to the trainer.
env_name = 'MyEnv-v0'
config = {
    # Whatever config settings you'd like...
}
trainer = agents.ppo.PPOTrainer(
    env=env_creator(env_name),
    config=config)
max_training_episodes = 10000
while True:
    results = trainer.train()
    # Enter whatever stopping criterion you like
    if results['episodes_total'] >= max_training_episodes:
        break
    print('Mean Rewards:\t{:.1f}'.format(results['episode_reward_mean']))
Note that above, we pass the environment through the env_creator function; everything else remains the same.
Tips for Working with Custom Environments
If you’re used to building your own models from the environment to the networks and algorithms, then there are some features you need to be cognizant of when working with Ray.
First, Ray adheres to the OpenAI Gym API, meaning that your environments need to have step() and reset() methods as well as carefully specified observation_space and action_space attributes. I had always been a bit lazy with respect to these last two, because I could simply define my network input and output dimensions and not have to regard the range of input values, for example, that the gym.spaces methods require. Ray checks all the inputs to ensure that they fall within the specified range (I spent too much time debugging runs before realizing that the low value on my gym.spaces.Box was set to 0 while the environment was returning values on the order of -1e-17, causing it to crash).
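To make that concrete, here’s a minimal sketch of an environment skeleton with carefully specified spaces; the dynamics and reward below are purely illustrative placeholders:

import gym
import numpy as np
from gym import spaces


class MyCustomEnv(gym.Env):
    def __init__(self, config=None):
        # Make sure low/high actually bound everything step() can return
        self.observation_space = spaces.Box(
            low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        self.steps += 1
        obs = self.observation_space.sample()  # placeholder dynamics
        reward = float(action)                 # placeholder reward
        done = self.steps >= 10
        return obs, reward, done, {}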
When setting up your action and observation spaces, stick to Box, Discrete, and Tuple. The MultiDiscrete and MultiBinary spaces don't work (currently) and will cause the run to crash. Instead, wrap Box or Discrete spaces in the Tuple function.
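For example, where you might otherwise reach for MultiDiscrete, something like this works instead:

import numpy as np
from gym import spaces

# Instead of spaces.MultiDiscrete([3, 3]), combine Discrete spaces in a Tuple
action_space = spaces.Tuple((spaces.Discrete(3), spaces.Discrete(3)))

# The same trick stacks Box and Discrete spaces together
observation_space = spaces.Tuple((
    spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32),
    spaces.Discrete(2),
))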
Take advantage of custom pre-processing when you can. Ray makes assumptions about your state inputs, which usually work just fine, but it also enables you to customize the pre-processing steps which may help your training.
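If you do go that route, the RLlib docs describe registering a custom preprocessor through the model catalog; a rough sketch follows (the class, the clipping logic, and the registration name are hypothetical, and the exact API can shift between Ray versions):

import numpy as np
from ray.rllib.models import ModelCatalog
from ray.rllib.models.preprocessors import Preprocessor


class ClipPreprocessor(Preprocessor):
    # Hypothetical preprocessor that clips observations to [-1, 1]
    def _init_shape(self, obs_space, options):
        return obs_space.shape  # output shape after preprocessing

    def transform(self, observation):
        return np.clip(observation, -1.0, 1.0)


ModelCatalog.register_custom_preprocessor('clip_prep', ClipPreprocessor)
config = {'model': {'custom_preprocessor': 'clip_prep'}}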
Going Beyond RLlib
Ray can greatly speed up training and make it far easier to get started with deep reinforcement learning. RLlib isn’t the end (we’ve only scratched the surface of its capabilities here anyway); it has a powerful cousin called Tune, which enables you to adjust the hyperparameters of your model and manages all of the important data collection and back-end work for you. Make sure you check back for updates on how to bring this library into your work process.