31

Spinning Up in Deep RL

 5 years ago
source link: https://www.tuicool.com/articles/hit/3mquaqa
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

We’re releasing Spinning Up in Deep RL, an educational resource designed to let anyone learn to become a skilled practitioner in deep reinforcement learning. Spinning Up consists of crystal-clear examples of RL code, educational exercises, documentation, and tutorials.

Take your first steps in Deep RL

At OpenAI, we believe that deep learning generally—and deep reinforcement learning specifically—will play central roles in the development of powerful AI technology. While there are numerous resources available to let people quickly ramp up in deep learning, deep reinforcement learning is more challenging to break into. We've designed Spinning Up to help people learn to use these technologies and to develop intuitions about them.

We were inspired to build Spinning Up through our work with the OpenAIScholars andFellows initiatives, where we observed that it's possible for people with little-to-no experience in machine learning to rapidly ramp up as practitioners, if the right guidance and resources are available to them. Spinning Up in Deep RL was built with this need in mind and is integrated into the curriculum for2019 cohorts of Scholars and Fellows.

We've also seen that being competent in RL can help people participate in interdisciplinary research areas likeAI safety, which involve a mix of reinforcement learning and other skills. We've had so many people ask for guidance in learning RL from scratch, that we've decided to formalize the informal advice we've been giving.

spinning-up-in-rl-1.png

Spinning Up in Deep RL consists of the following core components:

  • A shortintroduction to RL terminology, kinds of algorithms, and basic theory.
  • Anessay about how to grow into an RL research role.
  • A curated list ofimportant papers organized by topic.
  • A well-documented code repo of short, standalone implementations of: Vanilla Policy Gradient (VPG), Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC).
  • And afew exercises to serve as warm-ups.

Support

We have the following support plan for this project:

  • High-bandwidth software support period : For the first three weeks following release we'll move quickly on bug-fixes, installation issues, and resolving errors or ambiguities in the docs. We’ll work hard to streamline the user experience, in order to make it as easy as possible to self-study with Spinning Up.
  • Major review in April, 2019 : Approximately six months after release, we’ll do a serious review of the state of the package based on feedback we receive from the community, and announce any plans for future modification.
  • Public release of internal development : If we make changes to Spinning Up in Deep RL as we work with our Scholars and Fellows, we’ll push the changes to the public repo and make them immediately available to everyone.

Education at OpenAI

Spinning Up in Deep RL is part of a new education initiative at OpenAI which we’re ‘spinning up’ to ensure we fulfill one of the tenets of theOpenAI Charter: "seek to create a global community working together to address AGI’s global challenges”. We hope Spinning Up will allow more people to become familiar with deep reinforcement learning, and use it to help advance safe and broadly beneficial AI.

education-spinning-up.png

We're going to host a workshop on Spinning Up in Deep RL at OpenAI San Francisco on February 2nd 2019. The workshop will consist of 3 hours of lecture material and 5 hours of semi-structured hacking, project-development, and breakout sessions - all supported by members of the technical staff at OpenAI. Ideal attendees have software engineering experience and have tinkered with ML but no formal ML experience is required. If you're interested in participating please complete our short application here .

If you want to help us push the limits of AI while communicating with and educating others, then consider applying towork at OpenAI.

Partnerships

We’re also going to work with other organizations to help us educate people using these materials. For our first partnership, we’re working with the Center for Human-Compatible AI (CHAI) at the University of California at Berkeley to run a workshop on deep RL in early 2019, similar to the planned Spinning Up workshop at OpenAI. We hope this will be the first of many.

chai-logo-1.png

Hello World

The best way to get a feel for how deep RL algorithms perform is to just run them. With Spinning Up, that’s as easy as:

python -m spinup.run ppo --env CartPole-v1 --exp_name hello_world

At the end of training, you’ll get instructions on how to view data from the experiments and watch videos of your trained agent.

Spinning Up implementations are compatible with Gym environments from theClassic Control,Box2D, orMuJoCo task suites.

We’ve designed the code for Spinning Up with newcomers in mind, making it short, friendly, and as easy to learn from as possible. Our goal was to write minimal implementations to demonstrate how the theory becomes code, avoiding the layers of abstraction and obfuscation typically present in deep RL libraries. We favor clarity over modularity --- code reuse between implementations is strictly limited to logging and parallelization utilities. Code is annotated so that you always know what’s going on, and is supported by background material (and pseudocode) on the corresponding readthedocs page.

Acknowledgements

Thanks to the many people who contributed to this launch: Alex Ray, Amanda Askell, Ashley Pilipiszyn, Ben Garfinkel, Catherine Olsson, Christy Dennison, Coline Devin, Daniel Zeigler, Dylan Hadfield-Menell, Ge Yang, Greg Khan, Jack Clark, Jonas Rothfuss, Larissa Schiavo, Leandro Castelao, Lilian Weng, Maddie Hall, Matthias Plappert, Miles Brundage, Peter Zokhov, and Pieter Abbeel.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK