

GitHub - ikostrikov/pytorch-a2c-ppo-acktr: PyTorch implementation of Advantage A...
source link: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr

pytorch-a2c-ppo-acktr
Update (April 12th, 2021)
PPO is great, but Soft Actor Critic can be better for many continuous control tasks. Please check out my new RL repository in JAX.
Please use the hyperparameters from this readme. With other hyperparameters things might not work (it's RL after all)!
This is a PyTorch implementation of
- Advantage Actor Critic (A2C), a synchronous deterministic version of A3C
- Proximal Policy Optimization (PPO)
- Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR)
- Generative Adversarial Imitation Learning (GAIL)
Also see the OpenAI posts: A2C/ACKTR and PPO for more information.
This implementation is inspired by the OpenAI baselines for A2C, ACKTR and PPO. It uses the same hyperparameters and model, since they were well tuned for Atari games.
Please use this bibtex if you want to cite this repository in your publications:
@misc{pytorchrl,
author = {Kostrikov, Ilya},
title = {PyTorch Implementations of Reinforcement Learning Algorithms},
year = {2018},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail}},
}
Supported (and tested) environments (via OpenAI Gym)
I highly recommend PyBullet as a free open source alternative to MuJoCo for continuous control tasks.
All environments are accessed through exactly the same Gym interface. See their documentation for a comprehensive list.
To use the DeepMind Control Suite environments, set the flag --env-name dm.<domain_name>.<task_name>, where domain_name and task_name are the name of a domain (e.g. hopper) and a task within that domain (e.g. stand) from the DeepMind Control Suite. Refer to their repo and their tech report for a full list of available domains and tasks. Other than setting the task, the API for interacting with the environment is exactly the same as for all the Gym environments thanks to dm_control2gym.
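As a rough illustration of the naming convention above, the following sketch splits an --env-name value into its parts. The helper parse_env_name and the EnvSpec tuple are hypothetical names introduced only for this example; they are not part of this repository or of dm_control2gym.

```python
from typing import NamedTuple, Optional


class EnvSpec(NamedTuple):
    """Parsed form of an --env-name value (hypothetical, for illustration)."""
    backend: str           # "dm_control" or "gym"
    domain: Optional[str]  # e.g. "hopper" (DeepMind Control Suite only)
    task: Optional[str]    # e.g. "stand" (DeepMind Control Suite only)
    gym_id: Optional[str]  # e.g. "PongNoFrameskip-v4" (Gym only)


def parse_env_name(env_name: str) -> EnvSpec:
    """Split dm.<domain_name>.<task_name>; treat anything else as a Gym id."""
    if env_name.startswith("dm."):
        parts = env_name.split(".")
        # Expect exactly three non-empty pieces: "dm", the domain, the task.
        if len(parts) != 3 or not all(parts[1:]):
            raise ValueError(
                f"expected dm.<domain_name>.<task_name>, got {env_name!r}"
            )
        _, domain, task = parts
        return EnvSpec("dm_control", domain, task, None)
    return EnvSpec("gym", None, None, env_name)


print(parse_env_name("dm.hopper.stand"))
print(parse_env_name("PongNoFrameskip-v4"))
```

With a spec like this, a launcher could route the dm_control case to a wrapper and pass every other name straight to Gym.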
Requirements
- Python 3 (it might work with Python 2, but I didn't test it)
- PyTorch
- Stable baselines3
In order to install requirements, follow:
# PyTorch
conda install pytorch torchvision -c soumith
# Other requirements
pip install -r requirements.txt
# Gym Atari
conda install -c conda-forge gym-atari
Contributions
Contributions are very welcome. If you know how to make this code better, please open an issue. If you want to submit a pull request, please open an issue first. Also see a todo list below.
I'm also looking for volunteers to run all experiments on Atari and MuJoCo (with multiple random seeds).
Disclaimer
It's extremely difficult to reproduce results for Reinforcement Learning methods. See "Deep Reinforcement Learning that Matters" for more information. I tried to reproduce OpenAI results as closely as possible. However, major differences in performance can be caused even by minor differences between the TensorFlow and PyTorch libraries.
TODO
- Improve this README file. Rearrange images.
- Improve performance of KFAC, see kfac.py for more information
- Run evaluation for all games and algorithms
Visualization
To visualize the results, use visualize.ipynb.
Training
Atari
A2C
python main.py --env-name "PongNoFrameskip-v4"
PPO
python main.py --env-name "PongNoFrameskip-v4" --algo ppo --use-gae --lr 2.5e-4 --clip-param 0.1 --value-loss-coef 0.5 --num-processes 8 --num-steps 128 --num-mini-batch 4 --log-interval 1 --use-linear-lr-decay --entropy-coef 0.01
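The --clip-param 0.1 flag in the PPO command sets the clipping range of the PPO surrogate objective. As a rough illustration of what that parameter controls, here is a minimal per-sample sketch in plain Python. The function name and the scalar formulation are assumptions made for this example; it is not code from this repository, which works on batched tensors.

```python
import math


def clipped_surrogate(new_log_prob, old_log_prob, advantage, clip_param=0.1):
    """PPO clipped surrogate for a single sample (to be maximized)."""
    # Probability ratio between the updated policy and the data-collecting policy.
    ratio = math.exp(new_log_prob - old_log_prob)
    # Restrict the ratio to [1 - clip_param, 1 + clip_param].
    clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# A ratio of 1.5 with a positive advantage is capped at 1 + clip_param.
print(clipped_surrogate(math.log(1.5), 0.0, advantage=1.0))
```

The training loss is typically the negative mean of this quantity over a mini-batch, combined with the value loss and entropy terms weighted by --value-loss-coef and --entropy-coef.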
ACKTR
python main.py --env-name "PongNoFrameskip-v4" --algo acktr --num-processes 32 --num-steps 20
MuJoCo
Please always try to use the --use-proper-time-limits flag. It properly handles partial trajectories (see https://github.com/sfujim/TD3/blob/master/main.py#L123).
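The general idea behind handling partial trajectories can be sketched as follows: an episode cut off by a time limit has not reached a true terminal state, so the value of the next state should still be bootstrapped. This is a minimal one-step illustration under that assumption; td_target and its arguments are hypothetical names, and the snippet is not the code this repository runs when the flag is set.

```python
def td_target(reward, next_value, done, hit_time_limit, gamma=0.99):
    """One-step bootstrapped target that ignores time-limit cutoffs."""
    # Only a genuine termination (not a time-limit cutoff) zeroes the bootstrap.
    terminal = done and not hit_time_limit
    bootstrap = 0.0 if terminal else 1.0
    return reward + gamma * next_value * bootstrap


# True termination: the future value is dropped.
print(td_target(1.0, 10.0, done=True, hit_time_limit=False))
# Time-limit cutoff: the future value is kept.
print(td_target(1.0, 10.0, done=True, hit_time_limit=True))
```

Without this distinction, states near the time limit look artificially bad, which can bias the value function in tasks such as Reacher that always end on a timer.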
A2C
python main.py --env-name "Reacher-v2" --num-env-steps 1000000
PPO
python main.py --env-name "Reacher-v2" --algo ppo --use-gae --log-interval 1 --num-steps 2048 --num-processes 1 --lr 3e-4 --entropy-coef 0 --value-loss-coef 0.5 --ppo-epoch 10 --num-mini-batch 32 --gamma 0.99 --gae-lambda 0.95 --num-env-steps 1000000 --use-linear-lr-decay --use-proper-time-limits
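The PPO command enables Generalized Advantage Estimation with --use-gae, --gamma 0.99 and --gae-lambda 0.95. To show how those two numbers interact, here is a minimal sketch of the standard GAE recursion in plain Python. It assumes a single rollout with no episode boundary inside it; the function name and list-based inputs are illustrative and do not mirror this repository's storage code.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation over one uninterrupted rollout.

    rewards[t] and values[t] refer to step t; last_value is the value
    estimate of the state reached after the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated future term.
    for t in reversed(range(len(rewards))):
        # One-step TD error at step t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of this and all later TD errors.
        gae = delta + gamma * gae_lambda * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


print(gae_advantages([1.0, 1.0], [0.0, 0.0], last_value=0.0))
```

Setting gae_lambda to 0 reduces each advantage to the one-step TD error, while gae_lambda of 1 recovers the discounted return minus the value baseline.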
ACKTR
ACKTR requires some modifications to be made specifically for MuJoCo. At the moment, though, I want to keep this code as unified as possible, so I'm looking for better ways to integrate them into the codebase.
Enjoy
Atari
python enjoy.py --load-dir trained_models/a2c --env-name "PongNoFrameskip-v4"
MuJoCo
python enjoy.py --load-dir trained_models/ppo --env-name "Reacher-v2"
Results
ACKTR