

Alpha Zero General (any game, any framework!)
A simplified, highly flexible, commented and (hopefully) easy-to-understand implementation of self-play reinforcement learning based on the AlphaGo Zero paper (Silver et al.). It is designed to be easy to adapt to any two-player, turn-based, adversarial game and any deep learning framework of your choice. A sample implementation is provided for the game of Othello in PyTorch, Keras and TensorFlow. An accompanying tutorial can be found here. Implementations for GoBang and TicTacToe are also included.
To use a game of your choice, subclass the classes in Game.py and NeuralNet.py and implement their functions. Example implementations for Othello can be found in othello/OthelloGame.py and othello/{pytorch,keras,tensorflow}/NNet.py.
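As an illustration, a minimal skeleton of such a Game subclass might look like the sketch below. The method names mirror the Othello example in this repository, but the board encoding and move logic here are placeholders, not the actual implementation.

```python
# Hypothetical sketch of a Game subclass; method names follow the Othello
# example in this repo, but the board logic below is placeholder code.
import numpy as np
from Game import Game

class MyGame(Game):
    def __init__(self, n=3):
        self.n = n  # board side length (a square board is assumed here)

    def getInitBoard(self):
        # initial board state as a numpy array
        return np.zeros((self.n, self.n), dtype=int)

    def getBoardSize(self):
        return (self.n, self.n)

    def getActionSize(self):
        # one action per cell plus a "pass" action, as in the Othello example
        return self.n * self.n + 1

    def getNextState(self, board, player, action):
        # apply `action` for `player`; return (next_board, next_player)
        b = np.copy(board)
        if action < self.n * self.n:
            b[action // self.n, action % self.n] = player
        return b, -player

    def getValidMoves(self, board, player):
        # binary vector of length getActionSize()
        valids = np.zeros(self.getActionSize(), dtype=int)
        valids[:-1] = (board.reshape(-1) == 0)
        return valids

    def getGameEnded(self, board, player):
        # 0 if the game is ongoing, 1 if `player` won, -1 if lost,
        # a small nonzero value for a draw
        return 0  # placeholder

    def getCanonicalForm(self, board, player):
        # board seen from the perspective of `player`
        return player * board

    def getSymmetries(self, board, pi):
        # list of (board, pi) pairs equivalent under board symmetries,
        # used to augment the training data
        return [(board, pi)]

    def stringRepresentation(self, board):
        # hashable key used by MCTS to cache visited states
        return board.tobytes()
```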
Coach.py contains the core training loop and MCTS.py performs the Monte Carlo Tree Search. The parameters for self-play can be specified in main.py. Additional neural network parameters are in othello/{pytorch,keras,tensorflow}/NNet.py (cuda flag, batch size, epochs, learning rate, etc.).
To start training a model for Othello:
python main.py
Choose your framework and game in main.py.
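For orientation, a condensed sketch of what a main.py-style entry point looks like is shown below. The class and argument names follow the repository's conventions, but treat the exact set of fields and their values as illustrative assumptions rather than a verbatim copy of main.py.

```python
# Illustrative, condensed main.py-style entry point. Names follow the
# repository's conventions; the exact fields and values are assumptions.
from Coach import Coach
from othello.OthelloGame import OthelloGame
from othello.pytorch.NNet import NNetWrapper
from utils import dotdict

args = dotdict({
    'numIters': 80,                       # training iterations
    'numEps': 100,                        # self-play episodes per iteration
    'numMCTSSims': 25,                    # MCTS simulations per move
    'cpuct': 1.0,                         # exploration constant in PUCT
    'tempThreshold': 15,                  # moves before play turns greedy
    'updateThreshold': 0.6,               # win rate needed to accept the new net
    'arenaCompare': 40,                   # games for new-vs-old comparison
    'maxlenOfQueue': 200000,              # max stored training examples
    'numItersForTrainExamplesHistory': 20,
    'checkpoint': './temp/',
    'load_model': False,
    'load_folder_file': ('./temp/', 'best.pth.tar'),
})

if __name__ == '__main__':
    game = OthelloGame(6)             # 6x6 Othello, as in the experiments below
    nnet = NNetWrapper(game)          # swap in the Keras/TensorFlow wrapper if preferred
    coach = Coach(game, nnet, args)   # runs self-play, training and arena comparison
    coach.learn()
```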
Experiments
We trained a PyTorch model for 6x6 Othello (~80 iterations, 100 episodes per iteration and 25 MCTS simulations per turn). This took about 3 days on an NVIDIA Tesla K80. The pretrained PyTorch model can be found in pretrained_models/othello/pytorch/. You can play a game against it using pit.py. A plot in the repository shows the model's performance against random and greedy baselines as a function of the number of training iterations.
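If you want to pit agents programmatically rather than through pit.py, a rough sketch in the spirit of that script might look like the following; the constructor arguments and checkpoint filename are assumptions based on the Othello example, so consult pit.py for the authoritative usage.

```python
# Rough sketch of pitting a trained network against a random baseline, in the
# spirit of pit.py. Signatures are assumptions based on the Othello example.
import numpy as np
import Arena
from MCTS import MCTS
from othello.OthelloGame import OthelloGame
from othello.OthelloPlayers import RandomPlayer
from othello.pytorch.NNet import NNetWrapper
from utils import dotdict

game = OthelloGame(6)

# Agent 1: the pretrained network, wrapped in MCTS.
nnet = NNetWrapper(game)
# Substitute the actual checkpoint filename found in pretrained_models/othello/pytorch/.
nnet.load_checkpoint('./pretrained_models/othello/pytorch/', 'best.pth.tar')
mcts = MCTS(game, nnet, dotdict({'numMCTSSims': 25, 'cpuct': 1.0}))
nnet_player = lambda board: np.argmax(mcts.getActionProb(board, temp=0))

# Agent 2: a uniformly random baseline.
random_player = RandomPlayer(game).play

arena = Arena.Arena(nnet_player, random_player, game)
print(arena.playGames(20, verbose=False))  # wins/losses/draws over 20 games
```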
A concise description of our algorithm can be found here.
Contributing
While the current code is fairly functional, we could benefit from the following contributions:
- Game logic files for more games that follow the specifications in Game.py, along with their neural networks
- Neural networks in other frameworks
- Pre-trained models for different game configurations
- An asynchronous version of the code: parallel processes for self-play, neural net training and model comparison
- Asynchronous MCTS as described in the paper
Contributors and Credits
- Shantanu Thakoor and Megha Jhunjhunwala helped with core design and implementation.
- Shantanu Kumar contributed TensorFlow and Keras models for Othello.
- Evgeny Tyurin contributed rules and a trained model for TicTacToe.
- MBoss contributed rules and a model for GoBang.
Thanks to pytorch-classification and progress.