GitHub - kimiyoung/transformer-xl
source link: https://github.com/kimiyoung/transformer-xl
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
This repository contains the code in both PyTorch and TensorFlow for our paper
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai*, Zhilin Yang*, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution)
Preprint 2018
TensorFlow
- The source code is in the `tf/` folder, supporting (1) single-node multi-GPU training, and (2) multi-host TPU training.
- Besides the source code, we also provide pretrained TensorFlow models with the state-of-the-art (SoTA) performance reported in the paper.
- Please refer to `tf/README.md` for details.
PyTorch
- The source code is in the `pytorch/` folder, supporting single-node multi-GPU training via the module `nn.DataParallel` (a minimal usage sketch follows this list).
- Please refer to `pytorch/README.md` for details.
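For orientation, the snippet below is a minimal, generic sketch of single-node multi-GPU execution with PyTorch's `nn.DataParallel`. The `ToyLanguageModel` class and all sizes are illustrative placeholders, not classes from this repository; see `pytorch/README.md` for the actual training entry points.

```python
import torch
import torch.nn as nn

# Illustrative placeholder model; NOT a class from this repository.
class ToyLanguageModel(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        return self.proj(self.embed(tokens))

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = ToyLanguageModel().to(device)
if torch.cuda.device_count() > 1:
    # nn.DataParallel replicates the module on each visible GPU and
    # splits each input batch across the replicas along dim 0.
    model = nn.DataParallel(model)

tokens = torch.randint(0, 1000, (8, 32), device=device)  # (batch, seq_len)
logits = model(tokens)                                   # (8, 32, 1000)
```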
Results
Transformer-XL achieves new state-of-the-art results on multiple language modeling benchmarks. Transformer-XL is also the first to break through the 1.0 barrier on char-level language modeling. Below is a summary.
| Method | enwiki8 (bpc) | text8 (bpc) | One Billion Word (ppl) | WT-103 (ppl) | PTB (ppl, w/o finetuning) |
| --- | --- | --- | --- | --- | --- |
| Previous Best | 1.06 | 1.13 | 23.7 | 20.5 | 55.5 |
| Transformer-XL | **0.99** | **1.08** | **21.8** | **18.3** | **54.5** |
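The char-level results (enwiki8, text8) are reported in bits per character (bpc). As a general reminder, not code from this repository, a per-character cross-entropy loss in nats converts to bpc by dividing by ln 2; the 0.686-nat figure below is back-computed purely for illustration.

```python
import math

def nats_to_bpc(loss_nats: float) -> float:
    """Convert a per-character cross-entropy in nats to bits per character."""
    return loss_nats / math.log(2)

# A per-character loss of about 0.686 nats corresponds to ~0.99 bpc,
# the enwiki8 figure reported above.
print(round(nats_to_bpc(0.686), 2))  # 0.99
```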