source link: https://gist.github.com/thomwolf/ca135416a30ea387aa20edaa9b21f0ed

A very small and self-contained gist to train a GPT-2 transformer model on wikitext-103

This gist should give a word-level perplexity of about 29 on the wikitext-103 validation dataset when trained for roughly 15 hours on 8 V100 GPUs (a few days on a single GPU). To get word-level perplexity you need to convert the sub-word negative log-likelihood into word-level perplexity (see here for details on the conversion process).
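
As a rough sketch of that conversion (not the gist's actual code; the function and numbers below are illustrative), the idea is that the total negative log-likelihood of the corpus is the same whether it is summed over sub-word tokens or over words, so you only renormalize by the number of words before exponentiating:

    import math

    def word_level_perplexity(total_subword_nll, num_words):
        # The summed NLL (in nats) is identical whether counted over
        # sub-word tokens or words, so converting to word-level
        # perplexity just means dividing by the word count instead of
        # the sub-word token count before exponentiating.
        return math.exp(total_subword_nll / num_words)

    # Hypothetical usage (numbers are made up): total_subword_nll is the
    # sum of -log p(token) over all sub-word tokens in the validation set,
    # num_words is the number of whitespace-separated words in the raw text.
    ppl = word_level_perplexity(total_subword_nll=800_000.0, num_words=240_000)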

A few notes:

  • hyper-parameters are copied from the Transformer-XL base configuration (which gets about 24 test perplexity); better ones could probably be found for this configuration
  • we use an open vocabulary (sub-words) and no fancy adaptive softmax or adaptive input embeddings, so a higher perplexity than Transformer-XL is expected
  • the main practical tool missing from the training script is evaluation on a validation dataset (a minimal sketch of such a loop is shown after this list). Please check our NAACL tutorial code base for a more convenient training script.
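
For reference, a minimal validation pass could look like the sketch below. It assumes a PyTorch GPT-2-style causal language model that returns the mean cross-entropy loss as its first output when called with labels; the model, dataloader, and device names are illustrative rather than taken from the gist:

    import math
    import torch

    def evaluate(model, val_loader, device):
        # Average sub-word cross-entropy and perplexity over a validation set.
        model.eval()
        total_loss, total_tokens = 0.0, 0
        with torch.no_grad():
            for batch in val_loader:
                input_ids = batch.to(device)
                # Assumes the model returns the mean cross-entropy loss as
                # its first output when labels are the input ids themselves.
                loss = model(input_ids, labels=input_ids)[0]
                total_loss += loss.item() * input_ids.numel()
                total_tokens += input_ids.numel()
        avg_nll = total_loss / total_tokens
        return avg_nll, math.exp(avg_nll)  # sub-word perplexity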
