source link: https://gist.github.com/thomwolf/ca135416a30ea387aa20edaa9b21f0ed

A very small and self-contained gist to train a GPT-2 transformer model on wikitext-103

This gist should give a word-level perplexity of about 29 on the wikitext-103 validation dataset when trained for roughly 15 hours on 8 V100 GPUs (a few days on a single GPU). To get word-level perplexity you need to convert the sub-word negative log-likelihood into word-level perplexity (see here for details on the conversion process).
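
As a rough sketch of that conversion (not the gist's actual code; the function and numbers below are illustrative), the idea is that the total negative log-likelihood of the corpus is the same whether it is summed over sub-word tokens or over words, so you only renormalize by the number of words before exponentiating:

    import math

    def word_level_perplexity(total_subword_nll, num_words):
        # The summed NLL (in nats) is identical whether counted over
        # sub-word tokens or words, so converting to word-level
        # perplexity just means dividing by the word count instead of
        # the sub-word token count before exponentiating.
        return math.exp(total_subword_nll / num_words)

    # Hypothetical usage (numbers are made up): total_subword_nll is the
    # sum of -log p(token) over all sub-word tokens in the validation set,
    # num_words is the number of whitespace-separated words in the raw text.
    ppl = word_level_perplexity(total_subword_nll=800_000.0, num_words=240_000)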

A few notes:

  • hyper-parameters are copied from the Transformer-XL base configuration (which gets about 24 test perplexity); better ones could probably be found for this configuration
  • we use an open vocabulary (sub-words) and no fancy adaptive softmax or adaptive input embeddings, so a higher perplexity than Transformer-XL is expected
  • the main practical tool missing from the training script is evaluation on a validation dataset (a minimal sketch of such a loop is shown after this list). Please check our NAACL tutorial code base for a more convenient training script.
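
For reference, a minimal validation pass could look like the sketch below. It assumes a PyTorch GPT-2-style causal language model that returns the mean cross-entropy loss as its first output when called with labels; the model, dataloader, and device names are illustrative rather than taken from the gist:

    import math
    import torch

    def evaluate(model, val_loader, device):
        # Average sub-word cross-entropy and perplexity over a validation set.
        model.eval()
        total_loss, total_tokens = 0.0, 0
        with torch.no_grad():
            for batch in val_loader:
                input_ids = batch.to(device)
                # Assumes the model returns the mean cross-entropy loss as
                # its first output when labels are the input ids themselves.
                loss = model(input_ids, labels=input_ids)[0]
                total_loss += loss.item() * input_ids.numel()
                total_tokens += input_ids.numel()
        avg_nll = total_loss / total_tokens
        return avg_nll, math.exp(avg_nll)  # sub-word perplexity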
