
Simple Transformers — Introducing The Easiest BERT, RoBERTa, XLNet, and XLM Library


Preface

The Simple Transformers library is built as a wrapper around the excellent Transformers library by Hugging Face. I am eternally grateful for the hard work done by the folks at Hugging Face to enable the public to easily access and use Transformer models. I don’t know what I’d have done without you guys!

Introduction

I believe it’s fair to say that the success of Transformer models has been nothing short of phenomenal in advancing the field of Natural Language Processing. Not only have they shown staggering leaps in performance on many of the NLP tasks they were designed to solve, but pre-trained Transformers are also almost uncannily good at transfer learning. This means that anyone can take advantage of the long hours and the mind-boggling computational power that have gone into training these models, and apply them to a countless variety of NLP tasks. You don’t need the deep pockets of Google or Facebook to build a state-of-the-art model to solve your NLP problem anymore!

Or so one might hope. The truth is that getting these models to work still requires substantial technical know-how. Unless you have expertise, or at least experience, in deep learning, it can seem a daunting challenge. I am happy to say that my previous articles on Transformers (here and here) seem to have helped a lot of people get started with Transformers. Interestingly, I noticed that people from various backgrounds (linguistics, medicine, and business, to name but a few) were attempting to use these models to solve problems in their own domains. However, the technical barriers that need to be overcome in order to adapt Transformers to specific tasks are non-trivial and may even be rather discouraging.

Simple Transformers

This conundrum was the main motivation behind my decision to develop a simple library to perform binary text classification (the most common NLP task that I’ve seen) using Transformers. The idea was to make it as simple as possible, which means abstracting away many of the implementation and technical details. The implementation of the library can be found on GitHub. I highly encourage you to look at it to get a better idea of how everything works, although knowing the inner details is not necessary to use the library.

To that end, the Simple Transformers library was written so that a Transformer model can be initialized, trained on a given dataset, and evaluated on a given dataset, in just 3 lines of code! Let’s see how it’s done, shall we?

Installation

  1. Install Anaconda or Miniconda Package Manager from here
  2. Create a new virtual environment and install the required packages.
    conda create -n transformers python pandas tqdm
    conda activate transformers
    If using CUDA:
    conda install pytorch cudatoolkit=10.0 -c pytorch
    Otherwise (CPU only):
    conda install pytorch cpuonly -c pytorch
    conda install -c anaconda scipy
    conda install -c anaconda scikit-learn
    pip install transformers
    pip install tensorboardx
  3. Install simpletransformers.
    pip install simpletransformers

Usage

A quick look at how to use this library on the Yelp Reviews dataset.

  1. Download the Yelp Reviews Dataset.
  2. Extract train.csv and test.csv and place them in the directory data/.

(Bash users can use this script to download the dataset)

Nothing fancy here; we are just getting the data into the correct form (a sketch of this step follows the list below). This is all you have to do for any dataset:

  • Create two pandas DataFrame objects for the train and eval portions.
  • Each DataFrame should have two columns. The first column contains the text that you want to train on or evaluate, with datatype str. The second column contains the corresponding label, with datatype int.
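Here is a minimal sketch of that preparation step, assuming the usual layout of the Yelp Review Polarity CSVs (no header row, label in the first column with 1 = negative and 2 = positive, review text in the second column); the column names text and label are purely illustrative:

import pandas as pd

def load_split(path):
    # Assumed layout: column 0 is the label (1 or 2), column 1 is the review text.
    # Adjust if your copy of the dataset differs.
    raw = pd.read_csv(path, header=None)
    return pd.DataFrame({
        'text': raw[1].astype(str).str.replace('\\n', ' ', regex=False),  # un-escape literal "\n" sequences
        'label': (raw[0] == 2).astype(int),  # map the 1/2 labels to 0/1
    })

train_df = load_split('data/train.csv')
eval_df = load_split('data/test.csv')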

With the data in order, it’s time to train and evaluate the model.
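For reference, a minimal sketch of those three lines, based on the TransformerModel API described in this article (the import path and method names reflect the library as it was at the time of writing and may differ in later releases):

from simpletransformers.model import TransformerModel

# Create a TransformerModel (a RoBERTa-base classifier, per the defaults listed further below)
model = TransformerModel('roberta', 'roberta-base')

# Train the model on the training DataFrame
model.train_model(train_df)

# Evaluate the model on the evaluation DataFrame
result, model_outputs, wrong_predictions = model.eval_model(eval_df)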

That’s it!

For making predictions on other text, TransformerModel comes with a predict(to_predict) method which, given a list of text, returns the model predictions and the raw model outputs.
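For example, a call might look like this (the review text is made up):

# Predict on new, unseen text; returns class predictions and the raw model outputs.
predictions, raw_outputs = model.predict(['The food was great but the service was painfully slow.'])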

For more details on all available methods, please see the GitHub repo. The repo also contains a minimal example of using the library.

Default settings and how to change them

The default args used are given below. Any of these can be overridden by passing a dict containing the corresponding key/value pairs to the init method of TransformerModel (see the example below).

self.args = {
   'model_type': 'roberta',
   'model_name': 'roberta-base',
   'output_dir': 'outputs/',
   'cache_dir': 'cache/',
   'fp16': True,
   'fp16_opt_level': 'O1',
   'max_seq_length': 128,
   'train_batch_size': 8,
   'eval_batch_size': 8,
   'gradient_accumulation_steps': 1,
   'num_train_epochs': 1,
   'weight_decay': 0,
   'learning_rate': 4e-5,
   'adam_epsilon': 1e-8,
   'warmup_ratio': 0.06,
   'warmup_steps': 0,
   'max_grad_norm': 1.0,
   'logging_steps': 50,
   'evaluate_during_training': False,
   'save_steps': 2000,
   'eval_all_checkpoints': True,
   'use_tensorboard': True,
   'overwrite_output_dir': False,
   'reprocess_input_data': False,
}

To override any of these, simply pass in a dictionary with the appropriate key/value pair to the TransformerModel.
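A minimal sketch of such an override, assuming the dict is passed as the args argument when constructing the model (as in the library's examples at the time); the chosen values are purely illustrative:

# Override a few defaults; all other settings keep the values listed above.
custom_args = {
    'num_train_epochs': 2,
    'learning_rate': 2e-5,
    'overwrite_output_dir': True,
}
model = TransformerModel('roberta', 'roberta-base', args=custom_args)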

For an explanation of what each argument does, please refer to the GitHub repo.

Conclusion

That’s all folks! The easiest way to use Transformer models that I know of.

