33

Bootstrapping cutting-edge NLP models

 4 years ago
source link: https://towardsdatascience.com/bootstrapping-cutting-edge-nlp-models-baf62405a2b5?gi=cc6420522286
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

How to get up and running with XLNet and Pytorch in 5 mins

QfMZRbn.jpg!web

Photo by Pietro Jeng on Unsplash

What is XLNet

XLNet is a modern NLP language model that is based on Transformers (BERT, RoBERTa, TinyBERT, etc.) Results of XLNet on various Natural Language Understanding tasks are approaching that of human performance. XLNet can generate text at a level of a high-schooler, it can answer simple questions. It can comprehend that a dog isn’t the same as a cat, but both of them are pets to humans.

Overall, XLNet is a model that builds on the advances of BERT.

XLNet solves NLP problems in 3 broad categories: classification, sequence labeling, and text generation —

Classification:

Classification tasks are the most common type of tasks in NLP.

Categorization (aka classification) tasks assign a category to a piece of text. More broadly, they answer a question of given a section of a text, tell me which category the text belongs to .

Tasks in the classification domain commonly answer questions like the ones below,

What medical billing code should we use for this visit? (description of visit provided)  Is this text spam? (text is provided)  Is this interesting to this user? (content and user profile provided)

Sequence labeling:

Another type of problem in NLP is the Sequence labeling. In Sequence labeling, we try to find something enclosed in the text provided. Commonly this type of task would include finding persons in the text provided(NER) or finding all co-references of an entity, i.e. if in the sentence “Mary jumped over a toad. It didn’t move.” The algorithm would find out ‘it’ refers to Mary, not the toad. Another example of Sequence labeling is to detect which ticker is associated with each mention of a company —

NVDA is scheduled to report second-quarter fiscal 2020 results on Aug 15.

In the trailing four quarters, the company’s (NVDA) earnings surpassed the Zacks Consensus Estimate thrice and missed the same (Zacks) once, the average positive surprise being 3.94%.

Text generation:

Third and last way XLNet can be used is for text generation. Here, given a short snippet of context, XLNet would predict the next word. And it would continue predicting the next word until instructed to stop. In the example below, Given the input of The quick brown XLNet would first predict fox , then look at the context as the whole and predict the next word jumped and so on.

The quick brown <fox> <jumped> <over> …

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK