Learning sentence embeddings by Natural Language Inference
Unsupervised learning seems like the natural way to build word, sentence, or document embeddings, because it is more general: the pre-trained embeddings can be transferred to other downstream NLP problems. Examples include skip-gram for word embeddings, skip-thought for sentence embeddings, and distributed bag-of-words for paragraph embeddings.
Conneau et al. noted that supervised learning on ImageNet (image classification) does a good job of transferring its results to downstream problems; some learned features carry over. Therefore, Conneau et al. used textual entailment data to train a sentence embedding model, which they call InferSent.
After reading this article, you will understand:
- InferSent Design
- Architecture
- Implementation
- Take Away
InferSent Design
The idea is that the team uses SNLI (Stanford Natural Language Inference) data to train a model for the Natural Language Inference (NLI) problem. NLI aims to find the relationship between sentence 1 (the premise) and sentence 2 (the hypothesis). There are three categories: entailment, contradiction, and neutral. Here is a very simple example:
- I eat apple.
- I eat fruit.
Intuitively, the relationship is entailment: eating an apple implies eating fruit. The authors believe that NLI is a suitable task for learning semantic relationships within sentences, so that it helps to build good sentence embeddings for downstream NLP problems.
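For a concrete picture, here is what one training record looks like, sketched as a Python dict; the field names follow the SNLI JSONL format, and the sentences are the toy example above.

```python
# One SNLI-style training record (JSONL fields: sentence1, sentence2, gold_label).
example = {
    "sentence1": "I eat apple.",  # premise
    "sentence2": "I eat fruit.",  # hypothesis
    "gold_label": "entailment",   # one of: entailment, contradiction, neutral
}
```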
Architecture
The overall idea is that the two sentences (the premise input and the hypothesis input) are transformed by the same sentence encoder (shared weights). After that, three matching methods are applied to extract the relation between the premise and hypothesis vectors (a sketch follows the list):
- Concatenation of the two vectors
- Element-wise product of the two vectors
- Absolute element-wise difference of the two vectors
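A minimal PyTorch sketch of this matching step; the 4096-dimensional vectors match the released encoder, but the batch size and classifier head sizes are illustrative assumptions, not the exact training setup:

```python
import torch
import torch.nn as nn

# u, v: sentence vectors for premise and hypothesis from the shared encoder.
u = torch.randn(32, 4096)
v = torch.randn(32, 4096)

# The three matching methods combined into one feature vector:
# concatenation, absolute difference, and element-wise product.
features = torch.cat([u, v, torch.abs(u - v), u * v], dim=1)  # (32, 4 * 4096)

# A simple classifier head over the combined features (hypothetical sizes).
classifier = nn.Sequential(
    nn.Linear(4 * 4096, 512),
    nn.Linear(512, 3),  # entailment / contradiction / neutral
)
logits = classifier(features)
```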
After the overview, we can jump into the architecture of the sentence encoders. Conneau et al. evaluated 7 different architectures:
- Standard LSTM
- Standard GRU
- Concatenation of last hidden states of forward and backward GRU
- Bi-directional LSTM with mean pooling
- Bi-directional LSTM with max pooling
- Self-attentive Network (Attention with BiLSTM)
- Hierarchical convolutional networks
Before concluding which approach is best, we might expect attention with BiLSTM to win, since the attention mechanism helps to identify important weights. In fact, it can hurt when used for transfer learning. On the other hand, BiLSTM with mean pooling does not perform very well either, possibly because mean pooling cannot locate the important parts of a sentence.
From the experimental results, the best approach is the bi-directional LSTM with max pooling.
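Here is a minimal sketch of such a BiLSTM-max encoder; the hidden size follows the paper's 2048-unit setup, but the class itself is illustrative, not the repo's implementation:

```python
import torch
import torch.nn as nn

class BiLSTMMaxEncoder(nn.Module):
    """Encode a sentence by max pooling over BiLSTM hidden states."""
    def __init__(self, emb_dim=300, hidden_dim=2048):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim,
                            bidirectional=True, batch_first=True)

    def forward(self, word_vectors):
        # word_vectors: (batch, seq_len, emb_dim), e.g. GloVe vectors.
        hidden_states, _ = self.lstm(word_vectors)  # (batch, seq_len, 2 * hidden_dim)
        # Max pooling over time: each feature keeps its strongest activation
        # across the sentence, which helps locate the important parts.
        sentence_vector, _ = hidden_states.max(dim=1)  # (batch, 2 * hidden_dim)
        return sentence_vector

encoder = BiLSTMMaxEncoder()
sentence_vec = encoder(torch.randn(32, 10, 300))  # -> (32, 4096)
```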
Implementation
There are 2 ways to use InferSent. The first is using the pre-trained embeddings in your NLP problems. The other is building InferSent yourself.
Load pre-trained Embeddings
The Facebook research team provides 2 pre-trained models: version 1 (based on GloVe) and version 2 (based on fastText).
Load both the InferSent pre-trained model and the GloVe (or fastText) model, then you can encode sentences into vectors.
```python
import torch
from models import InferSent  # models.py from the InferSent repo

# Init InferSent model (the constructor expects a config dict)
params_model = {'bsize': 64, 'word_emb_dim': 300, 'enc_lstm_dim': 2048,
                'pool_type': 'max', 'dpout_model': 0.0, 'version': 1}
infer_sent_model = InferSent(params_model)
infer_sent_model.load_state_dict(torch.load(dest_dir + dest_file))

# Set up the word embedding model (GloVe for version 1, fastText for version 2)
infer_sent_model.set_w2v_path(word_embs_model_path)

# Build vocab for the InferSent model
infer_sent_model.build_vocab(sentences, tokenize=True)

# Encode sentences to vectors
embeddings = infer_sent_model.encode(sentences, tokenize=True)
```
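As a quick sanity check (a sketch, assuming the model loaded above), you can compare two encoded sentences with cosine similarity:

```python
import numpy as np

sentences = ["I eat apple.", "I eat fruit."]
embeddings = infer_sent_model.encode(sentences, tokenize=True)  # shape (2, 4096)

# Cosine similarity between the two sentence vectors.
cos_sim = np.dot(embeddings[0], embeddings[1]) / (
    np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[1]))
print(cos_sim)  # closer to 1 means more similar
```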
Train embeddings
Another approach is training the embeddings yourself. You may use either your own data or the original data set. Here are the steps for the second approach.
Clone the original InferSent repo to local, then execute get_data.bash in the console so that the SNLI (Stanford Natural Language Inference) and MultiNLI (Multi-Genre NLI) corpora are downloaded and processed. Make sure you execute the script from the repo root folder, not from another relative path:
```bash
./get_data.bash
```
After that, download GloVe (and/or fastText):
```bash
mkdir dataset/GloVe
curl -Lo dataset/GloVe/glove.840B.300d.zip http://nlp.stanford.edu/data/glove.840B.300d.zip
unzip dataset/GloVe/glove.840B.300d.zip -d dataset/GloVe/
mkdir dataset/fastText
curl -Lo dataset/fastText/crawl-300d-2M.vec.zip https://s3-us-west-1.amazonaws.com/fasttext-vectors/crawl-300d-2M.vec.zip
unzip dataset/fastText/crawl-300d-2M.vec.zip -d dataset/fastText/
```
Download the InferSent pre-trained models. Version 1 is trained using GloVe, while version 2 leverages fastText:
```bash
curl -Lo encoder/infersent1.pkl https://s3.amazonaws.com/senteval/infersent/infersent1.pkl
curl -Lo encoder/infersent2.pkl https://s3.amazonaws.com/senteval/infersent/infersent2.pkl
```
Finally, you can execute the following command to train the embedding layer, pointing at the GloVe file downloaded above:
```bash
python train_nli.py --word_emb_path dataset/GloVe/glove.840B.300d.txt
```
On my single-GPU VM, it took about 1 day to finish training.
Take Away
To access all code, you can visit my github repo.
- Compared to other embedding approaches, InferSent uses supervised learning to compute sentence vectors.
- InferSent leverages word embeddings (GloVe/ fastText) to build sentence embeddings.
- The pre-trained models support both GloVe (version 1) and fastText (version 2)
Reference
A. Conneau, D. Kiela, H. Schwenk, L. Barrault, A. Bordes, "Supervised Learning of Universal Sentence Representations from Natural Language Inference Data" (EMNLP 2017)