Learning sentence embeddings by Natural Language Inference
Unsupervised learning seems like the natural way to build word, sentence, or document embeddings, because it is more general: the pre-trained embeddings can be transferred to other downstream NLP problems. Examples include skip-gram for word embeddings, skip-thought for sentence embeddings, and distributed bag-of-words for paragraph embeddings.
Conneau et al. noted that supervised learning on ImageNet (image classification) does a good job of transferring its results to downstream problems; some learned features carry over. Therefore, Conneau et al. used textual entailment data to train a sentence embedding model, which they call InferSent.
After reading this article, you will understand:
- InferSent Design
- Architecture
- Implementation
- Take Away
InferSent Design
The idea is that the team uses SNLI (Stanford Natural Language Inference) data to train a model for the Natural Language Inference (NLI) problem. NLI aims to find the relationship between sentence 1 (the premise) and sentence 2 (the hypothesis). There are three categories: entailment, contradiction, and neutral. Here is a very simple example:
- I eat apple.
- I eat fruit.
Intuitively, the relationship is entailment: eating an apple implies eating fruit. The authors believe that NLI is a suitable task for learning semantic relationships within sentences, so that it helps to build good sentence embeddings for downstream NLP problems.
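For a concrete picture, here is what one training record looks like, sketched as a Python dict; the field names follow the SNLI JSONL format, and the sentences are the toy example above.

```python
# One SNLI-style training record (JSONL fields: sentence1, sentence2, gold_label).
example = {
    "sentence1": "I eat apple.",  # premise
    "sentence2": "I eat fruit.",  # hypothesis
    "gold_label": "entailment",   # one of: entailment, contradiction, neutral
}
```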
Architecture
The overall idea is that the two sentences (the premise input and the hypothesis input) are transformed by the same sentence encoder (shared weights). After that, three matching methods are applied to extract the relation between the premise and hypothesis vectors (a sketch follows the list):
- Concatenation of the two vectors
- Element-wise product of the two vectors
- Absolute element-wise difference of the two vectors
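A minimal PyTorch sketch of this matching step; the 4096-dimensional vectors match the released encoder, but the batch size and classifier head sizes are illustrative assumptions, not the exact training setup:

```python
import torch
import torch.nn as nn

# u, v: sentence vectors for premise and hypothesis from the shared encoder.
u = torch.randn(32, 4096)
v = torch.randn(32, 4096)

# The three matching methods combined into one feature vector:
# concatenation, absolute difference, and element-wise product.
features = torch.cat([u, v, torch.abs(u - v), u * v], dim=1)  # (32, 4 * 4096)

# A simple classifier head over the combined features (hypothetical sizes).
classifier = nn.Sequential(
    nn.Linear(4 * 4096, 512),
    nn.Linear(512, 3),  # entailment / contradiction / neutral
)
logits = classifier(features)
```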
After the overview, we can jump into the architecture of the sentence encoders. Conneau et al. evaluated 7 different architectures:
- Standard LSTM
- Standard GRU
- Concatenation of last hidden states of forward and backward GRU
- Bi-directional LSTM with mean pooling
- Bi-directional LSTM with max pooling
- Self-attentive Network (Attention with BiLSTM)
- Hierarchical convolutional networks
Before concluding which approach is best, we might expect attention with BiLSTM to win, since the attention mechanism helps to identify important weights. In fact, it can hurt when used for transfer learning. On the other hand, BiLSTM with mean pooling does not perform very well either, possibly because mean pooling cannot locate the important parts of a sentence.
From the experimental results, the best approach is the bi-directional LSTM with max pooling.
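Here is a minimal sketch of such a BiLSTM-max encoder; the hidden size follows the paper's 2048-unit setup, but the class itself is illustrative, not the repo's implementation:

```python
import torch
import torch.nn as nn

class BiLSTMMaxEncoder(nn.Module):
    """Encode a sentence by max pooling over BiLSTM hidden states."""
    def __init__(self, emb_dim=300, hidden_dim=2048):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim,
                            bidirectional=True, batch_first=True)

    def forward(self, word_vectors):
        # word_vectors: (batch, seq_len, emb_dim), e.g. GloVe vectors.
        hidden_states, _ = self.lstm(word_vectors)  # (batch, seq_len, 2 * hidden_dim)
        # Max pooling over time: each feature keeps its strongest activation
        # across the sentence, which helps locate the important parts.
        sentence_vector, _ = hidden_states.max(dim=1)  # (batch, 2 * hidden_dim)
        return sentence_vector

encoder = BiLSTMMaxEncoder()
sentence_vec = encoder(torch.randn(32, 10, 300))  # -> (32, 4096)
```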
Implementation
There are 2 ways to use InferSent. The first is using the pre-trained embeddings in your NLP problems. The other is building InferSent yourself.
Load pre-trained Embeddings
The Facebook research team provides 2 pre-trained models: version 1 (based on GloVe) and version 2 (based on fastText).
Load both the InferSent pre-trained model and the GloVe (or fastText) model, then you can encode sentences into vectors.
```python
import torch
from models import InferSent  # models.py from the InferSent repo

# Init InferSent model (the constructor expects a config dict)
params_model = {'bsize': 64, 'word_emb_dim': 300, 'enc_lstm_dim': 2048,
                'pool_type': 'max', 'dpout_model': 0.0, 'version': 1}
infer_sent_model = InferSent(params_model)
infer_sent_model.load_state_dict(torch.load(dest_dir + dest_file))

# Set up the word embedding model (GloVe for version 1, fastText for version 2)
infer_sent_model.set_w2v_path(word_embs_model_path)

# Build vocab for the InferSent model
infer_sent_model.build_vocab(sentences, tokenize=True)

# Encode sentences to vectors
embeddings = infer_sent_model.encode(sentences, tokenize=True)
```
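As a quick sanity check (a sketch, assuming the model loaded above), you can compare two encoded sentences with cosine similarity:

```python
import numpy as np

sentences = ["I eat apple.", "I eat fruit."]
embeddings = infer_sent_model.encode(sentences, tokenize=True)  # shape (2, 4096)

# Cosine similarity between the two sentence vectors.
cos_sim = np.dot(embeddings[0], embeddings[1]) / (
    np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[1]))
print(cos_sim)  # closer to 1 means more similar
```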
Train embeddings
Another approach is training the embeddings yourself. You may use either your own data or the original data set. Here are the steps for the second approach.
Clone the original InferSent repo to local, then execute get_data.bash in the console so that the SNLI (Stanford Natural Language Inference) and MultiNLI (Multi-Genre NLI) corpora are downloaded and processed. Make sure you execute the script from the repo root folder, not from another relative path:
```bash
./get_data.bash
```
After that, download GloVe (and/or fastText):
```bash
mkdir dataset/GloVe
curl -Lo dataset/GloVe/glove.840B.300d.zip http://nlp.stanford.edu/data/glove.840B.300d.zip
unzip dataset/GloVe/glove.840B.300d.zip -d dataset/GloVe/
mkdir dataset/fastText
curl -Lo dataset/fastText/crawl-300d-2M.vec.zip https://s3-us-west-1.amazonaws.com/fasttext-vectors/crawl-300d-2M.vec.zip
unzip dataset/fastText/crawl-300d-2M.vec.zip -d dataset/fastText/
```
Download the InferSent pre-trained models. Version 1 is trained using GloVe, while version 2 leverages fastText:
```bash
curl -Lo encoder/infersent1.pkl https://s3.amazonaws.com/senteval/infersent/infersent1.pkl
curl -Lo encoder/infersent2.pkl https://s3.amazonaws.com/senteval/infersent/infersent2.pkl
```
Finally, you can execute the following command to train the embedding layer, pointing at the GloVe file downloaded above:
```bash
python train_nli.py --word_emb_path dataset/GloVe/glove.840B.300d.txt
```
On my single-GPU VM, it took about 1 day to finish training.
Take Away
To access all code, you can visit my github repo.
- Compared to other embedding approaches, InferSent uses supervised learning to compute sentence vectors.
- InferSent leverages word embeddings (GloVe/ fastText) to build sentence embeddings.
- The pre-trained models support both GloVe (version 1) and fastText (version 2)
Reference
A. Conneau, D. Kiela, H. Schwenk, L. Barrault, A. Bordes, "Supervised Learning of Universal Sentence Representations from Natural Language Inference Data" (EMNLP 2017)