
Text Classifier with Multiple Outputs and Multiple Losses in Keras


In this post we’ll go through the definition of a multi-label classifier, multiple losses, text preprocessing, and a step-by-step explanation of how to build a multi-output RNN-LSTM in Keras.

The dataset that we’ll be working on consists of natural disaster messages that are classified into 36 different classes. The dataset was provided by Figure Eight. Examples of input messages:

['Weather update - a cold front from Cuba that could pass over Haiti',
 'Is the Hurricane over or is it not over',
 'Looking for someone but no name',
 'UN reports Leogane 80-90 destroyed. Only Hospital St. Croix functioning. Needs supplies desperately.',
 'says: west side of Haiti, rest of the country today and tonight']

What’s a Multi-Label Classification problem?

Before explaining what it is, let’s first go through the definition of a more common classification type: multiclass. In a multiclass problem, the classes are mutually exclusive, i.e., each sample can belong to only one class. For example, if you have the classes {Car, Person, Motorcycle}, your model will have to output: Car OR Person OR Motorcycle. For this kind of problem, a softmax function is used for classification:

Softmax Classification function in a Neural Network

For multi-label classification, a data sample can belong to multiple classes. From the example above, your model can classify, for the same sample, the classes Car AND Person (imagining that each sample is an image that may contain these 3 classes).

In the studied dataset, there are 36 different classes: 35 of them have a binary output (0 or 1), and one of them has three possible classes (a multiclass case): 0, 1 or 2.

Multiple Losses

Using multiple loss functions in the same model means that you are doing different tasks and sharing part of your model between these tasks. You may think it would be better to build a different model for each type of output, but in some situations sharing some layers of your neural network helps the models generalize better.

How does Keras handle multiple losses?

From the Keras documentation: “…the loss value that will be minimized by the model will then be the weighted sum of all individual losses, weighted by the loss_weights coefficients.” Therefore, the final loss is a weighted sum of the individual losses passed to the loss parameter.
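As a minimal sketch of how this works (the layer names, output names and weights here are illustrative assumptions, not taken from the original post), the weighted sum is controlled through the loss and loss_weights arguments of compile:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Tiny hypothetical model with two named outputs
inp = Input(shape=(10,))
hidden = Dense(16, activation='relu')(inp)
out_a = Dense(1, activation='sigmoid', name='out_a')(hidden)
out_b = Dense(3, activation='softmax', name='out_b')(hidden)

model = Model(inputs=inp, outputs=[out_a, out_b])

# Final loss = 1.0 * loss(out_a) + 0.5 * loss(out_b)
model.compile(optimizer='adam',
              loss={'out_a': 'binary_crossentropy',
                    'out_b': 'sparse_categorical_crossentropy'},
              loss_weights={'out_a': 1.0, 'out_b': 0.5})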

In the studied case, two different losses will be used:

  • For the binary classes, the metric used will be binary_accuracy with the corresponding binary_crossentropy loss. Since there are only two possible classes for each of these outputs (0 or 1), the sigmoid function will be used as the activation function.
  • For the multiclass output, the metric used will be sparse_categorical_accuracy with the corresponding sparse_categorical_crossentropy loss. For this output there are 3 possible classes: 0, 1 and 2, so the softmax activation function will be used. Unlike the classical categorical_crossentropy loss, sparse_categorical_crossentropy doesn’t require the output Y to be one-hot encoded. Hence, instead of transforming the outputs into [1, 0, 0], [0, 1, 0] and [0, 0, 1], we can leave them as integers: [0], [1] and [2]. It’s important to highlight that both losses share the same equation:
Loss Function for categorical crossentropy and sparse categorical crossentropy
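Since the original equation image is not reproduced here, a standard form of the categorical cross-entropy for a single sample with C classes is:

loss = -Σ_{i=1}^{C} y_i · log(ŷ_i)

where y_i is 1 for the true class and 0 otherwise, and ŷ_i is the predicted probability for class i. The sparse variant computes exactly the same value, but takes the true class as an integer index instead of a one-hot vector.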

Text Preprocessing

Just like in any other NLP problem, before feeding the text input data into the model, we have to preprocess it. In this dataset, punctuation, URL links and ‘@’ mentions were removed. Even though ‘@’ mentions add some information to the message, they don’t add value to the classification model. Hashtags (‘#’) may contain useful information, since they are usually related to events. Therefore, they were kept, removing only the ‘#’ character.

Stop words (most common words in a language) removal and lemmatization were also applied to the dataset.

The text samples are formatted into tensors that can be fed into a neural network using Keras utilities:
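The original snippet is not shown here; a minimal sketch of this step, assuming a list of cleaned messages called texts and a vocabulary size of 5000 (both assumptions), could look like this:

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAXLEN = 50        # maximum sentence length, reused later in the model
VOCAB_SIZE = 5000  # num_words: assumed vocabulary size, keeps only the most frequent words

# texts is assumed to be the list of preprocessed messages
texts = ['weather update cold front cuba could pa haiti',
         'is the hurricane over or is it not over']

tokenizer = Tokenizer(num_words=VOCAB_SIZE)
tokenizer.fit_on_texts(texts)                    # build the word -> index vocabulary
sequences = tokenizer.texts_to_sequences(texts)  # map each sentence to a list of integers
X = pad_sequences(sequences, maxlen=MAXLEN)      # pad/truncate every sentence to MAXLEN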

The above procedure includes mainly 3 steps :

  • First, a Tokenizer instance is fitted (fit_on_texts) to the corpus, creating a vocabulary index based on word frequency. Every word is mapped to an index, so every word gets a unique integer value, where a lower integer means a more frequent word. The number of words to keep is defined by the num_words parameter, i.e., the vocabulary size. Only the most common words will be kept. In our dataset, the words were mapped as follows:
print(tokenizer.word_index)

{'water': 1, 'people': 2, 'food': 3, 'need': 4, 'help': 5, 'please': 6, 'earthquake': 7, 'would': 8, 'area': 9, 'like': 10, 'said': 11, 'country': 12, ...}
  • The sentences from the input are then mapped to integers using the tokenizer.texts_to_sequences method. From our example:
'weather update cold front cuba could pa haiti'

is mapped to:

[138, 1480, 335, 863, 2709, 80, 411, 18]
  • Lastly, in order to create embeddings, all of our sentences need to be of the same length. Hence, we use pad_sequences to pad each sentence with zeros.

Multi-Output and Multi-Loss RNN

For building this model we’ll be using the Keras functional API rather than the Sequential API, since the former allows us to build more complex models, such as those with multiple inputs and outputs.

To summarize what we have so far:

  • each input sample is a vector of integers of size MAXLEN (50)
  • each sample will be classified into 36 different classes: 35 of them have a binary output (0 or 1), and one of them has three possible classes (a multiclass case): 0, 1 or 2.

Model Architecture

In this section, we’ll be training our own embeddings using the Keras Embedding layer.

The Embedding layer takes as input:

  • input_dim : the vocabulary size that we chose
  • output_dim : the size of the embedding vectors. In our case, it was set to 50 dimensions.
  • input_length : Length of input sequences. In our case: MAXLEN

A convolutional layer was added before the LSTM in order to speed up training. CNNs are well suited to extracting local features from sentences. You can read more about the CNN and RNN combination in this article.
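The original architecture code is not reproduced here; a minimal sketch of the shared part of the network (layer sizes and variable names are assumptions) could look like:

from tensorflow.keras.layers import Input, Embedding, Conv1D, MaxPooling1D, LSTM

MAXLEN = 50
VOCAB_SIZE = 5000   # input_dim: assumed vocabulary size
EMBED_DIM = 50      # output_dim: embedding size

# Shared layers: every output head will branch off the LSTM output below
inputs = Input(shape=(MAXLEN,), name='text_input')
x = Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM, input_length=MAXLEN)(inputs)
x = Conv1D(filters=64, kernel_size=5, activation='relu')(x)  # extracts local features, speeds up training
x = MaxPooling1D(pool_size=4)(x)
shared = LSTM(64)(x)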

Output Layer

The majority of the output classes are binary, but one of them is a multiclass output. As explained in the Multiple Losses section, the losses used are: binary_crossentropy and sparse_categorical_crossentropy .

Since the dataset is highly imbalanced, the class_weight parameter was added in order to compensate for the imbalanced class distributions. A Dense layer will be created for each output. The outputs will be stored in an array, while the metrics and losses for each output will be stored in corresponding dictionaries.
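A sketch of that loop, continuing from the shared LSTM layer above (the list of column names is an assumption), might look like:

from tensorflow.keras.layers import Dense

# binary_columns is assumed to hold the names of the 35 binary target columns
output_layers = []
losses, metrics = {}, {}

for column in binary_columns:
    out = Dense(1, activation='sigmoid', name=column)(shared)  # one binary head per column
    output_layers.append(out)
    losses[column] = 'binary_crossentropy'
    metrics[column] = 'binary_accuracy'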

The above code iterates through each of the output binary columns and creates a dense layer, saving the corresponding metric and loss into a dictionary.

The code below applies the same process to the single multiclass output column.
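A corresponding sketch for the multiclass column (here assumed to be called 'related', with 3 classes):

# The multiclass head uses softmax over 3 classes and the sparse categorical loss
multiclass_out = Dense(3, activation='softmax', name='related')(shared)
output_layers.append(multiclass_out)
losses['related'] = 'sparse_categorical_crossentropy'
metrics['related'] = 'sparse_categorical_accuracy'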

For each output we then define the weight for each class in a dictionary format:
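One way to build these dictionaries is with scikit-learn’s balanced class weights; this is a sketch under that assumption, not necessarily the author’s original code:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# class_weights maps each output name to a {class_index: weight} dictionary
class_weights = {}
for column in binary_columns + ['related']:
    y = Y[column].values  # Y is assumed to be a DataFrame of target columns
    weights = compute_class_weight(class_weight='balanced',
                                   classes=np.unique(y), y=y)
    class_weights[column] = dict(enumerate(weights))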

We can then instantiate a Model and train it:
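Finally, a sketch of putting it all together (the training variables, batch size and number of epochs are assumptions):

from tensorflow.keras.models import Model

model = Model(inputs=inputs, outputs=output_layers)
model.compile(optimizer='adam', loss=losses, metrics=metrics)

# Y_train is assumed to be a dict mapping each output name to its target array
model.fit(X_train, Y_train,
          epochs=10,
          batch_size=32,
          validation_split=0.1,
          class_weight=class_weights)  # per-output class weights; support varies across Keras versions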

The model follows the format:

Graphical representation of the RNN

Note that Keras has an open issue with class_weight and binary_crossentropy for multi-label outputs. The solution proposed above, adding one Dense layer per output, is a valid workaround.

Conclusion

In this post we’ve built an RNN text classifier using the Keras functional API with multiple outputs and losses. We walked through an explanation of multiple losses and the difference between multi-label and multiclass classification problems.

You can check the full code, with extra analysis, here! In this post we trained our own embeddings; in the GitHub repo you can check the same model retrained using pre-trained GloVe vectors.

