
Rank your things… An AES story


Automated essay scoring (AES) is the use of specialized computer programs to assign grades to essays written in an educational setting. Its objective is to classify a large set of textual entities into a small number of discrete categories, corresponding to the possible grades. But here, let’s look at it in an industrial problem setting.


You are a basketball coach. You have 40 kids under you who want to play ball. You are asked to select a team of the 5 tallest kids. Pretty easy! You ask all of them to line up in decreasing order of height and you pick the first 5.

Now say you are a comic book writer. Not really a writer, but you are the person who gets to decide the fancy villain names. You have plenty of interns under you who create villain descriptions and also name the villains for you. They create hundreds of such potential villain characters for your comic, and now you have to choose which of them you should even consider for your comics. The tricky bit is that you might want to make the selection based on what your readers like too.

Technically, you want to score each of your potential villains and rank them in decreasing order of the readers' affinity score (or rating).

(Ignore the detail of how you got the reader affinity scores. Assume that the comic God gave you those.)

So, you have all your villains (that eventually got into the comics) and their respective reader affinity scores. Now your task is to use that information (somehow, duh) and rank or score the future villains that your interns create.

In the literature, this is called the Automated Essay Scoring or Automatic Text Scoring problem.

The Approach

This is a domain that is continuously progressing in the research world, so you'll be able to find a lot of solutions. Let's focus on one solution that gets the job done.

One way to think of it is as a prediction problem: try to predict the reader affinity scores directly. But there is a small issue with that, and it may not help in solving our problem. The reader affinity score is for the whole comic and not just for the villain. A person can still give a good score if she likes the plot but hates the villain. If we are trying to predict this score, we'll need to use a lot more information (like the comic category, month of release, age group targeted, etc.) and not just the villain information (like the name, description, etc.).

Let's also note that the predicted score of a single villain is not really of use to us, because our job is to find the best villains from a pool of villains. Individually, the scores may not make as much sense as they would if they were considered relative to each other. If we have 100 scores, we can easily tell which villain is likely to perform better than the others.

Therefore, we can still proceed with our prediction logic, but instead of looking at the predicted scores in absolute terms, we just need to make sure they correlate with the actual scores. This means that if a villain X is scored higher than villain Y, then irrespective of how good or bad our prediction is in absolute terms, it's a win if the actual scores also follow the same order or rank.

The Solution


To get straight to one solution (out of many, like I said the literature is pretty lit :fire:), we use two specific types of models. Since scoring of text is the task, we need some sort of text-to-embedding technique to represent our text as vectors. Any text-to-embedding technique can be picked, but I've chosen the Universal Sentence Encoder.

import math

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

BATCH_SIZE = 128  # number of sentences embedded per session.run call

# Load a locally saved Universal Sentence Encoder module (TF1-style hub API)
use_model = hub.Module('models/sentence_encoder')

def get_use_vectors(list_text):
    '''
    Computing the USE vector representation of a list of sentences
    @param list_text : list of sentences
    '''
    messages = list_text
    num_batches = math.ceil(len(messages) / BATCH_SIZE)
    message_embeddings = []

    # Build the embedding op once and feed each batch through a placeholder
    messages_ph = tf.placeholder(dtype=tf.string, shape=[None])
    embed_op = use_model(messages_ph)

    with tf.Session() as session:
        session.run([tf.global_variables_initializer(),
                     tf.tables_initializer()])
        for batch in range(num_batches):
            start = batch * BATCH_SIZE
            end = start + BATCH_SIZE
            batch_msgs = messages[start:end]
            batch_embeddings = session.run(embed_op,
                                           feed_dict={messages_ph: batch_msgs})
            message_embeddings.append(batch_embeddings)

    # Stack the per-batch arrays into one (num_sentences, 512) matrix
    all_embeddings = np.concatenate(message_embeddings)
    return all_embeddings
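As a quick usage sketch (the villain strings below are made up purely for illustration), the function returns one 512-dimensional vector per input sentence:

# Hypothetical inputs, for illustration only
villain_descriptions = ['A physicist who weaponizes disorder.',
                        'A hacker who erases people from public records.']
description_vectors = get_use_vectors(villain_descriptions)
print(description_vectors.shape)  # (2, 512) for the standard USE module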

This model is used to convert the villain names and their descriptions into vectors, and we use these as features (along with other features) in a prediction model. The other features could be categorical, such as the category of the comic, the name of the author, etc., or numerical, such as the number of purchases, price, etc.

These can be one-hot encoded and appended to our feature list, as in the sketch below.
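Here is a minimal sketch of assembling the full feature matrix, assuming a pandas DataFrame df with hypothetical columns villain_name, villain_description, category, author, price and purchases:

import numpy as np
import pandas as pd

# One-hot encode the categorical columns (hypothetical column names)
categorical = pd.get_dummies(df[['category', 'author']])

# Numeric columns can be used as-is
numeric = df[['price', 'purchases']].to_numpy()

# Text embeddings from the Universal Sentence Encoder, one row per villain
name_vectors = get_use_vectors(df['villain_name'].tolist())
description_vectors = get_use_vectors(df['villain_description'].tolist())

# Final feature matrix: text vectors + one-hot categoricals + numeric features
X = np.hstack([name_vectors,
               description_vectors,
               categorical.to_numpy(),
               numeric])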

The prediction model is a simple Random Forest Regressor taken straight out of the sklearn tutorial section.

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hyperparameter grid for the random forest
params = {'n_estimators': [20, 50, 100],
          'max_depth': [2, 4, 6, 8, None],
          'min_samples_split': [2, 4, 6, 8]}

rf = RandomForestRegressor(random_state=42, n_jobs=10)

# Cross-validated grid search over the parameter grid
grid = GridSearchCV(rf, params)
grid.fit(X_train, y_train)

# Score the held-out villains with the best model found
predictions = grid.predict(X_test)
errors = abs(predictions - y_test)

print(grid.best_score_)
print(grid.best_estimator_)

This gives us a model, trained on our past historical data, that predicts how well a villain name/description would perform. Technically, it's a reader affinity score predictor. But, like we discussed, since we aren't using all possible and available features to predict this score, and since we aren't really treating it as a reader affinity score predictor, the absolute predictions we get will be inaccurate. What matters is that the scores give us a relative indication of how two or more villains will perform, which is enough to help us pick the top villains.
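To actually use the regressor as a ranking tool, one possible sketch (X_new below is a hypothetical feature matrix for the interns' new villains, built with the same pipeline as X_train) is to score the candidates and sort by predicted affinity:

# Hypothetical: X_new holds feature rows for the new villains
predicted_affinity = grid.predict(X_new)

# Rank villains by predicted score, highest first, and keep the top 5
top_k = 5
ranking = np.argsort(predicted_affinity)[::-1]
top_villains = ranking[:top_k]
print(top_villains, predicted_affinity[top_villains])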

The Metrics

Cohen's Kappa score is usually used as a metric to identify how close our ranking or ordering of the predictions is compared to the actual ordering. But this metric assumes the predictions are discrete categories, such as marks from 0 to 5. Our predictions are continuous, so this metric wouldn't work well for us.

Instead, we can use simple Spearman and Pearson correlations.

Plotting the actual vs. predicted scores gives a good idea of whether our predictions follow the right trend.
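A small sketch of computing both correlation coefficients and the actual-vs-predicted scatter plot with scipy and matplotlib:

import matplotlib.pyplot as plt
from scipy.stats import pearsonr, spearmanr

pearson_corr, pearson_p = pearsonr(y_test, predictions)
spearman_corr, spearman_p = spearmanr(y_test, predictions)
print('Pearson: %.2f (p=%.2e) | Spearman: %.2f (p=%.2e)'
      % (pearson_corr, pearson_p, spearman_corr, spearman_p))

# Points hugging a monotonic trend mean the ranking is largely preserved
plt.scatter(y_test, predictions, alpha=0.5)
plt.xlabel('Actual reader affinity')
plt.ylabel('Predicted reader affinity')
plt.show()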

For the predictions in this example, the correlation coefficients are:

Pearson: 0.65 (p-value = 2.14e-92) | Spearman: 0.60 (p-value = 8.13e-123)

