Helping Machines Understand Complex Questions

 4 years ago
source link: https://towardsdatascience.com/helping-machines-understand-complex-questions-48536fd93ef2?gi=4601ef609a6e

Photo by Dmitry Ratushny on Unsplash

(This project was inspired by the Google QUEST Kaggle Challenge.)

The field of Natural Language Processing has always fascinated me. That a computer can understand and respond to human language is nothing short of amazing. I had been wanting to work on a project that combined NLP and deep learning for a while, so I built one around a stakeholder who needs to improve their Question & Answer system.

As advanced as QA systems and chatbots have become, they still struggle with more complex questions. Machines can easily compute quantitative features such as character count or vocabulary size, but humans are much better at judging qualitative ones, such as “is this question looking for an opinion or recommendation?”

A model that can predict such subjective aspects of questions and correctly identify complexity can be very helpful in improving how these tools are built and implemented.

You can find the data we will be using here. The original data consists of over six thousand question-answer pairs from various StackExchange sites, with 30 human-generated subjective labels for each sample. I’ve adapted the competition problem to a more business-like situation. Our imaginary stakeholder is only interested in a machine-based solution to evaluate the questions it receives from its customers, and only over a few subjective aspects. The final goal is for the QA system to automatically answer the more factual questions and flag complex ones for employees to answer.
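The routing goal can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's actual code: the aspect names, the 0.5 cut-off, and the `route_question` helper are all hypothetical.

```python
from typing import Dict

# Hypothetical aspect names and cut-off; the article does not specify them.
COMPLEX_ASPECTS = ("opinion_seeking", "multi_intent")
THRESHOLD = 0.5


def route_question(scores: Dict[str, float], threshold: float = THRESHOLD) -> str:
    """Return "human" if any complexity aspect reaches the threshold, else "auto"."""
    for aspect in COMPLEX_ASPECTS:
        # A missing aspect is treated as a score of 0.0 (an assumption).
        if scores.get(aspect, 0.0) >= threshold:
            return "human"
    return "auto"


# Made-up scores for two imaginary questions.
factual = {"opinion_seeking": 0.1, "multi_intent": 0.2}
subjective = {"opinion_seeking": 0.8, "multi_intent": 0.3}
print(route_question(factual))
print(route_question(subjective))
```

In practice the threshold would likely be tuned per aspect, depending on how costly it is to send a complex question to the automatic path by mistake.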

So, we have a multi-class, multi-label text classification problem, where a sample can belong to more than one class at the same time, and 6,079 samples to work with. We will use pre-trained GloVe word embeddings and a neural network built with Keras. Let’s start!
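To make "more than one class at the same time" concrete, here is a small NumPy sketch of how multi-label targets and predictions are usually represented. The label names, the logits, and the 0.5 threshold are assumptions chosen for illustration; they are not taken from the project.

```python
import numpy as np

# Hypothetical label names; the article only says "a few subjective aspects".
LABELS = ["opinion_seeking", "fact_seeking", "multi_intent"]


def to_indicator_matrix(label_sets, labels=LABELS):
    """Encode each sample's set of labels as a row of 0/1 flags (multi-hot)."""
    index = {name: i for i, name in enumerate(labels)}
    y = np.zeros((len(label_sets), len(labels)), dtype=np.float32)
    for row, names in enumerate(label_sets):
        for name in names:
            y[row, index[name]] = 1.0
    return y


def sigmoid(logits):
    """Squash each logit to (0, 1) independently, one probability per label."""
    return 1.0 / (1.0 + np.exp(-logits))


def predict_labels(logits, threshold=0.5):
    """Turn per-label probabilities into 0/1 decisions; a row may hold several 1s."""
    return (sigmoid(logits) >= threshold).astype(int)


y_true = to_indicator_matrix([{"opinion_seeking", "multi_intent"}, {"fact_seeking"}])
logits = np.array([[2.0, -1.5, 0.3], [-3.0, 1.2, -0.7]])  # made-up network outputs
y_pred = predict_labels(logits)
print(y_true)
print(y_pred)
```

Each label gets its own sigmoid rather than sharing one softmax, because the labels are not mutually exclusive. In a Keras model this typically corresponds to a final dense layer with one sigmoid unit per label trained with binary cross-entropy, though the article's exact architecture is not shown here.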

To keep this post from getting too long, we will focus on the natural language preprocessing and modeling aspects, but you can find the entire project, including data exploration, on my GitHub page.

