Build and Compare 3 Models — NLP Sentiment Prediction

source link: https://towardsdatascience.com/build-and-compare-3-models-nlp-sentiment-prediction-67320979de61?gi=372e09a34602

Predicting sentiments with 3 Algorithms from Scratch — Beginner Friendly.

Natural Language Processing in Python (Jupyter)!

Nov 12 · 14 min read

I created this project to learn how various classification algorithms work within a Natural Language Processing model. Natural Language Processing, which I will now refer to as NLP, is a branch of artificial intelligence that focuses on enabling computers to interpret and process human language in both speech and text form.

Photo by Patrick Tomasso on Unsplash

In this pipeline, I go through the following steps:

  1. Import required packages and libraries
  2. Import the dataset
  3. Process text in the dataset before it can be analyzed by the computer
  4. Create a Bag of Words model
  5. Split the dataset into Train & Test sets
  6. Naive Bayes Algorithm
  7. Decision Tree Algorithm
  8. Random Forest Algorithm
  9. Comparing Accuracy, Precision, Recall, and F1 Scores
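Step 9 compares the models with four standard metrics, all of which can be read off a binary confusion matrix. The sketch below illustrates the arithmetic and is not the author's code; the function name `scores_from_counts` and the counts in the usage line are hypothetical, and it assumes class 1 (a positive review) is the positive class.

```python
def scores_from_counts(tn, fp, fn, tp):
    """Return (accuracy, precision, recall, f1) from binary confusion-matrix counts.

    Assumes class 1 (positive review) is the positive class.
    """
    total = tn + fp + fn + tp
    # Accuracy: share of all reviews classified correctly
    accuracy = (tp + tn) / total
    # Precision: of the reviews predicted positive, how many really are positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly positive reviews, how many were predicted positive
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts, for illustration only
print(scores_from_counts(tn=50, fp=10, fn=5, tp=35))
```

With scikit-learn and 0/1 labels, `confusion_matrix(y_test, y_pred).ravel()` returns the counts in the order tn, fp, fn, tp, so they can be unpacked straight into a function like this one.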

|| II || The problem

Photo by Zulmaury Saavedra on Unsplash

For this project, I will be using a dataset sourced from Kaggle that contains 1,000 reviews of a pizzeria written by different users. |Link to the dataset|
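This section does not show the loading code, so the following is a sketch under assumptions: it supposes the Kaggle file is tab-separated with a `Review` text column and a `Liked` label column (1 = positive, 0 = negative), and it reads a two-row in-memory sample instead of the real file. Passing `quoting=csv.QUOTE_NONE` (equivalently `quoting=3`) tells pandas to treat double quotes inside reviews as ordinary characters.

```python
import csv
import io

import pandas as pd

# Hypothetical two-row sample standing in for the real Kaggle file;
# the column names Review and Liked are assumptions.
sample_tsv = (
    "Review\tLiked\n"
    "Wow... Loved this place.\t1\n"
    'The "special" pizza was cold.\t0\n'
)

# With the real file, pass its path instead of the StringIO buffer.
# QUOTE_NONE keeps double quotes in the reviews as plain text.
dataset = pd.read_csv(io.StringIO(sample_tsv), sep="\t", quoting=csv.QUOTE_NONE)

print(dataset.shape)
print(dataset.head())
```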

Humans can read a review and tell whether it is positive or negative. What if we could build a model that classifies reviews the same way? What’s the best way to do this?

First, let’s talk about the process. We start by pre-processing the data, removing common words that don’t help our prediction. Then we reduce the remaining words to their stems (e.g. the Porter stemmer reduces loved, loving, and lovely to love). We then train the model to learn which reviews are positive based on their word stems. Finally, we test it on reviews it did not see during training, to measure how accurately it predicts whether a review is positive or negative (1 or 0).
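The cleaning step described above can be sketched as follows, assuming NLTK is installed. To stay self-contained, it uses a small hand-written stopword set; this is an assumption standing in for `stopwords.words('english')`, which requires `nltk.download('stopwords')`. The helper name `clean_review` is hypothetical.

```python
import re

from nltk.stem.porter import PorterStemmer

# Small illustrative stopword set; the article uses NLTK's English stopword list instead.
STOPWORDS = {"a", "an", "the", "this", "is", "was", "and", "i", "it", "of", "to"}
stemmer = PorterStemmer()


def clean_review(review):
    """Keep letters only, lowercase, drop stopwords, and stem the remaining words."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", review)  # replace punctuation and digits with spaces
    words = letters_only.lower().split()
    stems = [stemmer.stem(word) for word in words if word not in STOPWORDS]
    return " ".join(stems)


print(clean_review("Wow... Loved this place."))
```

Each cleaned review becomes a short string of stems, which is the form the Bag of Words model in step 4 expects.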

|| III || Importing basic libraries

Here, we import all the libraries this model needs. Before you begin, make sure all the dependencies are installed. We will be working mainly with pandas, numpy, re, nltk, matplotlib, and scikit-learn.

The re module ships with Python’s standard library; install the others from the command line with pip (scikit-learn’s pip package is named scikit-learn, even though it is imported as sklearn).
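One way to confirm the dependencies are in place before running the notebook is to ask Python’s import system whether each module can be found. This is a hypothetical helper (`missing_modules` is not part of any library); it checks import names, which differ from pip names in the case of scikit-learn.

```python
import importlib.util

# Import names of the modules used in this project (scikit-learn is imported as sklearn).
REQUIRED = ["numpy", "pandas", "matplotlib", "re", "nltk", "sklearn"]


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Install these before continuing:", ", ".join(missing))
else:
    print("All dependencies found.")
```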

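import numpy as np                    # numerical arrays
import matplotlib.pyplot as plt       # plotting
import pandas as pd                   # loading and handling the dataset
import re                             # regular expressions for cleaning text
import nltk
nltk.download('stopwords')            # one-time download of NLTK's stopword lists
from nltk.corpus import stopwords     # common words to drop (e.g. the, a, is)
from nltk.stem.porter import PorterStemmer                   # reduces words to their stems
from sklearn.feature_extraction.text import CountVectorizer  # Bag of Words model
from sklearn.model_selection import train_test_split         # train/test split
from sklearn.naive_bayes import GaussianNB                   # Naive Bayes classifier
from sklearn.metrics import confusion_matrix                 # evaluation
from sklearn.tree import DecisionTreeClassifier              # Decision Tree classifier
from sklearn.ensemble import RandomForestClassifier          # Random Forest classifier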
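The imports above come together in steps 4 to 9 of the pipeline. The sketch below shows one plausible way to wire them up; it is not the author’s code. It uses a tiny hypothetical corpus of already-cleaned reviews in place of the Kaggle data, so the resulting scores mean nothing and only the mechanics carry over. The settings `test_size=0.25`, `stratify=y`, `random_state=0`, and `n_estimators=10` are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical, already-cleaned reviews and labels: 1 = positive, 0 = negative
corpus = [
    "love place", "great crust", "amazing pizza", "friendly staff", "love sauce", "great service",
    "cold pizza", "rude staff", "bad crust", "slow service", "terrible sauce", "never return",
]
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Step 4: Bag of Words, one column per distinct word; GaussianNB needs a dense array
cv = CountVectorizer()
X = cv.fit_transform(corpus).toarray()

# Step 5: hold out a quarter of the reviews; stratify keeps both classes in each split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Steps 6-8: the three classifiers being compared
models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=10, random_state=0),
}

# Step 9: one confusion matrix per model; labels=[0, 1] fixes the layout [[TN, FP], [FN, TP]]
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = confusion_matrix(y_test, y_pred, labels=[0, 1])
    print(name)
    print(results[name])
```

Fitting every model on the same split is what makes the comparison fair: any difference in the confusion matrices comes from the algorithm, not from which reviews happened to land in the test set.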
