
Text Mining for Dummies: Text Classification with Python

 5 years ago
source link: https://towardsdatascience.com/text-mining-for-dummies-text-classification-with-python-98e47c3a9deb?gi=8ce694b35d59

The common steps of any NLP project in 20 lines of code


This short read shows the common steps of any text mining project. If you want to follow along in a notebook, you can get the notebook over here.

The goal is not to give an exhaustive overview of text mining, but to kick-start your thinking and give you ideas for further enhancements.

Step 1: Data

For teaching purposes, we start with a very small data set of 6 reviews.

Such data often comes from scraping review websites, which are good sources because each review pairs raw text with a numeric rating.
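To make the shape of such a data set concrete, here is a minimal sketch. The six reviews, their ratings and the rule "4 stars or more counts as positive" are all made up for illustration and are not the data from the original notebook.

```python
# Hypothetical data: six invented reviews, each with a 1-5 star rating.
reviews = [
    "I loved this product, it works perfectly",
    "Terrible quality, it broke after one day",
    "Great value and very fast delivery",
    "Not good, I was disappointed and returned it",
    "Excellent, I would happily buy it again",
    "Awful experience, the worst purchase I ever made",
]
ratings = [5, 1, 4, 2, 5, 1]

# Assumption: turn the numeric rating into a binary target (1 = positive).
labels = [1 if rating >= 4 else 0 for rating in ratings]

for text, rating, label in zip(reviews, ratings, labels):
    print(rating, label, text)
```

The raw text becomes the input and the numeric evaluation becomes the target to predict.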

Step 2: Data preparation

Real data will often need more cleaning than in this example, for instance with regular expressions or Python string operations.
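As an illustration of that kind of cleaning, the sketch below uses only Python's built-in re module. The function name clean_text and the specific rules (lowercase, strip tags, keep letters only) are assumptions chosen for the example; the right rules depend on your data.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase, strip HTML-like tags, drop non-letters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)      # remove leftover HTML tags
    text = re.sub(r"[^a-z\s]", " ", text)     # keep only letters and whitespace
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


print(clean_text("<p>GREAT product!!! 10/10, would buy again :)</p>"))
# prints: great product would buy again
```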

The real challenge of text mining is converting text to numerical data. This is often done in two steps:

  • Stemming / Lemmatizing: reducing each word to its ‘base form’ so that variants of the same word are counted together
  • Vectorizing: turning each text into a row of numbers with an algorithm based on word counts (or something more advanced)

In this example, I use a LancasterStemmer and a CountVectorizer, which are well-known and easy-to-use methods.

Step 2a: LancasterStemmer to bring words back to their base form
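A minimal sketch of this step, assuming NLTK is installed. The helper name stem_text and the two sample reviews are hypothetical, and splitting on whitespace is a simplification that avoids downloading a tokenizer model.

```python
from nltk.stem import LancasterStemmer

stemmer = LancasterStemmer()


def stem_text(text: str) -> str:
    """Stem every whitespace-separated word and join the stems back together."""
    return " ".join(stemmer.stem(word) for word in text.lower().split())


# Hypothetical sample reviews, for illustration only.
reviews = [
    "I loved this product and it works perfectly",
    "Terrible quality and it broke after one day",
]
stemmed = [stem_text(review) for review in reviews]

for before, after in zip(reviews, stemmed):
    print(before, "->", after)
```

The Lancaster stemmer is fairly aggressive, so some stems will not look like real words. That is fine here, because they are only used as counting keys.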


Step 2b: CountVectorizer to apply Bag of Words (basically a word count) for vectorizing, that is, converting text data into numerical data


Step 3: Machine Learning

Since the text has been converted to numeric data, you can use any method that you would use on regular data!
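As one possible illustration, the sketch below chains a CountVectorizer with a Naive Bayes classifier from scikit-learn. The choice of MultinomialNB, the six reviews and the labels are assumptions made for this example and not necessarily the model or data used in the original notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: six invented reviews, 1 = positive, 0 = negative.
reviews = [
    "I loved this product, it works perfectly",
    "Terrible quality, it broke after one day",
    "Great value and very fast delivery",
    "Not good, I was disappointed and returned it",
    "Excellent, I would happily buy it again",
    "Awful experience, the worst purchase I ever made",
]
labels = [1, 0, 1, 0, 1, 0]

# Vectorize the text, then fit a classifier on the resulting word counts.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reviews, labels)

# Words never seen during training are simply ignored at prediction time.
new_reviews = ["great quality, I loved it", "terrible, it broke"]
predictions = model.predict(new_reviews)
print(list(zip(new_reviews, predictions)))
```

With only six training examples the predictions are not reliable. The point is the workflow, which stays the same when you swap in more data or a different model.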


I hope this short example helps you on your journey. Don’t hesitate to ask any questions in the comments. Thanks for reading!

Link to the complete notebook: over here.

