Introduction to NLTK library in Python


October 24, 2020, by uzairadamjee

NLTK (the Natural Language Toolkit) is a Python library that supports tasks such as classification, stemming, tagging, parsing, semantic reasoning, and tokenization. It is one of the main tools for natural language processing in Python, and today it serves as an educational foundation for developers who are dipping their toes into this field (and into machine learning).

It is a free and open-source library that is available on Windows, macOS, and Linux, with plenty of tutorials to make your entry into the world of NLP smooth.

Resources

· Documentation — https://www.nltk.org/

· NLTK Book — http://www.nltk.org/book/

Features

Tokenization (Sentences/Words)
Part-of-speech Tagging
Stemming, Lemmatization
Sentiment Analysis
WordNet Support
Language Translation

Installation

pip install nltk

Import Library

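import nltk
# Uncomment to open the NLTK downloader and fetch the data used below
# (tokenizer models, stop word lists, WordNet, the POS tagger):
# nltk.download()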

Sentence Tokenize

from nltk.tokenize import sent_tokenize

text = "Hello Mr. Smith, how are you doing today? The weather is great, and Python is awesome. The sky is pinkish-blue. You shouldn't eat cardboard."
tokenized_text = sent_tokenize(text)
print(tokenized_text)

Word Tokenize

from nltk.tokenize import word_tokenize

tokenized_word = word_tokenize(text)
print(tokenized_word)
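word_tokenize keeps punctuation marks as separate tokens. If only the words are wanted, one option is NLTK's RegexpTokenizer. The snippet below is a minimal sketch: the pattern \w+ is an illustrative choice, not something the steps above require, and the sample sentence is shortened from the text used earlier.

```python
from nltk.tokenize import RegexpTokenizer

# Match runs of word characters only, so punctuation is dropped entirely
tokenizer = RegexpTokenizer(r"\w+")

words_only = tokenizer.tokenize("Hello Mr. Smith, how are you doing today?")
print(words_only)
# ['Hello', 'Mr', 'Smith', 'how', 'are', 'you', 'doing', 'today']
```

The trade-off is that this pattern also splits contractions such as "shouldn't" at the apostrophe, so it suits rough word counts better than careful linguistic analysis.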

Stopwords

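from nltk.corpus import stopwords

# Not used below; swap it in for `text` to try another input
example_sent = "This is a sample sentence, showing off the stop words filtration."

stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(text)  # `text` comes from the Sentence Tokenize section

# Keep only the tokens that are not in the stop word list
filtered_words = []
for w in word_tokens:
    if w not in stop_words:
        filtered_words.append(w)

print(word_tokens)
print()
print(filtered_words)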

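After running this cell, we get the tokens with the stop words that occur in our data (is, how, are, you, etc.) removed. The comparison is case-sensitive and NLTK's English list is lowercase, so capitalized tokens such as "The" are kept.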
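One way to make the filter case-insensitive is to lowercase each token before the lookup. The sketch below assumes a tiny hand-written stop list and the regex-based wordpunct_tokenize so that it runs without downloading any NLTK data. In practice you would use set(stopwords.words('english')) as in the code above.

```python
from nltk.tokenize import wordpunct_tokenize

# Hypothetical mini stop list, used here only to avoid the corpus download
stop_words = {"the", "is", "and"}

sentence = "The weather is great, and Python is awesome."
tokens = wordpunct_tokenize(sentence)

# Lowercase each token for the membership test and drop pure punctuation
filtered = [t for t in tokens if t.lower() not in stop_words and t.isalnum()]
print(filtered)
# ['weather', 'great', 'Python', 'awesome']
```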

Stemming

from nltk.stem import PorterStemmer

ps = PorterStemmer()
example_words = ["python", "pythoner", "rocks", "pythoned", "pythonly"]
for w in example_words:
    print(ps.stem(w))

Lemmatization

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("cats"))
print(lemmatizer.lemmatize("cacti"))
print(lemmatizer.lemmatize("geese"))
print(lemmatizer.lemmatize("pythonly"))

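Stemming only looks at the form of a word, whereas lemmatization looks the word up in a dictionary (WordNet here) and so takes its meaning into account. As a result, lemmatization returns a valid word whenever it recognizes the input; a word it does not know, such as "pythonly", is returned unchanged.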
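To see what "form only" means in practice, the short sketch below runs the Porter stemmer on a few ordinary English words. The word list is an illustrative choice; the stemmer needs no downloaded data.

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()
words = ["cats", "running", "studies"]

# Map each word to its stem; suffixes are stripped by rule, with no dictionary check
stems = {word: ps.stem(word) for word in words}
for word, stem in stems.items():
    print(word, "->", stem)
# "studies" becomes "studi", which is not an English word
```

A lemmatizer would be expected to map "studies" to a dictionary form instead, at the cost of needing the WordNet data.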

POS Tagging

text = word_tokenize("And now for something completely different")
print(nltk.pos_tag(text))
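pos_tag returns a list of (token, tag) pairs that use Penn Treebank tags, for example CC for a coordinating conjunction, RB for an adverb, and NN for a noun. The sketch below shows one way to filter such a list by tag. To stay runnable without the tagger model, it hard-codes the output that the NLTK book prints for this sentence; an actual run may differ slightly depending on the NLTK and model version. The helper words_with_tag_prefix is a hypothetical name, not part of NLTK.

```python
# Tagged tokens as shown in the NLTK book for this sentence (assumed, not computed here)
tagged = [
    ("And", "CC"), ("now", "RB"), ("for", "IN"),
    ("something", "NN"), ("completely", "RB"), ("different", "JJ"),
]

def words_with_tag_prefix(tagged_tokens, prefix):
    """Return the tokens whose tag starts with the given prefix."""
    return [word for word, tag in tagged_tokens if tag.startswith(prefix)]

adverbs = words_with_tag_prefix(tagged, "RB")
nouns = words_with_tag_prefix(tagged, "NN")
print(adverbs)  # ['now', 'completely']
print(nouns)    # ['something']
```

Matching on a prefix rather than the full tag also catches related tags, such as NNS (plural noun) when filtering on "NN".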

WordNet

WordNet is a lexical database (in effect, a dictionary) for the English language, designed specifically for natural language processing.

A synset ("synonym set") is a group of synonymous words that express the same concept. NLTK exposes synsets through a simple interface for looking up words in WordNet. Some words have only one synset and some have several.

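from nltk.corpus import wordnet

syns = wordnet.synsets("program")
print(syns[0].name())               # identifier of the first synset
print(syns[0].lemmas()[0].name())   # just the word of its first lemma
print(syns[0].definition())         # dictionary-style definition
print(syns[0].examples())           # example sentences, if any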

Frequency Distribution

from nltk.probability import FreqDist

fdist = FreqDist(tokenized_word)
print(fdist)
# <FreqDist with 25 samples and 30 outcomes>
print(fdist.most_common(2))
# [('is', 3), (',', 2)]

# Frequency Distribution Plot
import matplotlib.pyplot as plt
fdist.plot(30, cumulative=False)
plt.show()
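In the output above, the comma is the second most common "word", because punctuation tokens are counted too. A minimal sketch of one way around this is to keep only alphabetic tokens before building the distribution. The token list here is made up for illustration, so the example runs without any downloaded data.

```python
from nltk.probability import FreqDist

# Illustrative tokens, as a word tokenizer might produce them
tokens = ["the", "sky", "is", "blue", ",", "the", "sea", "looks",
          "blue", ",", "the", "end", "."]

# Keep alphabetic tokens only, lowercased, so punctuation is not counted
words = [t.lower() for t in tokens if t.isalpha()]
fdist = FreqDist(words)

print(fdist.most_common(2))   # [('the', 3), ('blue', 2)]
print(fdist.N(), fdist.B())   # 10 outcomes in total, 7 distinct samples
print(fdist.freq("the"))      # relative frequency of "the": 3 out of 10
```

FreqDist behaves like a dictionary of counts, so fdist["blue"] returns 2 here.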

This brings us to the end of this article.

The full code can be downloaded from my GitHub:
https://github.com/uzairaj/Nltk

Check out more blogs on my website and YouTube channel
http://uzairadamjee.com/blog
https://www.youtube.com/channel/UCCxSpt0KMn17sMn8bQxWZXA

Thank you for reading
