Introduction to NLTK library in Python
source link: http://uzairadamjee.com/blog/nltk-in-python/
![nltk.png](http://uzairadamjee.com/blog/wp-content/uploads/2020/10/nltk.png)
NLTK (Natural Language Toolkit) is an essential Python library that supports tasks such as classification, stemming, tagging, parsing, semantic reasoning, and tokenization. It is one of the main tools for natural language processing (NLP) in Python and a common starting point for machine learning on text. Today it serves as an educational foundation for Python developers who are dipping their toes into this field.
It's a free and open-source library available on Windows, macOS, and Linux, with plenty of tutorials to make your entry into the world of NLP smooth.
Resources
· Documentation — https://www.nltk.org/
· NLTK Book — http://www.nltk.org/book/
Features
- Tokenization (sentences and words)
- Part-of-speech tagging
- Stemming, Lemmatization
- Sentiment Analysis
- WordNet Support
- Language Translation
Installation
pip install nltk
Import Library
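import nltk
# Uncomment on the first run to fetch the data used below (tokenizer models, stop words, WordNet, POS tagger):
# nltk.download()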
Sentence Tokenize
from nltk.tokenize import sent_tokenize

text = "Hello Mr. Smith, how are you doing today? The weather is great, and Python is awesome. The sky is pinkish-blue. You shouldn't eat cardboard."
tokenized_text = sent_tokenize(text)
print(tokenized_text)
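To see why a trained sentence tokenizer is useful, here is a minimal plain-Python sketch that splits on sentence-ending punctuation followed by whitespace. It uses only the standard library and is not NLTK's algorithm; `naive_sent_split` is a hypothetical helper named here for illustration. It wrongly breaks after the abbreviation "Mr.", which is the kind of case `sent_tokenize` is designed to handle.

```python
import re

text = ("Hello Mr. Smith, how are you doing today? The weather is great, "
        "and Python is awesome. The sky is pinkish-blue. "
        "You shouldn't eat cardboard.")


def naive_sent_split(s):
    """Split after '.', '?' or '!' when followed by whitespace (illustrative only)."""
    return re.split(r"(?<=[.?!])\s+", s)


# The text has four sentences, but the naive rule yields five pieces
# because it also splits after the abbreviation "Mr.".
for piece in naive_sent_split(text):
    print(piece)
```

The first piece printed is just "Hello Mr.", so any rule this simple needs a list of abbreviations or a trained model to be reliable.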
Word Tokenize
from nltk.tokenize import word_tokenize

tokenized_word = word_tokenize(text)
print(tokenized_word)
Stopwords
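from nltk.corpus import stopwords

# Alternative input: pass this to word_tokenize() below to try a shorter sentence.
example_sent = "This is a sample sentence, showing off the stop words filtration."

stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(text)  # `text` comes from the sentence tokenization example

# Keep only the tokens that are not in the stop word list.
filtered_words = []
for w in word_tokens:
    if w not in stop_words:
        filtered_words.append(w)

print(word_tokens)
print()
print(filtered_words)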
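Running this cell prints the original tokens followed by the filtered tokens, with stop words such as "is", "how", "are", and "you" removed. The comparison is case-sensitive and NLTK's English stop word list is lowercase, so a capitalized token such as "The" is not filtered out.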
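Because NLTK's English stop word list is lowercase, capitalized tokens slip through a plain membership test. The sketch below shows one common workaround, lowercasing each token before the lookup. It uses a tiny hand-written stop word set and token list as stand-ins (both are assumptions for illustration) so that it runs without any NLTK data.

```python
# A tiny hand-written stop word set, standing in for stopwords.words('english').
# (Assumption for illustration; the real NLTK list is much longer.)
stop_words = {"is", "the", "how", "are", "you", "and"}

tokens = ["The", "weather", "is", "great", ",", "and", "Python", "is", "awesome", "."]

# Case-sensitive check: "The" is not in the lowercase set, so it is kept.
case_sensitive = [w for w in tokens if w not in stop_words]

# Lowercase each token before the lookup so "The" is filtered as well.
case_insensitive = [w for w in tokens if w.lower() not in stop_words]

print(case_sensitive)
print(case_insensitive)
```

Punctuation tokens are still present in both results; whether to drop them as well depends on the task.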
Stemming
from nltk.stem import PorterStemmer

ps = PorterStemmer()
example_words = ["python", "pythoner", "rocks", "pythoned", "pythonly"]
for w in example_words:
    print(ps.stem(w))
Lemmatization
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("cats"))
print(lemmatizer.lemmatize("cacti"))
print(lemmatizer.lemmatize("geese"))
print(lemmatizer.lemmatize("pythonly"))
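Stemming only looks at the form of a word and strips suffixes by rule, so the result may not be a real word. Lemmatization looks the word up in a dictionary (WordNet) and returns its base form, so a known word always maps to a valid word. A word that WordNet does not contain, such as "pythonly", is returned unchanged.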
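The difference between the two approaches can be illustrated with a toy sketch in plain Python. Neither function is NLTK's algorithm: `toy_stem` is a hypothetical rule-based suffix stripper and `toy_lemmatize` is a hypothetical dictionary lookup, both invented here with a four-entry table standing in for WordNet.

```python
def toy_stem(word):
    """Strip a few common suffixes by rule, without checking the result is a word."""
    for suffix in ("ies", "es", "s", "ly"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical lookup table standing in for a real dictionary such as WordNet.
TOY_LEMMAS = {"cats": "cat", "cacti": "cactus", "geese": "goose", "ponies": "pony"}


def toy_lemmatize(word):
    """Return the dictionary form if the word is known, otherwise the word unchanged."""
    return TOY_LEMMAS.get(word, word)


for w in ["cats", "geese", "ponies", "pythonly"]:
    print(w, "->", toy_stem(w), "|", toy_lemmatize(w))
```

The rule-based version turns "ponies" into the non-word "pon" and leaves the irregular plural "geese" alone, while the lookup version returns "pony" and "goose" but can do nothing with a word it has never seen.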
POS Tagging
text = word_tokenize("And now for something completely different")
print(nltk.pos_tag(text))
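`nltk.pos_tag` returns a list of `(word, tag)` tuples. As a rough sketch of how that result can be consumed, the block below hard-codes the tags that the NLTK book reports for this sentence (Penn Treebank tags; your tagger version may differ) so that it runs without NLTK data, then groups words by tag and picks out the adverbs.

```python
from collections import defaultdict

# Tagged tokens as reported for this sentence in the NLTK book;
# hard-coded here so the sketch runs without NLTK data.
tagged = [("And", "CC"), ("now", "RB"), ("for", "IN"),
          ("something", "NN"), ("completely", "RB"), ("different", "JJ")]

# Group words by their part-of-speech tag.
words_by_tag = defaultdict(list)
for word, tag in tagged:
    words_by_tag[tag].append(word)

# Keep only the adverbs (tag "RB").
adverbs = [word for word, tag in tagged if tag == "RB"]

print(dict(words_by_tag))
print(adverbs)
```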
WordNet
WordNet is a lexical database (in effect, a dictionary) for the English language, designed specifically for natural language processing.
A synset (synonym set) is a group of synonymous words that express the same concept. NLTK exposes synsets through a simple interface for looking up words in WordNet. Some words belong to only one synset, while others belong to several.
from nltk.corpus import wordnet

syns = wordnet.synsets("program")
print(syns[0].name())
print(syns[0].lemmas()[0].name())
print(syns[0].definition())
print(syns[0].examples())
Frequency Distribution
from nltk.probability import FreqDist

fdist = FreqDist(tokenized_word)
print(fdist)
# Output reported in the original post (exact counts depend on the input text):
# <FreqDist with 25 samples and 30 outcomes>

print(fdist.most_common(2))
# [('is', 3), (',', 2)]
# Frequency Distribution Plot
import matplotlib.pyplot as plt
fdist.plot(30,cumulative=False)
plt.show()
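In NLTK 3, `FreqDist` is built on Python's `collections.Counter`, so the idea behind "samples" (distinct tokens) and "outcomes" (total tokens) can be sketched with the standard library alone. The token list below is a small hand-written example, not the tokenized text from earlier in this article.

```python
from collections import Counter

# A small hand-written token list (assumption for illustration).
tokens = ["the", "sky", "is", "blue", ",", "and",
          "the", "sea", "is", "blue", "too", "."]

counts = Counter(tokens)

num_samples = len(counts)            # distinct tokens
num_outcomes = sum(counts.values())  # total tokens

print(num_samples, num_outcomes)
print(counts.most_common(3))
```

On a `FreqDist`, the same two numbers are available through its `B()` and `N()` methods, and `most_common` behaves as it does on a `Counter`.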
This brings us to the end of this article.
The full code can be downloaded from my GitHub:
https://github.com/uzairaj/Nltk
Check out more blogs on my website and YouTube channel:
http://uzairadamjee.com/blog
https://www.youtube.com/channel/UCCxSpt0KMn17sMn8bQxWZXA
Thank you for reading