GitHub - goru001/inltk: Natural Language Toolkit for Indian Languages

README.md

Natural Language Toolkit for Indian Languages (iNLTK)

Installation

pip install http://download.pytorch.org/whl/cpu/torch-1.0.0-cp36-cp36m-linux_x86_64.whl
pip install inltk

iNLTK runs on CPU, not on GPU, which is the desired behaviour for most deep learning models in production.

The first command above installs pytorch-cpu, which, as the name suggests, does not have CUDA support.

Note: inltk is currently supported only on Linux with Python >= 3.6

Supported languages

| Language | Code |
|---|---|
| Hindi | hi |
| Punjabi | pa |
| Sanskrit | sa |
| Gujarati | gu |
| Kannada | kn |
| Malayalam | ml |
| Nepali | ne |
| Odia | or |
| Marathi | mr |
| Bengali | bn |
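Every iNLTK call in the Usage section takes one of these codes, so it can help to check a code before passing it on. The following is a minimal sketch, not part of iNLTK: `SUPPORTED_LANGUAGES` and `require_supported` are hypothetical names introduced here, and the mapping is simply copied from the list above.

```python
# Hypothetical helper, not part of iNLTK: the codes are copied from the
# "Supported languages" list in this README.
SUPPORTED_LANGUAGES = {
    "hi": "Hindi",
    "pa": "Punjabi",
    "sa": "Sanskrit",
    "gu": "Gujarati",
    "kn": "Kannada",
    "ml": "Malayalam",
    "ne": "Nepali",
    "or": "Odia",
    "mr": "Marathi",
    "bn": "Bengali",
}


def require_supported(code):
    """Return the code unchanged if it is listed, otherwise raise ValueError."""
    if code not in SUPPORTED_LANGUAGES:
        known = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language code {code!r}; expected one of: {known}")
    return code
```

A caller could then write `setup(require_supported('hi'))` to fail early on a mistyped code instead of waiting for iNLTK to reject it.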

Usage

Set up the language

from inltk.inltk import setup

setup('<code-of-language>')  # for Hindi, use setup('hi')

Note: You need to run setup('<code-of-language>') when you use a language for the FIRST TIME ONLY. This will download all the necessary models required to do inference for that language.
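Since setup only needs to run once per language, a script that may be started many times can guard the call. The sketch below is one possible approach and is not something iNLTK provides: `ensure_setup`, the marker-file convention, and the `setup_fn` parameter are all assumptions made for illustration. Only `setup` itself comes from iNLTK.

```python
from pathlib import Path


def ensure_setup(code, marker_dir, setup_fn=None):
    """Call setup(code) only if no marker file for this language exists yet.

    Hypothetical helper: the marker file is this sketch's own bookkeeping,
    not something iNLTK creates or checks.
    Returns True if setup was run, False if it was skipped.
    """
    marker = Path(marker_dir) / f"inltk_{code}.done"
    if marker.exists():
        return False
    if setup_fn is None:
        # Imported lazily so the helper can be defined without iNLTK installed.
        from inltk.inltk import setup as setup_fn
    setup_fn(code)
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()  # remember that this language has been set up
    return True
```

For example, `ensure_setup('hi', '/tmp/inltk-markers')` (the directory is an arbitrary choice) would download the Hindi models on the first run and do nothing on later runs.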

Tokenize

from inltk.inltk import tokenize

tokenize(text, '<code-of-language>')  # text is a string in <code-of-language>

Predict Next 'n' words

from inltk.inltk import predict_next_words

predict_next_words(text, n, '<code-of-language>')

# text: string in <code-of-language>
# n: number of words you want to predict (integer)

Note: You can also pass a fourth parameter, randomness, to predict_next_words. It has a default value of 0.8
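To make the fourth parameter concrete, here is a small sketch of a wrapper that passes randomness positionally, as the note above describes. `predict_with_randomness` and its `predict_fn` parameter are hypothetical names introduced here; the check on `n` is this sketch's own addition, and the README does not say how iNLTK interprets the randomness value.

```python
def predict_with_randomness(text, n, code, randomness=0.8, predict_fn=None):
    """Forward to predict_next_words, passing randomness as the fourth argument.

    Hypothetical wrapper: the default of 0.8 mirrors the default stated in
    the note above; the validation of n is an assumption of this sketch.
    """
    if not isinstance(n, int) or n < 1:
        raise ValueError("n must be a positive integer")
    if predict_fn is None:
        # Imported lazily so the wrapper can be defined without iNLTK installed.
        from inltk.inltk import predict_next_words as predict_fn
    return predict_fn(text, n, code, randomness)
```

With iNLTK installed and the language set up, `predict_with_randomness(text, 5, 'hi', 0.5)` would be equivalent to calling `predict_next_words(text, 5, 'hi', 0.5)` directly.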

Identify language

from inltk.inltk import identify_language

identify_language(text)

# text: string in one of the supported languages

Example:

>>> identify_language('न्यायदर्शनम् भारतीयदर्शनेषु अन्यतमम्। वैदिकदर्शनेषु ')
'sanskrit'

Repositories containing models used in iNLTK

| Language | Repository | Perplexity of Language model | Wikipedia Articles Dataset | Classification accuracy | Classification Kappa score |
|---|---|---|---|---|---|
| Hindi | NLP for Hindi | ~36 | 55,000 articles | ~79 (News Classification) | ~30 (Movie Review Classification) |
| Punjabi | NLP for Punjabi | ~13 | 44,000 articles | ~89 (News Classification) | ~60 (News Classification) |
| Sanskrit | NLP for Sanskrit | ~6 | 22,273 articles | ~70 (Shloka Classification) | ~56 (Shloka Classification) |
| Gujarati | NLP for Gujarati | ~34 | 31,913 articles | ~91 (News Classification) | ~85 (News Classification) |
| Kannada | NLP for Kannada | ~70 | 32,997 articles | ~94 (News Classification) | ~90 (News Classification) |
| Malayalam | NLP for Malyalam | ~26 | 12,388 articles | ~94 (News Classification) | ~91 (News Classification) |
| Nepali | NLP for Nepali | ~32 | 38,757 articles | ~97 (News Classification) | ~96 (News Classification) |
| Odia | NLP for Odia | ~27 | 17,781 articles | ~95 (News Classification) | ~92 (News Classification) |
| Marathi | NLP for Marathi | ~18 | 85,537 articles | ~91 (News Classification) | ~84 (News Classification) |
| Bengali | NLP for Bengali | ~41 | 72,374 articles | ~94 (News Classification) | ~92 (News Classification) |
