Cutlet: A Japanese to Romaji Converter in Python

 3 years ago
source link: https://www.dampfkraft.com/nlp/cutlet-python-romaji-converter.html

A few months ago I released cutlet, a Python library and application for converting arbitrary Japanese text to romaji.

[Image: Katsu curry illustrated by Irasutoya]

Compared to other libraries, cutlet has several advantages:

  • it uses fugashi, so you can re-use your existing dictionary
  • words of foreign origin optionally use their original spelling ("cutlet" instead of "katsu") thanks to UniDic
  • it's easy to add exceptions for specific words
  • it has a built-in slug mode for URL generation

The foreign spelling feature in particular is something I've never seen in another system, and in some cases it's important for getting things right. For example, "Sweden Hills" is a neighborhood in Hokkaido, but even the Post Office data gives the romaji as the odd-looking "Suedenhiruzu". With cutlet the output would be "Sweden hill", and it's easy to add an exception if you want "Hills".

Here's an example of usage from Python:

from cutlet import Cutlet
katsu = Cutlet()
katsu.romaji("カツカレー")
# => 'Cutlet curry'
katsu.slug("カツカレー")
# => 'cutlet-curry'
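
The exception and foreign-spelling features mentioned earlier could be wired up roughly as follows. This is a minimal sketch, assuming cutlet and a UniDic dictionary (for example unidic-lite) are installed. `build_converter` is a hypothetical helper made up for this illustration, not part of cutlet; `add_exception` and `use_foreign_spelling` are the names cutlet's documentation uses, to the best of my knowledge. No output is shown because the result depends on the installed dictionary.

```python
def build_converter(exceptions=None, use_foreign_spelling=True):
    """Return a cutlet converter with per-word exceptions applied.

    build_converter is a hypothetical helper for this sketch, not part of cutlet.
    """
    # Imported lazily so the helper can be defined without cutlet installed.
    from cutlet import Cutlet

    katsu = Cutlet()
    # False falls back to plain romaji ("katsu") instead of the original
    # foreign spelling ("cutlet").
    katsu.use_foreign_spelling = use_foreign_spelling
    for surface, romaji in (exceptions or {}).items():
        # Each exception maps a token's surface form to a fixed romanization.
        katsu.add_exception(surface, romaji)
    return katsu
```

Calling `build_converter({"ヒルズ": "Hills"}).romaji("スウェーデンヒルズ")` would be one way to try for "Hills" in the Sweden Hills case, assuming the tokenizer emits "ヒルズ" as its own token.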

One of my main motivations for making this library was dealing with the frequent case where using Japanese text isn't an option for technical reasons, or it is an option but comes with downsides. A common example is URLs: while you can use Japanese text in URLs, in many situations the text becomes unreadable hex escapes, so it's not actually helpful for anyone. Generating an article slug in romaji creates something that can still be interpreted in Japanese and is free from any technical compatibility worries.
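
To make the hex-escape problem concrete, here is a small sketch using only Python's standard library. It percent-encodes the Japanese title from the usage example and compares it with the romaji slug shown there. The variable names are just for illustration, and the slug is hard-coded rather than generated, so cutlet is not needed to run it.

```python
from urllib.parse import quote, unquote

title = "カツカレー"

# Percent-encoding the UTF-8 bytes is what many tools show for a Japanese URL.
escaped = quote(title)
print(escaped)
# => %E3%82%AB%E3%83%84%E3%82%AB%E3%83%AC%E3%83%BC

# The escapes decode back to the original text, but are unreadable as-is.
print(unquote(escaped) == title)
# => True

# The romaji slug from the usage example is plain ASCII, so escaping it
# changes nothing.
slug = "cutlet-curry"
print(quote(slug))
# => cutlet-curry
```

Five characters of Japanese turn into 45 characters of escapes, while the slug stays as readable as it started.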

cutlet is available for install via pip, and works on the command line as well as via Python. If you make use of it I'd love to hear about it. If there's a feature you'd like it to include, feel free to open an issue. While I don't have any more major features planned, I would like to make a web version you can use to try it out; I'll post about that on Twitter if I ever get it set up. Ψ

