[Python] Use Regular Expression to Find Strings Marked For Internationalization...

 2 years ago
source link: http://siongui.github.io/2016/01/01/python-regular-expression-to-find-i18n-strings/

Python Regular Expression

An internationalized (i18n) web application usually marks strings to be translated as _("string"). You can use xgettext from the GNU gettext utilities to extract translatable strings from given input files. This post, however, uses regular expressions in Python to do the work.

The basic pattern to search for _("string") inside template delimiters, i.e., {{_("string")}}, is:

import re

def searchI18n(string):
  # re.search returns only the first match, and .+ is greedy, so the
  # match extends to the last ")" that is followed by "}}":
  # for {{_("ddd")}}12345{{_("sss")}} the whole string is matched,
  # not just {{_("ddd")}}
  return re.search(r'{{\s*_\(\s*(.+)\s*\)\s*}}', string)
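To make the greedy behavior described in the comment concrete, here is a small sketch that runs the same pattern on two inputs. The sample strings and the names PATTERN, single, and double are made up for illustration and are not part of the original post.

```python
import re

# Same pattern as in searchI18n; the sample inputs below are hypothetical.
PATTERN = r'{{\s*_\(\s*(.+)\s*\)\s*}}'

# One marker: the capture group holds the quoted string, quotes included.
single = re.search(PATTERN, '{{ _("hello") }}')
print(single.group(1))  # "hello"

# Two markers on one line: greedy .+ runs to the last ")" followed by "}}",
# so the match covers everything between the first "{{" and the final "}}".
double = re.search(PATTERN, '{{_("ddd")}}12345{{_("sss")}}')
print(double.group(0))  # {{_("ddd")}}12345{{_("sss")}}
print(double.group(1))  # "ddd")}}12345{{_("sss"
```

Note that group(1) in the second case is not a usable string to translate, which is why this basic pattern only works when a line contains a single marker.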

A more robust pattern excludes ")" from the captured text, so that each marker is matched separately:

import re

def getAllMatchesInFile(filepath):
  with open(filepath, 'r') as f:
    # [^)]+ cannot cross a ")", so {{_("ddd")}}12345{{_("sss")}}
    # yields two separate matches instead of one
    return re.findall(r'{{\s*_\(\s*([^)]+)\s*\)\s*}}', f.read())

The above function returns a list of every captured string in the file, with the surrounding quotation marks still included.
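As a rough usage sketch, the function can be tried on a small template written to a temporary file. The template content and the quote-stripping step are assumptions added for illustration; the original post stops at returning the raw captures.

```python
import os
import re
import tempfile

def getAllMatchesInFile(filepath):
  with open(filepath, 'r') as f:
    return re.findall(r'{{\s*_\(\s*([^)]+)\s*\)\s*}}', f.read())

# Hypothetical template content, written to a temporary file for the demo.
template = '<h1>{{ _("Home") }}</h1>\n<p>{{_("About")}} | {{ _( "Contact" ) }}</p>\n'
with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False) as tmp:
  tmp.write(template)
  path = tmp.name

try:
  raw = getAllMatchesInFile(path)
finally:
  os.remove(path)

# Captures keep their quotes, and [^)]+ can also swallow whitespace
# before the closing ")", so clean both up (an added post-processing step).
msgids = [m.strip().strip('"\'') for m in raw]
print(raw)     # ['"Home"', '"About"', '"Contact" ']
print(msgids)  # ['Home', 'About', 'Contact']
```

One limitation of excluding ")" from the capture: a marked string that itself contains ")", such as {{_("a)b")}}, is not matched by this pattern at all.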

