GitHub - LingDong-/rrpl: Describing Chinese Characters with Recursive Radical Pa...
source link: https://github.com/LingDong-/rrpl
Recursive Radical Packing Language
Recursive Radical Packing Language (RRPL) is a proposal for a method of describing arbitrary Chinese characters concisely while retaining their structural information. Potential fields for usage include font design and machine learning. In RRPL, each Chinese character is described as a short string of numbers, symbols, and references to other characters. Its syntax is inspired by markup languages such as LaTeX, as well as the traditional "米" grids used for calligraphy practice.
5000+ Traditional Chinese characters and radicals are currently described using this language. You can download a .json file containing all of them (with their Unicode mapping) here: dist/min-trad.json
Check out Chinese character & radical visualizations made with RRPL here and here.
Syntax
Each Chinese character is described as a combination of components. These components can be other characters or radicals, as well as building blocks, which define the simplest shapes that make up every component. Combination can be applied recursively to describe ever more complex glyphs.
Below is an overview of this syntax; you can also check out the Interactive Demo to play with it yourself.
Building Blocks
A building block is a string over the alphabet {0, 1, 2, 3, 4, 5, 6, 7, 8}, in which the presence of a number indicates a corresponding stroke to be drawn on a "米" grid:
1 2 3
\|/
8 -+- 4
/|\
7 6 5
0 indicates that no stroke should be drawn in this block.
Example:
- 48
- 24578
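The mapping from digits to strokes can be sketched in a few lines of JavaScript. This is a minimal illustration, not the code in rrpl_parser.js: it assumes that every stroke runs from the centre of the cell to the grid position of its digit, and the names `STROKE_ENDPOINTS` and `blockToSegments` are hypothetical.

```javascript
// Assumed geometry: each stroke starts at the cell centre (0.5, 0.5) and ends
// at the grid position of its digit. y grows downward, as in SVG or canvas.
const STROKE_ENDPOINTS = {
  1: [0, 0],   2: [0.5, 0], 3: [1, 0],
  8: [0, 0.5],              4: [1, 0.5],
  7: [0, 1],   6: [0.5, 1], 5: [1, 1],
};

// Hypothetical helper: turn a building block such as "48" into line segments.
function blockToSegments(block) {
  if (!/^[0-8]+$/.test(block)) {
    throw new Error("not a building block: " + block);
  }
  const segments = [];
  for (const digit of block) {
    if (digit === "0") continue; // 0 draws nothing
    segments.push({ from: [0.5, 0.5], to: STROKE_ENDPOINTS[digit] });
  }
  return segments;
}

// "48" yields one stroke to the right and one to the left: a horizontal bar.
console.log(blockToSegments("48"));
```

Under this assumption, 48 produces a single horizontal bar across the cell, and 24578 adds an upward stroke and two downward diagonals to it.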
Packing
Building blocks can be packed horizontally or vertically using the - and | symbols respectively to compose more complex glyphs. These symbols can be chained to pack more than two components, each given equal room.
Example:
- 27-26-26
- 2468|24578
Grouping
The ( and ) symbols can be used to group components together so that mixed horizontal and vertical packing happens in the correct order.
Example:
- (48|37)-(25678|27)-(37|15)
- (46-68)|(246-268)|(24-28)
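Packing and grouping together amount to a small recursive grammar. The sketch below parses a reference-free RRPL string into a tree and then gives every building block a rectangle, splitting the available room equally among siblings. It is an illustration only: the functions `parse` and `layout` are hypothetical and are not the API of rrpl_parser.js, and the rule that - and | cannot be mixed at one nesting level without parentheses is an assumption made here to avoid guessing a precedence.

```javascript
// Parse an RRPL string without references into a small tree.
// Assumption: "-" and "|" must not be mixed at one level without parentheses.
function parse(src) {
  let pos = 0;

  function parseTerm() {
    if (src[pos] === "(") {
      pos++; // skip "("
      const node = parseExpr();
      if (src[pos] !== ")") throw new Error("expected ) at " + pos);
      pos++; // skip ")"
      return node;
    }
    const match = /^[0-8]+/.exec(src.slice(pos));
    if (!match) throw new Error("expected building block at " + pos);
    pos += match[0].length;
    return { type: "block", strokes: match[0] };
  }

  function parseExpr() {
    const children = [parseTerm()];
    let op = null;
    while (src[pos] === "-" || src[pos] === "|") {
      if (op !== null && src[pos] !== op) {
        throw new Error("mixed - and | need parentheses at " + pos);
      }
      op = src[pos];
      pos++;
      children.push(parseTerm());
    }
    if (op === null) return children[0];
    // "-" packs horizontally (a row), "|" packs vertically (a column)
    return { type: op === "-" ? "row" : "column", children };
  }

  const tree = parseExpr();
  if (pos !== src.length) throw new Error("unexpected character at " + pos);
  return tree;
}

// Give each building block a rectangle in the unit square,
// dividing the room equally among the children of a row or column.
function layout(node, x = 0, y = 0, w = 1, h = 1, out = []) {
  if (node.type === "block") {
    out.push({ strokes: node.strokes, x, y, w, h });
    return out;
  }
  const n = node.children.length;
  node.children.forEach((child, i) => {
    if (node.type === "row") layout(child, x + (i * w) / n, y, w / n, h, out);
    else layout(child, x, y + (i * h) / n, w, h / n, out);
  });
  return out;
}

console.log(layout(parse("(46-68)|(246-268)|(24-28)")));
```

For the last example above, this yields six rectangles: three rows stacked vertically, each split into a left and a right half.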
Referencing
Other characters and radicals can be referenced directly to build a new character. The parser will dump the contents of the referenced glyph directly into the string, similar to the C/C++ #include directive. This makes it especially easy to describe the more complicated Chinese characters, as most of them consist of radicals.
Example:
- 廿|468|由|(八)
- ((車|(山))-(殳))|(手)
- ((口)-(口))|(甲)|十
- (((木)-(缶)-(木))|(冖))|((鬯)-(彡))
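Because a reference is a plain textual inclusion, expanding it can be sketched as recursive string substitution. The helper `expand` below is hypothetical and is not the implementation in rrpl_parser.js; the circular-reference check is an added safeguard rather than documented behaviour.

```javascript
// Every character outside this set is treated as a reference to another glyph.
const SYNTAX = new Set("012345678-|()");

// Hypothetical helper: recursively replace each referenced character
// with its own description, as plain text substitution.
function expand(description, dictionary, seen = new Set()) {
  let result = "";
  for (const ch of description) {
    if (SYNTAX.has(ch)) {
      result += ch;
      continue;
    }
    if (!(ch in dictionary)) throw new Error("undefined reference: " + ch);
    if (seen.has(ch)) throw new Error("circular reference: " + ch);
    seen.add(ch);
    result += expand(dictionary[ch], dictionary, seen);
    seen.delete(ch);
  }
  return result;
}

const dictionary = {
  "一": "48",
  "不": "(48-45678-48)|(3-26-1)",
  "丕": "不|一",
};

console.log(expand(dictionary["丕"], dictionary));
// (48-45678-48)|(3-26-1)|48
```

Since the substitution is purely textual, wrapping a reference in parentheses, as in (八), is what keeps its internal packing grouped once its contents are pasted in.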
Parser
A baseline parser is included in rrpl_parser.js, which powers this Interactive Demo. It can be used with browser-side JavaScript as well as Node.js:
//require the module: (or in html, <script src="./rrpl_parser.js"></script>)
var parser = require('./rrpl_parser.js');

//obtain an abstract syntax tree
var ast = parser.parse("(48|37)-(25678|27)-(37|15)");

//returns line segments (normalized 0.0-1.0) that can be used to render the character
var lines = parser.toLines(parser.toRects(ast));
File Type
RRPL data can be stored in a JSON file, with the root object mapping Unicode characters to their respective descriptions, e.g.
{
  "一":"48",
  "丁":"468|26|27",
  "上":"246|248",
  "不":"(48-45678-48)|(3-26-1)",
  "丕":"不|一",
  "中":"(46-2468-68)|(24-2468-28)",
  "串":"中|中"
}
The references in these files are usually expanded before rendering is attempted. This can be done in two ways: with parser.preprocess(json_object) in rrpl_parser.js, or with compile.js. More documentation can be found in the header comments of these files.
The JSON files can be further compressed into (and decompressed from) a binary file around half the size of the original using compress.js, which encodes each symbol in the RRPL alphabet in half a byte.
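The half-byte idea works because the RRPL alphabet (the digits 0 to 8 plus -, |, ( and )) has only 13 symbols, which fit in 4 bits. The sketch below packs two symbols into each byte. It is not the file format used by compress.js, which is not described here: the symbol order in `ALPHABET`, the padding nibble `PAD`, and the function names `pack` and `unpack` are all assumptions, and the sketch handles a single expanded description rather than a whole JSON file.

```javascript
const ALPHABET = "012345678-|()"; // 13 symbols, so each fits in 4 bits
const PAD = 0xf; // assumed filler nibble for odd-length strings

// Pack an expanded RRPL string into bytes, two symbols per byte.
function pack(rrpl) {
  const bytes = new Uint8Array(Math.ceil(rrpl.length / 2));
  for (let i = 0; i < rrpl.length; i += 2) {
    const hi = ALPHABET.indexOf(rrpl[i]);
    const lo = i + 1 < rrpl.length ? ALPHABET.indexOf(rrpl[i + 1]) : PAD;
    if (hi < 0 || lo < 0) throw new Error("symbol outside the RRPL alphabet");
    bytes[i / 2] = (hi << 4) | lo;
  }
  return bytes;
}

// Reverse of pack: read two nibbles per byte and skip the padding nibble.
function unpack(bytes) {
  let rrpl = "";
  for (const byte of bytes) {
    for (const nibble of [byte >> 4, byte & 0xf]) {
      if (nibble !== PAD) rrpl += ALPHABET[nibble];
    }
  }
  return rrpl;
}

const packed = pack("468|26|27");
console.log(packed.length, unpack(packed));
```

A nine-symbol description such as 468|26|27 takes five bytes in this scheme instead of nine, which matches the "around half" figure for the descriptions themselves.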
Downloads
- dist/min-trad.json contains RRPL descriptions of 5000+ traditional Chinese characters stored in JSON format.
- dist/RRPL.ttf is a TrueType font (TTF) containing 5000+ traditional Chinese characters with glyphs generated by the default parser. Below is a screenshot of the font in macOS TextEdit.app:
Tools
Rendering
- Generate a preview.html web page containing a rendering of all characters in an RRPL JSON file:
$ node render.js preview path/to/input.json
- Generate a realtime.html web page where user input is parsed and rendered interactively (characters defined in the input file are available for referencing):
$ node render.js realtime path/to/input.json
Exporting
- Export a folder of SVG (Scalable Vector Graphics) renderings, one for each character in an RRPL JSON file:
$ node export_glyphs.js path/to/input.json path/to/output/folder 0
Contrary to what render.js generates, these SVGs contain "outlines" of the glyphs instead of simple strokes. More settings such as thickness can be tweaked in the source code of export_glyphs.js; a command-line API will come later.
- To generate a TTF (TrueType) font from the aforementioned SVGs, FontForge's Python library can be used (pip install fontforge). An example can be found in tools/forge_font.py.
Applications
Since RRPL reduces every Chinese character to a short string of numbers and symbols, their structure can be learned by sequential models such as Markov chains, RNNs and LSTMs without much difficulty. I've applied recurrent neural networks (RNNs) to the language to hallucinate non-existent Chinese characters. Below are some characters generated by training overnight on ~1000 RRPL character descriptions, with the visuals rendered using a pix2pix model. A separate repo for that project will be created soon.
Contributing
rrpl.json contains the latest, work-in-progress version. There are some 5,000 characters in there, but there are over 50,000 Chinese characters in existence! So help is very much appreciated. If you'd like to help with this project, please append new characters to the file and submit a pull request. For more info, contact me by sending an email to lingdonh[at]andrew[dot]cmu[dot]edu.
Below is a rendering of all 5000+ Chinese characters denoted using RRPL so far. Click on the image to enlarge.