GitHub - thautwarm/Fable.Sedlex: get you an ultimate lexer generator using Fable...

 8 months ago
source link: https://github.com/thautwarm/Fable.Sedlex
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

NOTE: currently we support interpreted mode and Python source code generation.

It's EASY to compile compiled_unit into source code for C#, F# and others. See CodeGen.Python.fs for how to write a custom backend.


Thanks to the Fable.Python compiler, this project managed to modify code from the amazing OCaml sedlex, implementing an ultimate lexer generator for (currently) both Python and F#.

The most impressive feature of sedlex is that Sedlex statically analyses regular expressions built with lexer combinators, correctly ordering/merging/optimizing the lexical rules. In short, sedlex is correct and efficient.

For Python users: the generated Python code is located in fable_sedlex directory, and you can directly copy them to your Python package to access such lexer generator.

For .NET(C#, F#) users: copying Sedlex.fs will give you such lexer generator in .NET.

For users in other programming languages: you might access compiled_unit and write a code generator.

from fable_sedlex.sedlex import *
from fable_sedlex.code_gen_python import codegen_python
from fable_sedlex.code_gen import show_doc

digit = pinterval(ord('0'), ord('9'))

dquote = pchar('"')
backslash = pchar('\\')
dot = pchar('.')

exp = pseq([por(pchar('E'), pchar('e')), pplus(digit)])
flt = pseq([pstar(digit), dot, pplus(digit), popt(exp)])
integral = pseq([pplus(digit), popt(exp)])
string_lit = pseq([dquote, pstar(por(pcompl(dquote), pseq([backslash,  pany]))), dquote])
space = pchars(["\n", "\t", "\r", ' '])

EOF_ID = 0
cu = build(
        (space, Lexer_discard),
        (flt, Lexer_tokenize(1)),
        (integral, Lexer_tokenize(2)),
        (string_lit, Lexer_tokenize(3)),
        (pstring("+"), Lexer_tokenize(4)),
        (pstring("+="), Lexer_tokenize(4)),
        (peof, Lexer_tokenize(EOF_ID))
    ], "my error")

class MyToken:
  token_id: int
  lexeme : str
  line: int
  col: int
  span: int
  offset: int
  file: str

f = inline_thread(cu, lambda args: MyToken(*args))
# we can also generate Python source code for equivalent lexer via:
#    `code = show_doc(codegen_python(cu))`
# see `test_codegen_py.py`, `generated.py` and `test_run_generated.py` 
# for more details

buf = from_ustring(r'123 2345 + += 2.34E5 "sada\"sa" ')

tokens = []
while True:
        x = f(buf)
    except Exception as e:
    if x is None:
    if x.token_id == EOF_ID:

> python test.py

[MyToken(token_id=2, lexeme='123', line=0, col=3, span=3, offset=0, file=''), MyToken(token_id=2, lexeme='2345', line=0, col=8, span=4, offset=4, file=''), MyToken(token_id=4, lexeme='+', line=0, col=10, span=1, offset=9, file=''), MyToken(token_id=4, lexeme='+=', line=0, col=13, span=2, offset=11, file=''), MyToken(token_id=1, lexeme='2.34E5', line=0, col=20, span=6, offset=14, file=''), MyToken(token_id=3, lexeme='"sada\\"sa"', line=0, col=31, span=10, offset=21, file='')]

About Joyk

Aggregate valuable and interesting links.
Joyk means Joy of geeK