Fable.Sedlex

NOTE: currently we support an interpreted mode and Python source code generation.

It is easy to compile a compiled_unit into source code for C#, F#, and other languages. See CodeGen.Python.fs for an example of how to write a custom backend.

Thanks to the Fable.Python compiler, this project adapts code from the amazing OCaml sedlex into a lexer generator that targets (currently) both Python and F#.

The most impressive feature of sedlex is that it statically analyses the regular expressions built with its lexer combinators, correctly ordering, merging, and optimizing the lexical rules. In short, sedlex is both correct and efficient.
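For instance, the two overlapping rules below can be listed in either order; the automaton produced by build resolves the overlap statically, so "+=" is matched as a single token rather than as "+" followed by "=". This is a minimal sketch using combinators from the full example further down; the token ids and the error label are arbitrary:

from fable_sedlex.sedlex import build, pstring, Lexer_tokenize

# Overlapping rules: no manual ordering needed, the longer match wins.
cu_demo = build(
    [
        (pstring("+"), Lexer_tokenize(0)),
        (pstring("+="), Lexer_tokenize(1)),
    ], "demo error")

The example output at the end of this README shows the same behavior for its "+" and "+=" rules: "+=" comes out as one token with span 2.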

For Python users: the generated Python code is located in the fable_sedlex directory; you can copy it directly into your Python package to use the lexer generator.

For .NET (C#, F#) users: copying Sedlex.fs gives you the same lexer generator in .NET.

For users of other programming languages: you can access the compiled_unit and write your own code generator.
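For example, the bundled Python backend renders a compiled unit to standalone source code; a custom backend for another language would walk the same compiled_unit structure. A minimal sketch, assuming cu is a compiled unit produced by build as in the full example below; the output file name is purely illustrative:

from fable_sedlex.code_gen_python import codegen_python
from fable_sedlex.code_gen import show_doc

code = show_doc(codegen_python(cu))  # cu: a compiled unit from build(...)
with open("my_lexer.py", "w") as fp:
    fp.write(code)  # save the generated lexer as a Python module

The full example below builds such a compiled unit from lexer combinators and runs it in interpreted mode: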

from dataclasses import dataclass

from fable_sedlex.sedlex import *
from fable_sedlex.code_gen_python import codegen_python
from fable_sedlex.code_gen import show_doc

# Single-character building blocks: pinterval matches a code-point range,
# pchar matches one character.
digit = pinterval(ord('0'), ord('9'))

dquote = pchar('"')
backslash = pchar('\\')
dot = pchar('.')

# Composite rules: pseq concatenates, por alternates, and pstar/pplus/popt
# are the usual *, +, ? quantifiers.
exp = pseq([por(pchar('E'), pchar('e')), pplus(digit)])
flt = pseq([pstar(digit), dot, pplus(digit), popt(exp)])
integral = pseq([pplus(digit), popt(exp)])
# A string literal: quote, then any character except '"' (pcompl is the
# complement of a character set) or a backslash escape, then quote.
string_lit = pseq([dquote, pstar(por(pcompl(dquote), pseq([backslash, pany]))), dquote])
space = pchars(["\n", "\t", "\r", ' '])

EOF_ID = 0
cu = build(
    [
        # Each rule pairs a regex with an action: Lexer_discard drops the
        # match (useful for whitespace), Lexer_tokenize(n) emits token id n.
        (space, Lexer_discard),
        (flt, Lexer_tokenize(1)),
        (integral, Lexer_tokenize(2)),
        (string_lit, Lexer_tokenize(3)),
        (pstring("+"), Lexer_tokenize(4)),
        (pstring("+="), Lexer_tokenize(4)),
        (peof, Lexer_tokenize(EOF_ID))
    ], "my error")

@dataclass
class MyToken:
    token_id: int
    lexeme: str
    line: int
    col: int
    span: int
    offset: int
    file: str

# Run the compiled unit in interpreted mode; the callback builds our
# token type from the raw match data.
f = inline_thread(cu, lambda args: MyToken(*args))
# we can also generate Python source code for equivalent lexer via:
#    `code = show_doc(codegen_python(cu))`
# see `test_codegen_py.py`, `generated.py` and `test_run_generated.py` 
# for more details


# from_ustring wraps the input string in a lexer buffer.
buf = from_ustring(r'123 2345 + += 2.34E5 "sada\"sa" ')

tokens = []
while True:
    try:
        x = f(buf)
    except Exception as e:
        print(e)  # surface the lexer's error message before propagating
        raise

    if x is None:  # a discarded match (whitespace) yields no token
        continue
    if x.token_id == EOF_ID:
        break
    tokens.append(x)

print(tokens)
> python test.py

[MyToken(token_id=2, lexeme='123', line=0, col=3, span=3, offset=0, file=''), MyToken(token_id=2, lexeme='2345', line=0, col=8, span=4, offset=4, file=''), MyToken(token_id=4, lexeme='+', line=0, col=10, span=1, offset=9, file=''), MyToken(token_id=4, lexeme='+=', line=0, col=13, span=2, offset=11, file=''), MyToken(token_id=1, lexeme='2.34E5', line=0, col=20, span=6, offset=14, file=''), MyToken(token_id=3, lexeme='"sada\\"sa"', line=0, col=31, span=10, offset=21, file='')]
