[2206.01398] A closer look at TDFA

source link: https://arxiv.org/abs/2206.01398

Computer Science > Formal Languages and Automata Theory

[Submitted on 3 Jun 2022]

We present an algorithm for regular expression parsing and submatch extraction based on tagged deterministic finite automata. The algorithm works with different disambiguation policies. We give detailed pseudocode for the algorithm, covering important practical optimizations. All transformations from a regular expression to an optimized automaton are explained on a step-by-step example. We consider both ahead-of-time and just-in-time determinization and describe variants of the algorithm suited to each setting. We provide benchmarks showing that the algorithm is very fast in practice. Our research is based on two independent implementations: an open-source lexer generator RE2C and an experimental Java library.
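To make the idea of submatch extraction with a tagged automaton concrete, here is a minimal sketch. It is not the paper's algorithm and skips determinization entirely: the automaton for the expression `a+ t b+` (where the tag `t` marks the position at which the run of `b` starts) is built by hand, and the names `TRANSITIONS`, `ACCEPTING`, and `match` are hypothetical and chosen for illustration only. Each transition carries a list of tags whose registers are set to the current input position when the transition is taken.

```python
# Illustrative sketch only: a hand-built tagged DFA for the expression
#   a+ t b+
# where tag "t" records the input position at which the b-run begins.
# The table layout and all names here are assumptions, not taken from the paper.

# (state, symbol) -> (next_state, tags to set to the current position)
TRANSITIONS = {
    (0, "a"): (1, []),
    (1, "a"): (1, []),
    (1, "b"): (2, ["t"]),  # the tag fires just before the first 'b' is consumed
    (2, "b"): (2, []),
}
ACCEPTING = {2}


def match(text):
    """Run the automaton over text; return tag registers on success, else None."""
    state = 0
    registers = {}
    for pos, ch in enumerate(text):
        step = TRANSITIONS.get((state, ch))
        if step is None:
            return None  # no transition on this symbol: the input is rejected
        state, tags = step
        for tag in tags:
            registers[tag] = pos  # store the current position in the tag's register
    return registers if state in ACCEPTING else None


if __name__ == "__main__":
    result = match("aaabb")
    print(result)                 # {'t': 3}
    print("aaabb"[: result["t"]]) # aaa
    print("aaabb"[result["t"] :]) # bb
```

Because this expression is unambiguous, one register per tag is enough. Handling ambiguous expressions and different disambiguation policies requires more machinery than this sketch shows.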

Comments: 26 pages, 11 figures
Subjects: Formal Languages and Automata Theory (cs.FL); Data Structures and Algorithms (cs.DS)
Cite as: arXiv:2206.01398 [cs.FL]
  (or arXiv:2206.01398v1 [cs.FL] for this version)
  https://doi.org/10.48550/arXiv.2206.01398

Submission history

From: Ulya Trafimovich
[v1] Fri, 3 Jun 2022 05:26:57 UTC (572 KB)
