Benchmarking timing performance Keyword Replace between regex and flashtext
source link: https://gist.github.com/vi3k6i5/dc3335ee46ab9f650b19885e8ade6c7a
It doesn't get you into "comparable to FlashText" territory, but an easy regex optimisation to apply is to sort the keys before joining them into the regex.
$ python flashtext_regex_timing_keyword_replace.py
Count | FlashText | Regex   | Sorted Regex
------------------------------------------
1     | 0.01020   | 0.00007 | 0.00006
1001  | 0.01195   | 0.15300 | 0.14172
2001  | 0.01223   | 0.30786 | 0.28629
3001  | 0.01370   | 0.43211 | 0.39781
4001  | 0.01226   | 0.57421 | 0.53352
5001  | 0.01331   | 0.66974 | 0.67607
6001  | 0.01246   | 0.81990 | 0.70473
7001  | 0.01144   | 0.92889 | 0.89696
8001  | 0.01229   | 1.02987 | 0.92075
9001  | 0.01200   | 1.14402 | 1.08537
10001 | 0.01359   | 1.23232 | 1.07877
11001 | 0.01344   | 1.38192 | 1.27324
12001 | 0.01369   | 1.47766 | 1.31015
13001 | 0.01565   | 1.46536 | 1.39552
14001 | 0.01474   | 1.63738 | 1.54096
15001 | 0.01419   | 1.75537 | 1.68527
16001 | 0.01576   | 1.85689 | 1.68421
17001 | 0.01501   | 1.90390 | 1.77516
18001 | 0.01641   | 1.96707 | 1.77640
19001 | 0.01499   | 2.06039 | 1.73332
20001 | 0.01688   | 2.04545 | 1.92251
The speed improvement varies quite a bit with the keys chosen (in the run above, from marginally slower at a count of 5001 to about 16% faster at 19001), but sorting is almost always a safe thing to do.
sorted_compiled_re = re.compile("|".join(sorted(rep.keys())))
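For anyone who wants to try this outside the benchmark script, here is a minimal self-contained sketch of the same idea. The helper names `build_sorted_regex` and `replace_keywords` and the sample `rep` dictionary are made up for illustration, and the `re.escape` call is an addition that the one-liner above does not have (it only matters if keys can contain regex metacharacters).

```python
import re


def build_sorted_regex(rep):
    """Compile an alternation of the keys of `rep`, sorted alphabetically.

    Hypothetical helper: `rep` maps each keyword to its replacement.
    """
    # re.escape is an addition for safety; the original one-liner joins raw keys.
    return re.compile("|".join(re.escape(key) for key in sorted(rep)))


def replace_keywords(text, rep, pattern):
    """Replace every matched keyword with its value in `rep`."""
    return pattern.sub(lambda match: rep[match.group(0)], text)


# Illustrative data only, not taken from the benchmark.
rep = {"python": "Python", "regex": "regular expression", "java": "Java"}
pattern = build_sorted_regex(rep)
result = replace_keywords("python and java both support regex", rep, pattern)
print(result)  # Python and Java both support regular expression
```

One caveat that may explain the "almost always": Python's `re` tries alternatives left to right, so if one key is a prefix of another (say `java` and `javascript`), alphabetical order puts the shorter key first and the longer one never matches in full. If that matters for your keys, wrapping the alternation in word boundaries (`\b(?:...)\b`) is one way around it.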