Benchmarking timing performance Keyword Replace between regex and flashtext

 11 months ago
source link: https://gist.github.com/vi3k6i5/dc3335ee46ab9f650b19885e8ade6c7a

It doesn't get you into "comparable to FlashText" territory, but an easy regex optimisation is to sort the keys before joining them into the pattern.

$ python flashtext_regex_timing_keyword_replace.py
Count  | FlashText | Regex   | Sorted Regex
-------------------------------------------
1      | 0.01020   | 0.00007 | 0.00006
1001   | 0.01195   | 0.15300 | 0.14172
2001   | 0.01223   | 0.30786 | 0.28629
3001   | 0.01370   | 0.43211 | 0.39781
4001   | 0.01226   | 0.57421 | 0.53352
5001   | 0.01331   | 0.66974 | 0.67607
6001   | 0.01246   | 0.81990 | 0.70473
7001   | 0.01144   | 0.92889 | 0.89696
8001   | 0.01229   | 1.02987 | 0.92075
9001   | 0.01200   | 1.14402 | 1.08537
10001  | 0.01359   | 1.23232 | 1.07877
11001  | 0.01344   | 1.38192 | 1.27324
12001  | 0.01369   | 1.47766 | 1.31015
13001  | 0.01565   | 1.46536 | 1.39552
14001  | 0.01474   | 1.63738 | 1.54096
15001  | 0.01419   | 1.75537 | 1.68527
16001  | 0.01576   | 1.85689 | 1.68421
17001  | 0.01501   | 1.90390 | 1.77516
18001  | 0.01641   | 1.96707 | 1.77640
19001  | 0.01499   | 2.06039 | 1.73332
20001  | 0.01688   | 2.04545 | 1.92251

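The speed improvement varies quite a bit with the keys chosen (in the run above the sorted regex is anywhere from marginally slower at 5001 keys to roughly 16% faster at 19001 keys), but it's almost always a safe thing to do.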

 sorted_compiled_re = re.compile("|".join(sorted(rep.keys())))
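For anyone who wants to try this outside the benchmark script, here is a minimal sketch of the same idea. The helper names `compile_keywords` and `replace_keywords`, the sample `rep` dictionary and the sample text are all made up for illustration, and the `re.escape` call is an addition that the one-liner above does not have.

```python
import re
import timeit


def compile_keywords(keys, sort_keys=False):
    """Join keywords into one alternation, optionally sorting them first."""
    ordered = sorted(keys) if sort_keys else list(keys)
    # re.escape is an addition in this sketch so that keys match literally.
    return re.compile("|".join(re.escape(k) for k in ordered))


def replace_keywords(pattern, rep, text):
    """Replace every matched keyword with its mapped value."""
    return pattern.sub(lambda m: rep[m.group(0)], text)


# Hypothetical sample data, not the keyword list used in the benchmark.
rep = {"python": "_keyword_", "java": "_keyword_", "regex": "_keyword_"}
text = "java and python both ship a regex engine"

plain_re = compile_keywords(rep.keys())
sorted_re = compile_keywords(rep.keys(), sort_keys=True)

plain_out = replace_keywords(plain_re, rep, text)
sorted_out = replace_keywords(sorted_re, rep, text)

# Rough timing of both variants; absolute numbers depend on machine and keys.
plain_secs = timeit.timeit(lambda: replace_keywords(plain_re, rep, text), number=1000)
sorted_secs = timeit.timeit(lambda: replace_keywords(sorted_re, rep, text), number=1000)
print("plain: %.5f  sorted: %.5f" % (plain_secs, sorted_secs))
```

One case where sorting is not neutral: Python's `re` tries alternatives left to right and takes the first one that matches, so if one key is a prefix of another (say `ab` and `abc`), alphabetical order puts the shorter key first and it wins. If that matters for your keys, sorting with `sorted(keys, key=len, reverse=True)` keeps the longest key first, though I haven't timed that variant.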
