In the past I had a task: run hundreds of thousands of regular expressions
against millions of text documents, and it had to be fast (run time should
be measured in seconds).
Perl's regexp alternation may be a good option for that, because it puts all
the regexps into one trie structure and tries them all at once, but I needed more
than a simple match. I had to know which regexp matched, the exact position
of the match, and much more. While trying to optimize the matching process, I found that
my options for interacting with the regexp engine during matching were very limited.
I decided to write my own regexp engine. I thought it shouldn't be too
complicated (by the time I found out I was wrong, it was too late :-) ).
The basic idea is that every matching event has its own handler, which
runs immediately when the event happens. The handler gets the
relevant information from the regexp engine and sends back instructions on how
to continue the matching process.
I created the Regexp::SAR (Simple API for Regexp) module. Most of the code
is written in C/XS.
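
To make the handler idea concrete, here is a minimal sketch in Python using only the standard `re` module. It is not the Regexp::SAR API and not how the module is implemented. The names `scan`, `CONTINUE`, `STOP`, `on_number` and `on_stop_word` are made up for illustration. It only shows the shape of the interaction: several patterns are tried in one pass, each match immediately calls the handler of the pattern that matched with the match position, and the handler's return value tells the loop how to continue.

```python
import re

# Instructions a handler can send back to the matching loop (hypothetical names).
CONTINUE, STOP = "continue", "stop"


def scan(text, rules):
    """Run every (pattern, handler) rule over text in a single pass.

    Each handler is called as soon as its pattern matches and receives the
    start and end offsets of the match. Its return value decides whether
    the loop keeps matching.
    """
    # Wrap each pattern in its own named group so we can tell which one matched.
    # Assumption: the patterns themselves contain no capturing groups.
    combined = re.compile("|".join(
        f"(?P<r{i}>{pattern})" for i, (pattern, _) in enumerate(rules)
    ))
    for m in combined.finditer(text):
        index = int(m.lastgroup[1:])      # "r0" -> 0, "r1" -> 1, ...
        _, handler = rules[index]
        if handler(m.start(), m.end()) == STOP:
            break


events = []


def on_number(start, end):
    events.append(("number", start, end))
    return CONTINUE


def on_stop_word(start, end):
    events.append(("stop", start, end))
    return STOP                           # ask the loop to stop matching


scan("ab 12 cd 345 END 6", [(r"\d+", on_number), (r"END", on_stop_word)])
print(events)
# prints [('number', 3, 5), ('number', 9, 12), ('stop', 13, 16)]
```

The trailing `6` is never reported, because the handler for `END` asked the loop to stop. A real engine built around this idea could offer richer instructions than continue and stop; the sketch keeps only these two.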