The details of regular expression compilation vary by implementation. For example, compilation in Python or re2 simply creates an instance of a regular expression object. This object's state machine may be modeled as a graph or virtual machine. Without compilation (example: RE.match(expression, input)
), a new regular expression object is created behind the scenes each time match
is called. This is needless work if you're going to use an expression more than once.
In C#, one of three things can happen when you compile:
- A regular expression object is created (implemented as a virtual machine) similar to Python and re2.
- A regular expression object is created and its virtual machine opcodes are compiled to in-memory IL instructions on-the-fly.
- A regular expression object is created and its virtual machine opcodes are compiled to disk as IL instructions.
You mention an interest in algorithms. Take a look at Russ Cox's excellent articles for two approaches: