I am curious about parsing C++ code with regular expressions. What I have so far (using Ruby) lets me extract class declarations and their parent classes (if any):
/(struct|class)\s+([^{:\s]+)\s*[:]?([^{]+)\s*\{/
Here is an example in Rubular. Notice that I can correctly capture the "declaration" and "inheritance" parts.
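For anyone who wants to reproduce this outside Rubular, here is a minimal Ruby sketch of what the regex captures. The sample input and the names DECL, src, kind, name and parents are made up for illustration.

```ruby
# The declaration regex from above, unchanged.
DECL = /(struct|class)\s+([^{:\s]+)\s*[:]?([^{]+)\s*\{/

# Hypothetical sample input: one class with a single base class.
src = "class Derived : public Base {\n  int x;\n};\n"

m = DECL.match(src)

kind    = m[1]        # the keyword: "struct" or "class"
name    = m[2]        # the class name
parents = m[3].strip  # everything between ':' and '{', whitespace trimmed

puts "#{kind} #{name} inherits from: #{parents}"
```

Group 3 keeps its surrounding whitespace, which is why the sketch calls strip on it.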
Where I am stuck is capturing the class body. If I extend the original regex as follows:
/(struct|class)\s+([^{:\s]+)\s*[:]?([^{]+)\s*\{[^}]*\};/
Then I can capture the class body only if it contains no curly braces at all, which rules out any nested class or function definition. I have tried many variations, but none of them helped. For instance, if I change the regexp so that the body is allowed to contain braces, it captures the first class declaration and then swallows all the subsequent classes as if they were part of the first class's body!
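Both failure modes can be reproduced with the sketch below. NO_BRACES is the regex shown above. GREEDY is only an assumed stand-in for "allow braces in the body" (a greedy .* with the /m flag), not necessarily the exact variant I tried, and the input strings are made up.

```ruby
# The body regex from above: the body may not contain '}'.
NO_BRACES = /(struct|class)\s+([^{:\s]+)\s*[:]?([^{]+)\s*\{[^}]*\};/

# Assumed variant: let the body be anything, including braces and newlines.
GREEDY = /(struct|class)\s+([^{:\s]+)\s*[:]?([^{]+)\s*\{.*\};/m

# Hypothetical inputs.
flat   = "class A : public B {\n  int x;\n};\n"
nested = "class A : public B {\n  void f() { }\n};\n"
two    = "class A {\n  void f() { }\n};\nclass C {\n  int y;\n};\n"

# Case 1: a body without braces is matched.
flat_match = NO_BRACES.match(flat)

# Case 2: a body with an inline function definition is not matched at all,
# because [^}]* stops at the function's closing brace and "};" does not follow.
nested_match = NO_BRACES.match(nested)

# Case 3: the greedy variant runs to the LAST "};" in the input,
# so the match for class A also contains class C.
greedy_match = GREEDY.match(two)

puts "flat matched:   #{!flat_match.nil?}"
puts "nested matched: #{!nested_match.nil?}"
puts "greedy match for A contains 'class C': #{greedy_match[0].include?('class C')}"
```

In the third case the regex has no way of knowing which closing brace pairs with the opening brace of class A, so it simply takes the last "};" it can find.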
What am I missing?