I have no practical experience with lex/yacc, so my question may look naive, but I could not work out a reasonable solution from the information I found on Stack Overflow and elsewhere on the internet. Suppose I need a parser for C/C++-like syntax, but all I need from it are function-call-like statements such as foo(a), bar(1, 2), foobar("x", a, (b, c)), etc. I am not interested in the validity of the code and expressions; I am happy to treat them as plain text, a sequence of symbols. I need to recognize string literals, identifiers, and expressions between commas and parentheses (again, just sequences of symbols). I also need to drop comments and recognize preprocessor directives, but that is outside the scope of this question.
I am not familiar enough with lex/yacc, but I am a software engineer with, let's say, some experience. In the past, I wrote such a parser in C++ without any third-party helpers or tools. It did not take five minutes, but I wouldn't say it was a big deal. Nevertheless, it is a piece of code to maintain. So the next time I needed one, I thought using lex/yacc would be a good idea: surely a solution to such a primitive task should be even more primitive with tools specialized for grammars. As it turned out, I have spent more time (unsuccessfully) trying to get something out of lex/yacc than I would have needed to write the parser entirely by hand.
Let's say my lexer produces the tokens IDENTIFIER, STRING_LITERAL, ',', '(', ')' and SYMBOL (everything else), and that it removes comments and preprocessor stuff. I would then like to say something like this in yacc:
    expression_element
        : IDENTIFIER
        | STRING_LITERAL
        | SYMBOL
        | list_expression
        ;

    list_expression
        : '(' ')'
        | '(' expression ')'
        | '(' expression_list ')'
        ;

    expression_list
        : expression ',' expression
        | expression_list ',' expression
        ;

    expression
        : expression_element
        | IDENTIFIER list_expression
            { /* And this is what I really need. */ }
        | expression expression_element
        ;
I believe this grammar could be written another way, perhaps even more simply. As I said, I do not care about the validity of expressions; I care only about their integrity with respect to ',', '(' and ')', and about simplicity. The real problem I cannot solve is this: whatever I do, I cannot get lex/yacc to distinguish between 'IDENTIFIER list_expression' (which should take precedence) and 'IDENTIFIER NOT-list-expression', where the first is the function-call-like statement I need and the second is just an IDENTIFIER on its own as part of something else (a statement). Everything I have tried leads only to conflicts and subsequent parse errors.
Is there something simple I am missing? Or do I need to write a gory grammar for such a small thing? Or do I just need another tool (recommendations welcome)? I would prefer to avoid writing the parser myself unless that is the only simple solution.
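For reference, the token stream described above (IDENTIFIER, STRING_LITERAL, ',', '(', ')' and SYMBOL for everything else) could come from a lexer roughly like the following. This is only a minimal sketch, assuming flex rather than classic lex (it uses %option and an exclusive start condition) and assuming the token codes come from a yacc-generated y.tab.h. The start-condition name COMMENT is made up for illustration.

```lex
%{
/* Assumption: IDENTIFIER, STRING_LITERAL and SYMBOL are declared with
   %token in the yacc grammar and exported through y.tab.h. */
#include "y.tab.h"
%}

%option noyywrap

/* Hypothetical exclusive start condition used to skip block comments. */
%x COMMENT

%%

"/*"                    { BEGIN(COMMENT); }
<COMMENT>"*/"           { BEGIN(INITIAL); }
<COMMENT>.|\n           { /* drop the comment body */ }
"//".*                  { /* drop a line comment */ }
^[ \t]*#.*              { /* drop a preprocessor line (no backslash continuations) */ }

\"([^"\\\n]|\\.)*\"     { return STRING_LITERAL; }
[A-Za-z_][A-Za-z0-9_]*  { return IDENTIFIER; }
[(),]                   { return yytext[0]; /* the literal ',', '(' or ')' */ }
[ \t\r\n]+              { /* skip whitespace */ }
.                       { return SYMBOL; /* any other single character */ }

%%
```

With rules like these, foobar("x", a, (b, c)) reaches the parser as IDENTIFIER '(' STRING_LITERAL ',' IDENTIFIER ',' '(' IDENTIFIER ',' IDENTIFIER ')' ')'. The sketch deliberately ignores character literals such as '(' and multi-line preprocessor directives, which a real lexer would have to handle.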