LEX? Shared regular expression

Question

I am working with LEX and YACC and have a question about how to define tokens. I have two regular expressions that share some characters, as in the example below:

SHARED         "+"|"-"|"/"|"="|"%"|"?"|"!"|"."|"$"|"_"|"~"|"&"|"^"|"<"|">"|"("|")"|","
REXP_1          {SHARED}|[a-zA-Z]|[ \t]+|[\\][\\\"]
REXP_2          {SHARED}|[a-zA-Z]|[ \t]+|"*"

My question is how to tell whether a character from the shared regular expression corresponds to REXP_1 or REXP_2 when I define the tokens in the rules section of the .lex file.

I think I am misunderstanding something. I suspect the way I wrote the regular expressions is wrong, but I cannot find a better way to write them. Could you please give me some hints?

Moreover, I would appreciate it if someone could suggest some criteria for deciding when to define a token (file.lex) and when to define a symbol in the grammar (file.y). For some symbols it is easy to tell whether they are tokens or grammar symbols, but for others I find it difficult to decide where to put them.

By the way I am working with this grammar

Couldn't you use the C preprocessor here? Just #define SHARED to what you want it to be, then use it with the string pasting facilities. — tripleee
I'm not familiar with LEX or YACC, but presumably SHARED could be replaced with [/=%?!.$_~&^<>(),+-]. Then it's not so huge, so you could just include it in each regex. — OGHaza
Thanks for your help. What I did was define as a token, in file.lex, a regexp similar to the one @OGHaza suggested, and then use it in the grammar itself (file.y). I am going to test it. — pafede2

Brian Tompsett - 汤莱恩 · Accepted Answer · 2015-02-14T13:30:36

(Answered in a question edit)

The OP wrote:

In case someone finds it interesting, I am going to write out the lessons I learned. The most important one is that common sense is a great tool for figuring out what is an internal token in the .lex file and what is a suitable token to share with the .y file.

Since the term 'common sense' may be a bit ambiguous, I post the following example:
  ALPHA_NUMERIC   [a-zA-Z0-9]
  SQ_CHAR         {SHARED}|{ALPHA_NUMERIC}
  SINGLE_QUOTED   {SINGLE_QUOTE}{SQ_CHAR}{SQ_CHAR}*{SINGLE_QUOTE}
where ALPHA_NUMERIC is a good internal token (file.lex) but a bad token to share with the grammar file, whereas SINGLE_QUOTED may be a good token to share with the grammar (file.y). I wrote 'may be' because it depends heavily on the specific grammar we are working on; in my concrete case it is a good token to share with the YACC file.
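
A minimal flex sketch of that distinction, under some assumptions: the token name SINGLE_QUOTED is declared in file.y and exported through a y.tab.h header (as produced by yacc -d), the SINGLE_QUOTE definition is a guess since the post does not show it, and SHARED is written as the character class suggested in the comments. Only SINGLE_QUOTED is returned to the parser; the other names never leave the lexer.

```lex
%{
/* Assumption: y.tab.h comes from `yacc -d file.y` and declares SINGLE_QUOTED. */
#include "y.tab.h"
%}
%option noyywrap

SHARED          [/=%?!.$_~&^<>(),+-]
ALPHA_NUMERIC   [a-zA-Z0-9]
SINGLE_QUOTE    "'"
SQ_CHAR         ({SHARED}|{ALPHA_NUMERIC})

%%
{SINGLE_QUOTE}{SQ_CHAR}+{SINGLE_QUOTE}   { return SINGLE_QUOTED; /* the only name the grammar sees */ }
%%
```

The explicit parentheses around SQ_CHAR are there so the definition still groups correctly in lex implementations that substitute definitions textually.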

What I did was define as a token, in file.lex, a regexp similar to the one @OGHaza suggested, and then use it in the grammar itself (file.y).
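
That approach might look roughly like the sketch below. It is not the OP's actual file: the token names SHARED_CHAR, LETTER, WS, ESCAPED and STAR are hypothetical and would have to be declared with %token in file.y. The idea is that the lexer reports each piece once and the grammar decides whether it belongs to REXP_1 or REXP_2.

```lex
%{
/*
 * Assumption: y.tab.h is generated by `yacc -d file.y` and declares the
 * hypothetical tokens SHARED_CHAR, LETTER, WS, ESCAPED and STAR.
 * In file.y, the rule for REXP_1 would then accept
 *   SHARED_CHAR | LETTER | WS | ESCAPED
 * and the rule for REXP_2 would accept
 *   SHARED_CHAR | LETTER | WS | STAR
 */
#include "y.tab.h"
%}
%option noyywrap

SHARED   [/=%?!.$_~&^<>(),+-]

%%
{SHARED}       { return SHARED_CHAR; }
[a-zA-Z]       { return LETTER; }
[ \t]+         { return WS; }
[\\][\\\"]     { return ESCAPED; /* backslash followed by backslash or quote */ }
"*"            { return STAR; }
%%
```

With this split the lexer never has to guess which of the two expressions a shared character came from, because that decision depends on context that only the parser has.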
