The text to be parsed contains file-system commands such as:

infile abc*.txt
list abc*ff.txt

where abc*.txt is a wildcard argument, as in shell commands.

However, there are also mathematical expressions such as:

x=a*b

A common expression rule (in the yacc file) looks like this:

expression: 
    expression '+' expression { $$ = $1 + $3;  }
    |   expression '-' expression { $$ = $1 - $3; }
    |   expression '*' expression { $$ = $1 * $3; }
    ;

Here * is the multiplication operator.

And the lex rule that recognizes the token IDENTIFIER, with * allowed inside it, is:

[A-Za-z][A-Za-z0-9_\.\*]*   {
    yylval.strval = strdup(yytext);  return IDENTIFIER; }

For file-system commands such as infile or list, like the ones at the beginning, the token that follows the command is taken as an IDENTIFIER and may contain * as a wildcard to match filenames.

But for an expression like

x = a*b

This should be parsed as an expression, but with the lex rule above, a*b is seen as a single IDENTIFIER. The statement then becomes an assignment of the value of an identifier named a*b to x.
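
To make the cause concrete, the character class from the lex rule above can be imitated in plain C. This is only a sketch: ident_match_len is a hypothetical helper, not something lex provides, and it assumes the default "C" locale so that isalpha and isalnum cover exactly the ASCII letters and digits. It mimics the way lex takes the longest possible match for a pattern.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the longest prefix of s that matches
 *   [A-Za-z][A-Za-z0-9_.*]*   when allow_star is nonzero, or
 *   [A-Za-z][A-Za-z0-9_.]*    when allow_star is zero.
 * Returns 0 if s does not start with a letter.
 * Hypothetical helper for illustration; not part of lex or yacc. */
size_t ident_match_len(const char *s, int allow_star)
{
    size_t n;

    if (!isalpha((unsigned char)s[0]))
        return 0;

    n = 1;
    /* Keep consuming characters for as long as they belong to the class. */
    while (isalnum((unsigned char)s[n]) || s[n] == '_' || s[n] == '.' ||
           (allow_star && s[n] == '*'))
        n++;
    return n;
}
```

With allow_star set, the match on "a*b" covers all three characters, so the scanner hands yacc one IDENTIFIER and the '*' production is never tried. With allow_star cleared, the match stops after "a" and the '*' is left for the operator rule.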

How can I keep the expression grammar rule and still support wildcard filenames in lex or yacc?

Can you say more about the overall problem? What other things are you matching in lex? Without the big picture of what the input language looks like, we cannot easily answer. I can think of several solutions but need more context. – Brian Tompsett - 汤莱恩
You'd probably have to use state variables to switch state when the file keywords are encountered. It will take me a while to test out a solution. – Brian Tompsett - 汤莱恩

1 Answer


In flex this can all be handled with start conditions, which are well described in the manual, with examples similar to your requirements.

I made a small example lexer to demonstrate this working:

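ws [ \t\n\r]+
%s FILENAME
%%
{ws}    ; /* skip whitespace */
<<EOF>>    yyterminate(); /* stop scanning at end of input */
<INITIAL>infile      BEGIN(FILENAME); /* the next word is a filename pattern */
<INITIAL>list         BEGIN(FILENAME);
<FILENAME>[A-Za-z][A-Za-z0-9_\.\*]*     BEGIN(INITIAL); /* wildcard filename, then back to normal mode */
"*"               return(yytext[0]);
"+"               return(yytext[0]);
"-"               return(yytext[0]);
"/"               return(yytext[0]);
[A-Za-z][A-Za-z0-9_]*              return 'I'; /* placeholder token code for an identifier */
.                 printf("Bad character %c\n",yytext[0]);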

I ran it in debug mode to show its operation:

C:\Users\Brian>flex -d  SOwildcard.l    
C:\Users\Brian>gcc -o SOwildcard.exe lex.yy.c -lfl    
C:\Users\Brian>SOwildcard
--(end of buffer or a NUL)
a + b
--accepting rule at line 13 ("a")
--accepting rule at line 4 (" ")
--accepting rule at line 10 ("+")
--accepting rule at line 4 (" ")
--accepting rule at line 13 ("b")
--(end of buffer or a NUL)
infile a*.txt
--accepting rule at line 4 ("
")
--accepting rule at line 6 ("infile")
--accepting rule at line 4 (" ")
--accepting rule at line 8 ("a*.txt")
--(end of buffer or a NUL)
variable * identifier
--accepting rule at line 4 ("
")
--accepting rule at line 13 ("variable")
--accepting rule at line 4 (" ")
--accepting rule at line 9 ("*")
--accepting rule at line 4 (" ")
--accepting rule at line 13 ("identifier")
--(end of buffer or a NUL)  
list a*.*
--accepting rule at line 4 ("
")
--accepting rule at line 7 ("list")
--accepting rule at line 4 (" ")
--accepting rule at line 8 ("a*.*")
--(end of buffer or a NUL)
--accepting rule at line 4 ("
")
-^C

I know you asked about lex, but I only have flex. It may be similar.
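
If your lex has no start conditions, the same mode switch can be approximated by a hand-written scanner in C. The sketch below is not flex output and is not the lexer above. The token names, struct scanner, and next_token are all hypothetical, and a real parser would use the token codes that yacc generates. It only shows the idea: after infile or list, the next word is scanned with the wider character class, and then the scanner drops back to normal mode.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes for this sketch only. */
enum token { TOK_EOF = 0, TOK_KEYWORD, TOK_FILEPATTERN, TOK_IDENT, TOK_OP, TOK_BAD };

/* Scanner modes, playing the role of the INITIAL and FILENAME start conditions. */
enum mode { MODE_INITIAL, MODE_FILENAME };

struct scanner {
    const char *p;   /* current position in the input */
    enum mode mode;  /* which character class applies to the next word */
    char text[64];   /* text of the most recent token (truncated if longer) */
};

static int is_ident_char(int c) { return isalnum(c) || c == '_'; }
static int is_file_char(int c)  { return is_ident_char(c) || c == '.' || c == '*'; }

enum token next_token(struct scanner *s)
{
    const char *start;
    enum token tok;
    size_t n;

    while (isspace((unsigned char)*s->p))
        s->p++;
    if (*s->p == '\0')
        return TOK_EOF;

    start = s->p;
    if (isalpha((unsigned char)*s->p)) {
        /* In filename mode, '.' and '*' are part of the word. */
        int (*in_class)(int) =
            (s->mode == MODE_FILENAME) ? is_file_char : is_ident_char;
        while (in_class((unsigned char)*s->p))
            s->p++;
        tok = TOK_IDENT;
    } else if (strchr("+-*/", *s->p) != NULL) {
        s->p++;
        tok = TOK_OP;
    } else {
        s->p++;
        tok = TOK_BAD;
    }

    /* Save the token text, truncating it to fit the buffer. */
    n = (size_t)(s->p - start);
    if (n >= sizeof s->text)
        n = sizeof s->text - 1;
    memcpy(s->text, start, n);
    s->text[n] = '\0';

    if (tok == TOK_IDENT) {
        if (s->mode == MODE_FILENAME) {
            s->mode = MODE_INITIAL;   /* one filename pattern, then back to normal */
            return TOK_FILEPATTERN;
        }
        if (strcmp(s->text, "infile") == 0 || strcmp(s->text, "list") == 0) {
            s->mode = MODE_FILENAME;  /* the next word is a filename pattern */
            return TOK_KEYWORD;
        }
    }
    return tok;
}
```

In this sketch, scanning "a*b" yields an identifier, an operator, and another identifier, while "infile a*.txt" yields a keyword followed by a single file-pattern token, after which the mode is back to MODE_INITIAL.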