0
votes

I'm having an issue with my flex/bison grammar, and I'm not sure whether the way I have set up the recursion is what is tripping me up.

To access the data passed via yylval, I use the $1, $2, ... variables for each element of the production. However, the values are not split per token: $1 prints the whole production. This only happens with the second alternative of the metadata production; the first seems to be OK.

I was intending to create a check(int token_val) function containing a switch(token_val) that checks which token was returned and then acts on its yytext appropriately. Is there a way to use the $ notation to get the token value returned for the command production? Or is this the wrong way to go about things?

I have checked the references for this, but maybe I have missed something. I would appreciate it if someone could clarify.

Code: bison



input: input metadata
     | metadata
     ;

metadata: command op data {printf("%s is valid.\n", $3);} // check_data($1) ?
        | data op data op data op data {printf("row data is valid\n\t %s\n", $1);}
        ;

command: PROD_TITL
      |  _DIR
      |  DOP
      |  DIT
      |  FORMAT
      |  CAMERA
      |  CODEC
      |  DATE
      |  REEL
      |  SCENE
      |  SLATE
      ;

op: EQUALS
  | COLON
  | SEP
  ;

data: META
    | REEL_ID
    | SCENE_ID
    | SLATE_ID
    | TAKE
    | MULTI_T
    | LENS
    | STOP
    | FILTERS
    ;

%%

int main(void) {
  return yyparse();
}


lex:

%{
#include <stdio.h>
#include <string.h>
#include "ca_mu.tab.h"
%}

%option yylineno

%%


\"[^"\n]*["\n]              {yylval = yytext; return META;}
[aA-aZ][0-9]+               {yylval = yytext; return REEL_ID;}
([0-9aA-zZ]*\/[0-9aA-zZ]*)  {yylval = yytext; return SCENE_ID;}
[0-9]+                      {yylval = yytext; return SLATE_ID;}
[0-9][aA-zZ]+               {yylval = yytext; return TAKE;}
[0-9]+-[0-9]+               {yylval = yytext; return MULTI_T;}
[0-9]+MM                    {yylval = yytext; return LENS;}
T[0-9]\.[0-9]+              {yylval = yytext; return STOP;}
"{"([^}]*)"}"               {yylval = yytext; return FILTERS;}

Output sample:

"My Production" is valid.
"Dir Name" is valid.
"DOP Name" is valid.
"DIT Name" is valid.
"16:9" is valid.
"Arri Alexa" is valid.
"ProRes" is valid.
"02/12/2020" is valid.
A001 is valid.
23/22a is valid.
001 is valid.
row data is valid
         1, 50MM, T1.8, { ND.3 }  // $1 prints all tokens?
row data is valid
         3AFS,   50MM, T1.8, {ND.3}

Input:

/* This is a comment */

production_title = "My Production"
director         = "Dir Name"
DOP              = "DOP Name"
DIT              = "DIT Name"
format           = "16:9"
camera           = "Arri Alexa"
codec            = "ProRes"
date             = "02/12/2020"

reel: A001
  scene: 23/22a
    slate: 001
      1, 50MM, T1.8, { ND.3 }
      3AFS,   50MM, T1.8, {ND.3}
    slate: 002
      1,  65MM, T1.8, {ND.3 BPM1/2}
    slate: 003
      1-3, 24MM, T1.9, {ND.3}

END

2
Could you also provide the input that gave these outputs? - Paul Ogilvie
Maybe I need to use a union? I thought that, as I only needed to represent the data as text, I could just set %define api.value.type {char *} globally. - hdcdigi
In the data: production, you could add print statements to see what was matched. It seems that META is always matched. - Paul Ogilvie
Input just added as an edit. - hdcdigi

2 Answers

2
votes

The problem is here, in your scanner actions:

yylval = yytext;

You must never do this.

yytext points into a temporary buffer which is only valid until the next call to yylex(), and that means you are effectively making yylval a dangling pointer. Always copy the string, as with:

yylval = strdup(yytext);

(Don't forget to call free() on the copied strings when you no longer need the copies.)
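
To see why $1 ended up printing the whole row, here is a minimal C sketch of what is going on. It assumes flex's default pointer mode, where each token is NUL-terminated in place inside the input buffer and the overwritten character is put back when the next token is scanned. The names toy_scanner, toy_next, copy_text and scan_row are made up for illustration and are not part of flex or bison; copy_text does the same job as strdup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified model of a scanner that terminates tokens in place. */
struct toy_scanner {
    char *buf;     /* writable input buffer */
    size_t pos;    /* index where the next scan starts */
    char held;     /* character replaced by the temporary '\0' */
    int has_held;  /* nonzero while a temporary terminator is planted */
};

/* Hypothetical stand-in for yylex(): returns a yytext-like pointer to the
   next comma-separated field, or NULL at end of input. */
char *toy_next(struct toy_scanner *s) {
    if (s->has_held) {                /* undo the previous terminator */
        s->buf[s->pos] = s->held;
        s->has_held = 0;
    }
    while (s->buf[s->pos] == ',' || s->buf[s->pos] == ' ')
        s->pos++;                     /* skip separators */
    if (s->buf[s->pos] == '\0')
        return NULL;
    char *start = s->buf + s->pos;
    while (s->buf[s->pos] != ',' && s->buf[s->pos] != '\0')
        s->pos++;
    s->held = s->buf[s->pos];         /* remember what '\0' overwrites */
    s->buf[s->pos] = '\0';
    s->has_held = 1;
    return start;
}

/* Heap copy of a string (what strdup() does); NULL on failure. */
char *copy_text(const char *text) {
    if (text == NULL)
        return NULL;
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, text, len);
    return copy;
}

/* Scans a whole row, keeping the first field both as a raw alias
   (like yylval = yytext) and as a copy (like yylval = strdup(yytext)). */
void scan_row(char *row, char **aliased_first, char **copied_first) {
    assert(row != NULL);
    struct toy_scanner s = { row, 0, '\0', 0 };
    char *tok = toy_next(&s);
    *aliased_first = tok;
    *copied_first = copy_text(tok);
    while (toy_next(&s) != NULL) {
        /* later tokens move the terminator further down the buffer */
    }
}
```

Calling scan_row on a writable array holding `1, 50MM, T1.8, {ND.3}` leaves the copy as "1", while the aliased pointer now reads the entire row, which is the symptom in the question. The caller owns the copy and has to free() it.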

1
votes

I think your grammar is too simple and doesn't capture the structure of the input. For example:

reel: A001                     // repetition of reels, consisting of....
  scene: 23/22a                // repetition of scenes, consisting of...
    slate: 001                 // repetition of slates, consisting of...
      1, 50MM, T1.8, { ND.3 }  // repetition of slate data

This is a nested structure, so the grammar for the input should look something like this:

movie: metadata reels
     ;

metadata: /* see your stuff */ ;

reels: reel
     | reels reel
     ;

reel: REEL REEL_ID scenes
    ;

scenes: scene
      | scenes scene
      ;

scene: SCENE SCENE_ID slates
     ;

slates: slate
      | slates slate
      ;

slate: SLATE SLATE_ID slate_datas
     ;

slate_datas: slate_data
           | slate_datas slate_data
           ;

slate_data: /*your stuff*/ ;
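
If you want to keep the parsed data around, the left-recursive list rules above map naturally onto an "append to the list built so far" step. The following is only a sketch of that idea in C: struct node, node_new and node_append are hypothetical names, and using them from your actions would mean switching the semantic value type away from a plain char * (for example with a %union).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node for one level of the reel > scene > slate nesting.
   id would point at a copied lexeme such as "A001" or "23/22a". */
struct node {
    const char *id;
    struct node **children;  /* nested items, in input order */
    size_t count;            /* number of children */
};

/* Allocates an empty node; returns NULL if allocation fails. */
struct node *node_new(const char *id) {
    struct node *n = calloc(1, sizeof *n);
    if (n != NULL)
        n->id = id;
    return n;
}

/* Roughly what an action for a rule like `slates: slates slate` could do:
   append the newly parsed child. Returns 0 on success, -1 on failure. */
int node_append(struct node *parent, struct node *child) {
    assert(parent != NULL && child != NULL);
    struct node **grown = realloc(parent->children,
                                  (parent->count + 1) * sizeof *parent->children);
    if (grown == NULL)
        return -1;
    grown[parent->count++] = child;
    parent->children = grown;
    return 0;
}
```

With this shape, a scene node ends up holding its slates in the order they appeared, and each slate can hold its rows of slate data in the same way.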