0
votes

I'm trying to implement a time parser with lex and yacc. I'm a complete newbie to those tools and to C programming.

The program has to print a message (Valid time format 1: input ) when one of these formats is entered: 4pm, 7:38pm, 23:42, 3:16, 3:16am; otherwise an "Invalid character" message is printed.

lex file time.l :

%{
#include <stdio.h>
#include "y.tab.h"
%}

%%

[0-9]+                {yylval=atoi(yytext); return digit;}
"am"                   { return am;}
"pm"                   { return pm;}
[ \t\n]               ;
[:]                    { return colon;}
.                     { printf ("Invalid character\n");}

%%

yacc file time.y:

%{
void yyerror (char *s);
int yylex();
#include <stdio.h>
#include <string.h>

%}

%start time
%token digit
%token am
%token pm
%token colon

%%

time        :  hour ampm           {printf ("Valid time format 1 : %s%s\n ", $1, $2);}
            |  hour colon minute   {printf ("Valid time format 2 : %s:%s\n",$1, $3);}
            |  hour colon minute ampm {printf ("Valid time format 3 : %s:%s%s\n",$1, $3, $4); }
            ;

ampm        :   am               {$$ = "am";}
            |   pm               {$$ = "pm";}
            ;

hour        :   digit digit             {$$ = $1 * 10 + $2;}
            |   digit             { $$ = $1;}
            ;

minute      :   digit digit         {$$ =  $1 * 10 + $2;} 
            ;

%%
int yywrap()
{
        return 1;
} 

int main (void) {

  return yyparse();
}

void yyerror (char *s) {fprintf (stderr, "%s\n", s);}

compiling with this command:

yacc -d time.y && lex time.l && cc lex.yy.c y.tab.c -o time

I'm getting some warnings:

time.y:17:47: warning: format specifies type 'char *' but the argument has type
      'YYSTYPE' (aka 'int') [-Wformat]
    {printf ("Valid time format 1 : %s%s\n ", (yyvsp[(1) - (2)]), (yyvsp.

This warning appears for all the variables in printf statements. The values are all char, because even the number in the time string is converted with the atoi function.

Executing the program with a valid input throws this error:

./time

1pm

[1]    2141 segmentation fault  ./time

Can someone help me? Thanks in advance.


5 Answers

3
votes

This (f)lex rule:

[0-9]+                {yylval=atoi(yytext); return digit;}

recognizes any integer, not just a digit. (It allows leading zeros, which is probably appropriate for a time parser.) It assumes that yylval is an int, which is the case if you don't do something to declare the type of yylval.

Meanwhile, this (f)lex rule:

"am"                 { return am;}

recognizes the token am, but does not set the value of yylval.

Now, in your bison file, you have:

hour        :   digit digit       { $$ = $1 * 10 + $2; }
            |   digit             { $$ = $1;}
            ;

Since digit actually represents an entire integer, the digit digit production is incorrect. It would recognize, for example, the input 23 75 (since your flex file ignores whitespace), but it would turn that into the value 305 (10*23 + 75). That hardly seems appropriate. Again, it assumes that the type of the semantic values $$ and $1 is int, which is the default case.
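To make the arithmetic concrete, here is a minimal C sketch of what the digit digit action computes when each token is really a whole integer, next to one possible stricter check on the matched text. The helper names combine_as_digits and parse_one_or_two_digits are hypothetical; they are not part of flex or bison.

```c
#include <ctype.h>
#include <string.h>

/* What the action `$$ = $1 * 10 + $2` computes for two token values. */
int combine_as_digits(int first, int second)
{
    return first * 10 + second;
}

/* Hypothetical helper: accept a lexeme only if it consists of one or two
 * decimal digits and return its value; return -1 otherwise. */
int parse_one_or_two_digits(const char *lexeme)
{
    size_t len = strlen(lexeme);
    if (len < 1 || len > 2)
        return -1;
    int value = 0;
    for (size_t i = 0; i < len; i++) {
        if (!isdigit((unsigned char)lexeme[i]))
            return -1;
        value = value * 10 + (lexeme[i] - '0');
    }
    return value;
}
```

With the tokens 23 and 75, combine_as_digits gives 305, which is the problem described above. A check like the second function could be applied to yytext in the lexer action, assuming you want to reject anything longer than two digits.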

However, the production:

ampm        :   am               {$$ = "am";}
            |   pm               {$$ = "pm";}
            ;

requires that the type of the result semantic value be char * (or even const char*). Since you have not done anything to declare the type of semantic values, their type is int and the assignment is just as invalid as would be the C statement:

int ampm = "am";

So the C compiler issues a diagnostic for it (a warning or an error, depending on the compiler).

Furthermore, in your production:

time        :  hour ampm           {printf ("Valid time format 1 : %s%s\n ", $1, $2);}

you assume that the semantic values $1 and $2 are strings (char*). But the values are actually integers, so printf will do something undefined and probably disastrous (in this case, segfault). (Because of the nature of C this is not a compile-time error, but most C compilers will issue a warning. Apparently, your C compiler does so.)
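For comparison, here is a small C sketch in which the format specifiers match the argument types: %d for the int hour and %s for a char* suffix. The function format_hour is a hypothetical helper for illustration only; it writes into a buffer instead of printing so the result is easy to inspect.

```c
#include <stdio.h>

/* Hypothetical helper: format an integer hour and a string suffix into buf.
 * %d matches the int argument and %s matches the const char* argument.
 * Returns the number of characters snprintf would have written. */
int format_hour(char *buf, size_t size, int hour, const char *suffix)
{
    return snprintf(buf, size, "Valid time format 1 : %d%s", hour, suffix);
}
```

Calling format_hour(buf, sizeof buf, 1, "pm") fills buf with "Valid time format 1 : 1pm". Passing the int where %s is expected is what makes the original action crash.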

How this should be fixed depends on your interpretation of the assignment. When it says "print a message (Valid time format 1: input )", does it mean that the literal input string should be printed, or is it ok to print an interpretation of the string? That is, given actual inputs

8:23am
08:23am

Would you want the messages to be

Valid time format 1: 8:23am
Valid time format 1: 08:23am

Or is it appropriate to normalize:

Valid time format 1: 8:23am
Valid time format 1: 8:23am

You should (re-)read the section in the bison manual on semantic types, and then decide whether you want the type to be int, char*, or a union of the two.
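If you go for the union, the C type involved looks roughly like the sketch below. In bison you would declare it with %union and attach members to symbols with %token <member> and %type <member>; the member names number and text, and the two constructor helpers, are assumptions made for this example.

```c
/* Sketch of a semantic value that can hold either a number or a string,
 * similar in spirit to the type a %union declaration produces.
 * The member names are assumptions. */
typedef union {
    int         number;  /* for hour and minute values */
    const char *text;    /* for "am" / "pm" */
} semantic_value;

/* Hypothetical helpers that set exactly one member. */
semantic_value make_number(int n)
{
    semantic_value v;
    v.number = n;
    return v;
}

semantic_value make_text(const char *s)
{
    semantic_value v;
    v.text = s;
    return v;
}
```

Only the member that was last written may be read back, which is why each grammar symbol needs a declared member: it tells bison which field $1, $2 and $$ refer to.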

Some other things you need to think about:

  1. Your flex file recognizes any integer, but neither hours nor minutes can be arbitrary integers. Both are limited to two digits; normally, the minutes should always be two digits (so that 9:3am is not a way of writing 9:03am). They both have limited ranges of valid values; minutes must be between 00 and 59, while hours is between 1 and 12 if am or pm is specified, and otherwise between 0 and 23. Or perhaps 24. (Actually, there are lots of different possible validity conventions for hours; you might choose to be flexible or strict.)

  2. Your problem description doesn't appear to allow spaces in the time specifications, but your flex file ignores whitespace. So that might lead you to recognize incorrect inputs (depending, again, on how strict you wish to be). Also see the note about output in this case: does the whitespace appear in the output (assuming it is acceptable)?

  3. Your flex file issues an error message when it sees a character it doesn't recognize, but it does not stop lexing. In effect, that means that illegal characters will be dropped from the input stream, so that an input like:

    1;:17rpm
    

    will result in two illegal character messages followed by a message saying that the input was a valid 1:17pm. That is unlikely to be what you wanted.
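The range rules from the first point could be sketched in C as follows. This picks one of the possible conventions (hours 1 to 12 with am/pm, 0 to 23 without, minutes 0 to 59); the function name is_valid_time and its parameters are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical validity rule, following one of several possible conventions:
 * minutes 0..59; hours 1..12 when am/pm is given, 0..23 otherwise. */
bool is_valid_time(int hour, int minute, bool has_ampm)
{
    if (minute < 0 || minute > 59)
        return false;
    if (has_ampm)
        return hour >= 1 && hour <= 12;
    return hour >= 0 && hour <= 23;
}
```

Such a function could be called from the action for the time rule once the hour and minute values are available as integers.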

As a final note, I have to say that in my opinion, understanding C is an absolute prerequisite to using flex and bison. Trying to teach all three at the same time strikes me as pedagogically suspect.

0
votes

The warning

time.y:17:47: warning: format specifies type 'char *' but the argument has type
  'YYSTYPE' (aka 'int') [-Wformat]

for that line for example

printf ("Valid time format 1 : %s%s\n ", $1, $2);

says that you specified %s (which expects a C-style string of type char *), but the argument actually passed is of type YYSTYPE (which here is int).

0
votes

As @Elyasin pointed out, the message you're given tells you exactly what's wrong: YYSTYPE defaults to int, but you're attempting to use it as a string (this applies to every line where you get the warning). Furthermore, you're attempting to use it as an int in some places and as a string in others, which is obviously incorrect.

What you can do is create a string to hold your input as you go and concatenate into that. You can do this with a variable in your initial yacc block, so something like this:

%{
void yyerror (char *s);
int yylex();
#include <stdio.h>
#include <string.h>

char time_str[15];
%}

time_str is now available in all of your parser actions, so you can copy into it as you go, and in your final action you can just print the built-up string, like

printf ("Valid time format 1 : %s", time_str);
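As a rough sketch of the copying step, assuming the hour and minute are available as ints and the suffix as a string, you could fill the buffer with snprintf so that it cannot overflow. The helper build_time_str is hypothetical.

```c
#include <stdio.h>

char time_str[15];  /* same size as the buffer declared in the yacc prologue */

/* Hypothetical helper: write "H:MMsuffix" into time_str.
 * Returns 0 on success, -1 if the result would not fit in the buffer. */
int build_time_str(int hour, int minute, const char *suffix)
{
    int n = snprintf(time_str, sizeof time_str, "%d:%02d%s",
                     hour, minute, suffix);
    return (n < 0 || (size_t)n >= sizeof time_str) ? -1 : 0;
}
```

After build_time_str(7, 38, "pm"), time_str holds "7:38pm" and can be printed with %s.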
0
votes

I solved the warnings by defining a char array for the am and pm values and treating the YYSTYPE values as int (as suggested).

I've also added handling for empty lines, comma separation after each input, validation for hours and minutes, and an exit command:

%{
void yyerror (char *s);
int yylex();
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char ampm_str[15] = "";

typedef int bool;
bool validFormat = 1;
%}

%start input
%token digit
%token am
%token pm
%token colon
%token sep
%token exit_command

%%
input       : /* empty */
            | input line 
            ;

line        : '\n'
            | list '\n' 
            ;

list        : time
            | time sep list 
            | exit_command  {exit(EXIT_SUCCESS);}
            ;


time        :  hour ampm                {if ($1 > 12 || $1 <= 0)  {printf ("Hour out of range\n");validFormat = 0;} else if(validFormat) {printf("Valid time format %d%s\n", $1, ampm_str); } validFormat = 1;}
            |  hour colon minute        {if ($1 > 24 || $1 <= 0)  {printf ("Hour out of range\n");validFormat = 0;} else if(validFormat) {printf("Valid time format   %d:%d\n", $1, $3); } validFormat = 1;}
            |  hour colon minute ampm   {if ($1 > 12 || $1 <= 0)  {printf ("Hour out of range\n");validFormat = 0;} else if(validFormat) {printf ("Valid time format   %d:%d%s\n", $1, $3, ampm_str); } validFormat = 1;}
            ;


hour        :   two_digits        { $$ = $1; }
            |   digit             { $$ = $1; }
            ;

minute      :   two_digits          { $$ = $1; if ($$ > 59) {printf ( "minute out of range\n");validFormat = 0;}}
            |   digit               { $$ = $1; if ($$ > 59) {printf ( "minute out of range\n");validFormat = 0;}}
            ;

two_digits  :  digit digit          {$$ = $1 * 10 + $2; }
            ;

ampm        :   am               {strcpy(ampm_str, "am");}
            |   pm               {strcpy(ampm_str, "pm");}
            ;


%%
int yywrap()
{
        return 1;
} 

int main (void) {
printf ("Insert time, and press enter\n");
printf ("Type , after each time\n");
printf ("Valid formats : 2am, 12:00, 1:30pm\n");
printf ("exit to quit\n");

  return yyparse();
}


void yyerror (char *s) {fprintf (stderr, "Invalid character: %s\n", s); validFormat = 0;}
0
votes

For byte parsing, in the lex file:

0x[0-9a-f]{8}   { yylval.number = strtoll(yytext+2, NULL, 16); return BYTE_4; }

In the yacc file, you need to declare number as a member of the %union.
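The conversion in that action can be tried on its own in plain C. This is only a sketch: parse_hex_literal is a hypothetical helper, and it assumes the text always starts with the two-character 0x prefix, as the pattern guarantees.

```c
#include <stdlib.h>

/* Hypothetical helper mirroring the lexer action: skip the "0x" prefix and
 * convert the remaining characters as base-16 digits. */
long long parse_hex_literal(const char *text)
{
    return strtoll(text + 2, NULL, 16);
}
```

For example, parse_hex_literal("0x0000ff10") returns 0xff10. The member that receives the result should be wide enough for eight hex digits, so long long (or an unsigned 32-bit type) is a safer choice than int.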