I am using Flex and Bison to parse JSON. This is what my Flex file looks like:

%%

[ \n\t]+    
true        { return VAL_TRUE; }
false       { return VAL_FALSE; }
null        { return VAL_NULL; }
{STRING}    { yylval->string = strdup(yytext); return STRING; }
{NUMBER}    { yylval->number = atof(yytext); return NUMBER; }
\{          { return OBJ_BEG; }
\}          { return OBJ_END; }
:           { return SYM_COLON; }
,           { return SYM_COMMA; }

%%

And I have a grammar like this in Bison:

%%

START:      OBJECT                      { printf("%s\n", $1); }
    ;

OBJECT:     OBJ_BEG OBJ_END             { $$ = "{}\n"; }
    |       OBJ_BEG MEMBERS OBJ_END     { 
                                            $$ = ALLOC(2+strlen($2)+2); 
                                            sprintf($$,"{ %s }",$2); 
                                        }
    ;

MEMBERS:    PAIR                        { $$ = $1; }
    |       PAIR SYM_COMMA MEMBERS      { 
                                            $$ = ALLOC(strlen($1)+2+strlen($3)); 
                                            sprintf($$,"%s, %s",$1,$3); 
                                        }
    ;

PAIR:       STRING SYM_COLON VALUE      { 
                                            $$ = ALLOC(strlen($1)+2+strlen($3)); 
                                            sprintf($$,"%s: %s",$1,$3); 
                                        }
    ;

...

VALUE:      STRING                      { $$ = yylval.string; }
    |       NUMBER                      { $$ = yylval.number; }
    |       OBJECT                      { $$ = $1; }
    |       ARRAY                       { $$ = $1; }
    |       VAL_TRUE                    { $$ = "true"; }
    |       VAL_FALSE                   { $$ = "false"; }
    |       VAL_NULL                    { $$ = "null"; }
    ;

%%

With all this I'm trying to recognize JSON. I'm also formatting the input by adding some commas, parentheses and spaces.

Where I'm stuck is how to preserve all the line breaks ("\n") and tabs ("\t") from the input JSON and send them directly to the output. Right now I ignore them in Flex with "[ \n\t]+" and then add spaces manually in some of the Bison actions.

This is the approach I'm thinking of:

I could match "\n" and "\t" in Flex and forward them to Bison as SYM_LINEBREAK and SYM_TAB tokens. But how do I add them to the output in the Bison actions, and where do I put those rules and actions?

In short, what I need to do is add some spaces, line breaks and tabs to the output, and preserve the line breaks and tabs (but not the spaces) that were in the input file.

Thanks in advance!

1 Answer

Whitespace is not significant in JSON syntax: any number of extra spaces, newlines or tabs may appear between any two tokens in the input without affecting the meaning of the JSON (as long as they don't break up a token). So for recognizing and parsing JSON, you want to ignore whitespace, as you are doing.
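Because whitespace only matters inside string tokens, a filter that merely tracks whether it is inside a string can keep or drop whitespace without any grammar. Here is a minimal sketch in C, assuming you want to keep newlines and tabs but drop spaces outside strings. The name strip_spaces is a hypothetical helper, not part of Flex or Bison, and the sketch does not validate that the input is well-formed JSON:

```c
/* Copy JSON text from `in` to `out`, dropping space characters that
 * appear outside string literals while keeping newlines and tabs.
 * `out` must have room for strlen(in) + 1 bytes.
 * Hypothetical helper for illustration; it does no JSON validation. */
void strip_spaces(const char *in, char *out)
{
    int in_string = 0;  /* currently inside a "..." literal? */
    int escaped = 0;    /* previous character was a backslash in a string? */

    for (; *in != '\0'; ++in) {
        char c = *in;
        if (in_string) {
            *out++ = c;            /* everything inside a string is kept */
            if (escaped)
                escaped = 0;       /* this character was escaped, e.g. \" */
            else if (c == '\\')
                escaped = 1;
            else if (c == '"')
                in_string = 0;     /* closing quote */
        } else if (c == '"') {
            in_string = 1;         /* opening quote */
            *out++ = c;
        } else if (c != ' ') {
            *out++ = c;            /* keeps '\n', '\t' and all other tokens */
        }
    }
    *out = '\0';
}
```

The same loop could read from stdin with getchar() and write with putchar() if you would rather stream the file than hold it in a buffer.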

So the question is, what do you want to do? If you just want to copy the input to the output unchanged, you don't need a parser at all. If you want to remove all the spurious whitespace, your current approach works, but it is overkill. If you want to build a tree of objects from the JSON (which is not what you are doing currently), then you need a parser like the one you have, but its actions should build tree nodes rather than strings.
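To make the "tree of objects" option concrete, here is a rough sketch in C of the kind of node types the Bison actions could build instead of strings. All names here (json_value, json_member, json_number and so on) are made up for illustration, and I'm assuming a simple linked list for object members and array elements; error handling and freeing are left out:

```c
#include <stdlib.h>

enum json_type { JSON_NULL, JSON_BOOL, JSON_NUMBER,
                 JSON_STRING, JSON_ARRAY, JSON_OBJECT };

struct json_value;

/* One entry of an object (key set) or of an array (key == NULL). */
struct json_member {
    const char *key;              /* not copied; e.g. the strdup'ed token */
    struct json_value *value;
    struct json_member *next;
};

struct json_value {
    enum json_type type;
    union {
        int boolean;
        double number;
        char *string;
        struct json_member *items;  /* JSON_OBJECT and JSON_ARRAY */
    } u;
};

static struct json_value *json_new(enum json_type type)
{
    struct json_value *v = calloc(1, sizeof *v);
    if (v == NULL)
        abort();                    /* sketch only: no real error handling */
    v->type = type;
    return v;
}

struct json_value *json_number(double n)
{
    struct json_value *v = json_new(JSON_NUMBER);
    v->u.number = n;
    return v;
}

struct json_value *json_object(struct json_member *items)
{
    struct json_value *v = json_new(JSON_OBJECT);
    v->u.items = items;
    return v;
}

/* Build one member and link it in front of `rest`. With a right-recursive
 * rule like PAIR SYM_COMMA MEMBERS, the tail is reduced first, so linking
 * the new pair in front keeps the members in source order. */
struct json_member *json_member_new(const char *key,
                                    struct json_value *value,
                                    struct json_member *rest)
{
    struct json_member *m = malloc(sizeof *m);
    if (m == NULL)
        abort();
    m->key = key;
    m->value = value;
    m->next = rest;
    return m;
}

/* Number of members in an object or elements in an array, else 0. */
size_t json_count(const struct json_value *v)
{
    size_t n = 0;
    if (v->type == JSON_OBJECT || v->type == JSON_ARRAY)
        for (const struct json_member *m = v->u.items; m != NULL; m = m->next)
            ++n;
    return n;
}
```

With something like this, an action such as the one for OBJ_BEG MEMBERS OBJ_END would set $$ to json_object($2) rather than formatting a string, and printing (with whatever whitespace you like) becomes a separate walk over the finished tree.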