
I have a hand-written scanner and a bison parser which can parse this sentence (made it short for the question context):

var x : integer

Bison:

%require "3.2"
%define api.pure full


%code {
#include <stdio.h>
#include <string.h>
#include <iostream>
#include <string>
#include "Scanner.h"

int yylex(YYSTYPE *lvalp);
void yyerror(const char *error);
Scanner scanner;
}


%union {
int n;
double d;
char s[1000];
}

%token VAR COL ITYPE
%token IDENTIFIER
%token INTEGER
%token EOL

%type <s> type PrimitiveType IDENTIFIER
%type <s> INTEGER    

%%
program:
| program EOL
| program SimpleDeclaration {  }
;

SimpleDeclaration: VariableDeclaration
;

VariableDeclaration: VAR IDENTIFIER COL type {std::cout<<"defined variable " << $2 << " with type " << $4 << std::endl; }
;

type: IDENTIFIER
| PrimitiveType
;

PrimitiveType: ITYPE { strcpy($$, "int"); }
;

%%
int main()
{
    scanner.set_file("inp.txt");
    return yyparse();
}

void yyerror(const char *error)
{
    std::cout << "syntax error" << std::endl;
}

int yylex(YYSTYPE *lvalp)
{
    return scanner.get_next_token(lvalp);
}

scanner.get_next_token(lvalp) returns a token such as INTEGER (scanner.cpp includes parser.tab.hpp and uses the enums generated from the tokens). Before returning, it stores the corresponding value in lvalp, for example strcpy(lvalp->s, nextTokenString.c_str()) or lvalp->n = toInt(nextTokenString). The output is:

defined variable x with type int

But I want to use STL containers and smart pointers. This page about pure calling conventions does not explain how to use lvalp without a union when the tokens do not all have the same type. In addition, according to this page, I should add %language "c++" along with %define api.value.type variant to use C++ variants, which accept semantic types instead of a union. That results in the following error:

parser.ypp:3.1-21: error: %define variable 'api.pure' is not used

So I want to assign values while returning the correct token to the parser and without using the union so that I can use all C++ features.

Note: I saw this example, but I still don't understand whether functions such as make_Number already exist or are generated. How do I assign a value to the $ variables that belong to a declared %token from my next_token()?

Thanks in advance.


1 Answer


%define api.pure only applies to parsers generated with the C API. If you ask Bison to produce a C++ parser, that declaration is unnecessary, as the manual says:

The parser invokes the scanner by calling yylex. Contrary to C parsers, C++ parsers are always pure: there is no point in using the %define api.pure directive.

But the C++ API(s) are very different from the C API. If you want to use them, you really need to read that entire manual chapter (refer to the examples while you are reading).

Note that the variant type created by Bison is very different from std::variant, so it might or might not be what you are looking for. Unlike std::variant, Bison's variants do not store the current type of the variant's value, because the parser always knows the type of a stack value. That's fine while you're parsing, but it makes the variants less useful as exported values. (And in other applications as well.) However, they can help if you want to use non-trivial types like std::string, since Bison can ensure that destructors are correctly called. [Note 1]
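To make the contrast concrete, here is a minimal sketch of the std::variant side only, assuming C++17. The names Value, active_type, is_string_equal and variant_demo are made up for illustration and have nothing to do with Bison's generated code. The point is that std::variant carries a tag you can query at run time, which is exactly what Bison's variant leaves out.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Illustrative value type (hypothetical name): std::variant remembers
// which alternative it currently holds.
using Value = std::variant<int, double, std::string>;

// Returns a readable name for the alternative that is currently active.
std::string active_type(const Value& v) {
    if (std::holds_alternative<int>(v)) return "int";
    if (std::holds_alternative<double>(v)) return "double";
    return "string";
}

// Checked access: std::get_if yields nullptr when the type does not match,
// so a wrong guess about the stored type is detected instead of misread.
bool is_string_equal(const Value& v, const std::string& expected) {
    const std::string* s = std::get_if<std::string>(&v);
    return s != nullptr && *s == expected;
}

// Small self-check: the tag follows the value as it is reassigned.
void variant_demo() {
    Value v = std::string("x");
    assert(active_type(v) == "string");
    v = 42;
    assert(active_type(v) == "int");
}
```

With a Bison variant there is no equivalent of holds_alternative; the parser itself has to know which type sits in each stack slot.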

If you're going to use smart pointers, you'll probably find yourself sprinkling calls to std::move in order to avoid copying non-copyable objects. (Objects on Bison's stack are often copied repeatedly during the parse.) You'll also want to use std::move to avoid excess copying of strings. You can request that Bison automatically insert calls to std::move on every access to a semantic value, but if you enable this option you need to take care to only use each semantic value once. (There's an example in the manual.)
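As a rough illustration of why std::move shows up so much, here is a plain C++ sketch that does not involve Bison at all. Node, make_parent and move_demo are hypothetical names; in a real grammar the moves would happen inside semantic actions. It also shows why each value may only be used once after it has been moved.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node, standing in for whatever the grammar builds.
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
};

// Builds a parent that takes ownership of the child. The child has to be
// moved in, because std::unique_ptr cannot be copied.
std::unique_ptr<Node> make_parent(std::string name, std::unique_ptr<Node> child) {
    auto parent = std::make_unique<Node>();
    parent->name = std::move(name);                // avoids copying the string
    parent->children.push_back(std::move(child));  // transfers ownership
    return parent;
}

// Small self-check of the ownership transfer.
void move_demo() {
    auto leaf = std::make_unique<Node>();
    leaf->name = "x";
    auto decl = make_parent("VariableDeclaration", std::move(leaf));
    assert(leaf == nullptr);  // a moved-from unique_ptr is empty: use it once
    assert(decl->children.size() == 1);
    assert(decl->children[0]->name == "x");
}
```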

Once you decide to use Bison's C++ API, you need to choose between two calling conventions for the lexical scanner. One option is what the manual calls "split symbols", which is just the traditional C approach (modified as with the C pure API): the lexer returns an integer (the token type) and places the semantic value in the semantic value object (the C++ counterpart of YYSTYPE) pointed to by its argument. If that object is a Bison variant, you'll want to use the emplace method to construct the value in place (thus avoiding a copy).

There are examples in the linked page of the Bison manual. It's a bit confusing that there are two examples of using emplace; my understanding is that the second example (in which emplace takes constructor arguments) is usable with C++11 or more recent, which these days should be pretty universal (IMHO).
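The shape of the "split symbol" convention can be sketched in plain C++ without Bison. Everything here is a stand-in: Token, SemanticValue and classify are hypothetical names, and std::variant is used only because it also has an emplace member; a generated parser supplies its own token enum and value type, so treat this as an analogy rather than the real API.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical token kinds and value type. A Bison-generated parser
// would provide its own equivalents.
enum Token { END = 0, VAR, COL, ITYPE, IDENTIFIER };
using SemanticValue = std::variant<std::monostate, int, std::string>;

// "Split symbol" style: the token kind is the return value, and the
// semantic value (if any) is written through the pointer argument.
int classify(const std::string& word, SemanticValue* lvalp) {
    if (word.empty()) return END;
    if (word == "var") return VAR;
    if (word == ":") return COL;
    if (word == "integer") return ITYPE;
    // emplace constructs the std::string directly inside the variant.
    lvalp->emplace<std::string>(word);
    return IDENTIFIER;
}
```

Keyword tokens leave the value untouched, which matches the usual convention that only tokens with a declared type carry a semantic value.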

Alternatively, you can use "complete symbols", which are described at more length and with more examples. If you tell Bison to use the "complete symbol" API (with the %define api.token.constructor declaration), then Bison will automatically generate the various make_XXX functions. To make use of these functions, you would have to change your scanner's get_next_token member function so that it returns a symbol_type object instead of an int (and then it no longer needs an lvalp argument). That might be a larger change than you want to make.
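To show what "complete symbol" means in terms of the scanner's signature, here is a hand-written analogy in plain C++. Symbol, Kind, make_var, make_col, make_itype, make_end, make_identifier and next_symbol are all hypothetical; with api.token.constructor, Bison generates the real make_XXX functions and the real symbol_type for you, so none of this would be written by hand.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical token kinds for the "var x : integer" example.
enum class Kind { End, Var, Col, IType, Identifier };

// Stand-in for a "complete symbol": the kind and its value travel together.
struct Symbol {
    Kind kind;
    std::variant<std::monostate, std::string> value;
};

// Hand-written factories playing the role that generated make_XXX
// functions play: one per token, taking exactly the value that token carries.
Symbol make_end() { return Symbol{Kind::End, std::monostate{}}; }
Symbol make_var() { return Symbol{Kind::Var, std::monostate{}}; }
Symbol make_col() { return Symbol{Kind::Col, std::monostate{}}; }
Symbol make_itype() { return Symbol{Kind::IType, std::monostate{}}; }
Symbol make_identifier(std::string text) {
    return Symbol{Kind::Identifier, std::move(text)};
}

// The scanner returns a whole symbol, so no output pointer is needed.
Symbol next_symbol(const std::string& word) {
    if (word.empty()) return make_end();
    if (word == "var") return make_var();
    if (word == ":") return make_col();
    if (word == "integer") return make_itype();
    return make_identifier(word);
}
```

The advantage of this style is that a token cannot be returned with a value of the wrong type, because each factory only accepts the type declared for that token.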


Notes:

  1. You can use std::variant or the Boost equivalent with an explicit definition of api.value.type but that won't produce the make_* calls, and also Bison won't know how to extract individual types from the variant, so the whole %type mechanism won't work, making it not very attractive.