1
votes

I'm writing a scanner/parser combination using Flex and Bison. If possible, I would like to avoid using the C++-specific features of both programs, but I nevertheless need to access a C++ library from the source file generated by Bison. At the moment I'm compiling the source file generated by Flex as a C program.

One thing I thought I might be able to do is to declare STL type members inside Bison's %union statement, e.g.:

%union {
  std::string str;
};

I quickly realized that this cannot work, because it produces a union that is included by the Flex source file. I then thought I might just compile that file with a C++ compiler as well, but the statement above is already rejected when running bison:

error: expected specifier-qualifier-list before ‘std’

I don't really want to go through the trouble of copying and concatenating strings with C stdlib functions throughout my parser. What can I do in order to make the scanner return STL types to the parser?

EDIT: the linked duplicate does not really answer my question. The answers to that one only show how to compile both files using a C++ compiler, which is not my issue.

1
I think I have given up at this point. Bison's C++ interface is a nightmare, and all the "minimal" examples found online are anything but. I'll just stick to using C types. - Peter

1 Answer

2
votes

You can certainly compile both your generated scanner and parser with C++, even if you use the default C skeletons (and I agree that the C++ skeletons are badly documented and excessively complicated). So there is nothing stopping you from using std::string inside your parser.

However, that won't let you put a std::string directly inside a union, because you can't just toss a class with a non-trivial constructor and destructor into a union. Since C++11 it's possible to work around this limitation by explicitly declaring the semantic type and providing explicit constructors and destructors, but it's going to be a fair amount of work and it may well not be worth it.
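To give a rough idea of what that "fair amount of work" looks like, here is a minimal C++11 sketch of a hand-written semantic type that holds a std::string inside a union. The names SemValue, Kind, set_int, set_str and as_str are made up for illustration and are not part of Bison; how such a type would be hooked into the generated parser, and whether the C skeleton copies it safely, is not shown and would need checking.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical semantic type: a tagged union of an int and a std::string.
struct SemValue {
    enum class Kind { None, Int, Str };
    Kind kind = Kind::None;
    union {
        int ival;
        std::string str;  // legal since C++11, but never constructed or destroyed automatically
    };

    SemValue() {}                       // no union member is active yet
    ~SemValue() { reset(); }
    SemValue(const SemValue& other) { copy_from(other); }
    SemValue& operator=(const SemValue& other) {
        if (this != &other) { reset(); copy_from(other); }
        return *this;
    }

    void set_int(int v) { reset(); ival = v; kind = Kind::Int; }
    void set_str(const char* text) {
        reset();
        new (&str) std::string(text);   // placement new starts the string's lifetime
        kind = Kind::Str;
    }
    const std::string& as_str() const {
        assert(kind == Kind::Str);
        return str;
    }

private:
    // Destroy the string if it is the active member.
    void reset() {
        using S = std::string;
        if (kind == Kind::Str) str.~S();
        kind = Kind::None;
    }
    // Precondition: no member of *this is currently active.
    void copy_from(const SemValue& other) {
        if (other.kind == Kind::Str) new (&str) std::string(other.str);
        else if (other.kind == Kind::Int) ival = other.ival;
        kind = other.kind;
    }
};
```

Every member that can be active needs this kind of bookkeeping, which is why the pointer-based options below are usually less effort.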

That still leaves you with a couple of options. One is to use a pointer to a std::string, which means that your scanner action has to do something like:

[[:alpha:]][[:alnum:]_]*    yylval.strval = new std::string(yytext);

Another one is to just use C strings, leading to:

[[:alpha:]][[:alnum:]_]*    yylval.strval = strdup(yytext);

In both cases, you'll end up having to manually manage the allocated memory; C++'s smart pointers won't help you because they also have non-trivial destructors, so they can't be easily shoe-horned into semantic unions either.
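The ownership hand-off for the std::string pointer variant can be sketched in plain C++ as follows. SemanticUnion is only a stand-in for the union Bison would generate from a %union containing a std::string* member, and scan_identifier and take_string are hypothetical helpers that mimic what a scanner action and a parser action would do; none of these names come from Flex or Bison.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Stand-in for the generated semantic union (assumed to contain a raw pointer).
union SemanticUnion {
    std::string* strval;
};

// Mimics the scanner action: allocate a string for the matched text.
void scan_identifier(SemanticUnion& yylval, const char* yytext) {
    yylval.strval = new std::string(yytext);
}

// Mimics a parser action: take ownership of the token's string and free
// the heap object once its contents have been moved out.
std::string take_string(SemanticUnion& value) {
    assert(value.strval != nullptr);
    std::unique_ptr<std::string> owner(value.strval);  // deletes on scope exit
    value.strval = nullptr;
    return std::move(*owner);
}
```

A smart pointer can still be used inside an action like this, just not as a union member. Tokens that are discarded during error recovery never reach such an action; Bison's %destructor directive is the usual place to delete those.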

Since it appears that you're going to eventually make the token into a std::string, you might as well do it from the start using the first option above. Since most tokens are short and most C++ libraries now implement a short string optimization, new std::string(yytext) will frequently require only one memory allocation (and if it requires two, the library will transparently handle the second one).