I can't get the string value of a token

Question

I try to implement a Lexer for a little programming language with Boost Spirit.

I have to get the value of a token and I get a bad_get exception :

terminate called after throwing an instance of 'boost::bad_get'
what(): boost::bad_get: failed value get using boost::get Aborted

I obtain this exception when doing :

std::string contents = "void";

base_iterator_type first = contents.begin();
base_iterator_type last = contents.end();

SimpleLexer<lexer_type> lexer;

iter = lexer.begin(first, last);
end = lexer.end();

std::cout << "Value = " << boost::get<std::string>(iter->value()) << std::endl;

My lexer is defined like that :

typedef std::string::iterator base_iterator_type;
typedef boost::spirit::lex::lexertl::token<base_iterator_type, boost::mpl::vector<unsigned int, std::string>> Tok;
typedef lex::lexertl::actor_lexer<Tok> lexer_type;

template<typename L>
class SimpleLexer : public lex::lexer<L> {
    private:

    public:
        SimpleLexer() {
            keyword_for = "for";
            keyword_while = "while";
            keyword_if = "if";
            keyword_else = "else";
            keyword_false = "false";
            keyword_true = "true";
            keyword_from = "from";
            keyword_to = "to";
            keyword_foreach = "foreach";

            word = "[a-zA-Z]+";
            integer = "[0-9]+";
            litteral = "...";

            left_parenth = '('; 
            right_parenth = ')'; 
            left_brace = '{'; 
            right_brace = '}'; 

            stop = ';';
            comma = ',';

            swap = "<>";
            assign = '=';
            addition = '+';
            subtraction = '-';
            multiplication = '*';
            division = '/';
            modulo = '%';

            equals = "==";
            not_equals = "!=";
            greater = '>';
            less = '<';
            greater_equals = ">=";
            less_equals = "<=";

            whitespaces = "[ \\t\\n]+";
            comments = "\\/\\*[^*]*\\*+([^/*][^*]*\\*+)*\\/";

            //Add keywords
            this->self += keyword_for | keyword_while | keyword_true | keyword_false | keyword_if | keyword_else | keyword_from | keyword_to | keyword_foreach;
            this->self += integer | litteral | word;

            this->self += equals | not_equals | greater_equals | less_equals | greater | less ;
            this->self += left_parenth | right_parenth | left_brace | right_brace;
            this->self += comma | stop;
            this->self += assign | swap | addition | subtraction | multiplication | division | modulo;

            //Ignore whitespaces and comments
            this->self += whitespaces [lex::_pass = lex::pass_flags::pass_ignore];
            this->self += comments [lex::_pass = lex::pass_flags::pass_ignore]; 
        }

        lex::token_def<std::string> word, litteral, integer;

        lex::token_def<lex::omit> left_parenth, right_parenth, left_brace, right_brace;

        lex::token_def<lex::omit> stop, comma;

        lex::token_def<lex::omit> assign, swap, addition, subtraction, multiplication, division, modulo;
        lex::token_def<lex::omit> equals, not_equals, greater, less, greater_equals, less_equals;

        //Keywords
        lex::token_def<lex::omit> keyword_if, keyword_else, keyword_for, keyword_while, keyword_from, keyword_to, keyword_foreach;
        lex::token_def<lex::omit> keyword_true, keyword_false;

        //Ignored tokens
        lex::token_def<lex::omit> whitespaces;
        lex::token_def<lex::omit> comments;
};

Is there an other way to get the value of a Token ?

on reading again, I notice that you specify lex::omit as the token attribute type. These tokens won't expose any value data (not even iterator pairs). This might be your problem. Otherwise, I heartily recommend parsing using Qi on top of token iterators: get the best of both worlds. — sehe
I verified and sadly this is not the problem. I only use boost::get on a token that of the good type and that should have the value. — Baptiste Wicht

sehe sehe · Accepted Answer · 2011-10-14T08:48:19

You can always use the 'default' token data (which is iterator_range of the source iterator type).

std::string tokenvalue(iter->value().begin(), iter->value().end());

After studying the test cases in the boost repository, I found out a number of things:

this is by design
there is an easier way
the easier way comes automated in Lex semantic actions (e.g. using _1) and when using the lexer token in Qi; the assignment will automatically convert to the Qi attribute type
this has (indeed) got the 'lazy, one-time, evaluation' semantics mentioned in the docs

The cinch is that the token data is variant, which starts out as the raw input iterator range. Only after 'a' forced assignment, the converted attribute is cached in the variant. You can witness the transition:

lexer_type::iterator_type iter = lexer.begin(first, last);
lexer_type::iterator_type end = lexer.end();

assert(0 == iter->value().which());
std::cout << "Value = " << boost::get<boost::iterator_range<base_iterator_type> >(iter->value()) << std::endl;

std::string s;
boost::spirit::traits::assign_to(*iter, s);
assert(1 == iter->value().which());
std::cout << "Value = " << s << std::endl;

As you can see, the attribute assignment is forced here, directly using the assign_to trait implementation.

Full working demonstration:

#include <boost/spirit/include/lex_lexertl.hpp>

#include <iostream>
#include <string>

namespace lex = boost::spirit::lex;

typedef std::string::iterator base_iterator_type;
typedef boost::spirit::lex::lexertl::token<base_iterator_type, boost::mpl::vector<int, std::string>> Tok;
typedef lex::lexertl::actor_lexer<Tok> lexer_type;

template<typename L>
class SimpleLexer : public lex::lexer<L> {
    private:

    public:
        SimpleLexer() {
            word = "[a-zA-Z]+";
            integer = "[0-9]+";
            literal = "...";

            this->self += integer | literal | word;
        }

        lex::token_def<std::string> word, literal;
        lex::token_def<int> integer;
};

int main(int argc, const char* argv[]) {
    SimpleLexer<lexer_type> lexer;

    std::string contents = "void";

    base_iterator_type first = contents.begin();
    base_iterator_type last = contents.end();

    lexer_type::iterator_type iter = lexer.begin(first, last);
    lexer_type::iterator_type end = lexer.end();

    assert(0 == iter->value().which());
    std::cout << "Value = " << boost::get<boost::iterator_range<base_iterator_type> >(iter->value()) << std::endl;

    std::string s;
    boost::spirit::traits::assign_to(*iter, s);
    assert(2 == iter->value().which());
    std::cout << "Value = " << s << std::endl;

    return 0;
}

I can't get the string value of a token

1 Answers