3
votes

I'm wondering if there's a way in boost::spirit::lex to write a token's value back to the input stream (possibly after editing it) and then rescan it. What I'm basically looking for is functionality like that offered by unput() in Flex.

Thanks!

2
What are you trying to achieve? I mean, in what context would you need to use unput()? If you show an example, I might be able to show you how I'd do it (possibly using Lexer states) - sehe
Basically, I need the lexer to match an identifier immediately followed by an open paren, like "abc(", as one token, and put it back into the input stream with the paren moved to the front, like "(abc ". The lexer would then scan it again, this time as two separate tokens (a paren token followed by an identifier token). - Haitham Gad
Ok, I posted my take at this, let me know if I understood the goal incorrectly. - sehe

2 Answers

3
votes

Sounds like you just want to accept tokens in different orders but with the same meaning.

Without further ado, here is a complete sample that shows how this would be done, exposing the identifier regardless of input order. Output:

Input 'abc(' Parsed as: '(abc'
Input '(abc' Parsed as: '(abc'

Code

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <iostream>
#include <string>

using namespace boost::spirit;

///// LEXER
template <typename Lexer>
struct tokens : lex::lexer<Lexer>
{
    tokens()
    {
        identifier = "[a-zA-Z][a-zA-Z0-9]*";
        paren_open = '(';

        this->self.add
            (identifier)
            (paren_open)
            ;
    }

    lex::token_def<std::string> identifier;
    lex::token_def<lex::omit> paren_open; // the parenthesis carries no attribute value
};

///// GRAMMAR
template <typename Iterator>
struct grammar : qi::grammar<Iterator, std::string()>
{
    template <typename TokenDef>
        grammar(TokenDef const& tok) : grammar::base_type(ident_w_parenopen)
    {
        // accept the identifier and the paren in either order; the rule's
        // attribute is just the identifier's text
        ident_w_parenopen = 
              (tok.identifier >> tok.paren_open)
            | (tok.paren_open >> tok.identifier) 
            ;
    }
  private:
    qi::rule<Iterator, std::string()> ident_w_parenopen;
};

///// DEMONSTRATION
typedef std::string::const_iterator It;

template <typename T, typename G>
void DoTest(std::string const& input, T const& tokens, G const& g)
{
    It first(input.begin()), last(input.end());

    std::string parsed;
    bool r = lex::tokenize_and_parse(first, last, tokens, g, parsed);

    if (r) {
        std::cout << "Input '" << input << "' Parsed as: '(" << parsed << "'\n";
    }
    else {
        std::string rest(first, last);
        std::cerr << "Parsing '" << input << "' failed\n" << "stopped at: \"" << rest << "\"\n";
    }
}

int main(int argc, char* argv[])
{
    typedef lex::lexertl::token<It, boost::mpl::vector<std::string> > token_type;
    typedef lex::lexertl::lexer<token_type> lexer_type;
    typedef tokens<lexer_type>::iterator_type iterator_type;

    tokens<lexer_type> tokens;
    grammar<iterator_type> g (tokens);

    DoTest("abc(", tokens, g);
    DoTest("(abc", tokens, g);
}
0
votes

I ended up implementing my own unput() functionality as follows:

   // Phoenix function object that rewrites the input buffer in place: it moves
   // 'start' back far enough to hold 'str', copies 'str' into the buffer, and
   // resets 'end' so the lexer resumes scanning from 'start'.
   struct unputImpl
   {
      template <typename Iter1T, typename Iter2T, typename StrT>
      struct result {
         typedef void type;
      };

      template <typename Iter1T, typename Iter2T, typename StrT>
      typename result<Iter1T, Iter2T, StrT>::type operator()(Iter1T& start, Iter2T& end, StrT str) const {
         // Move 'start' back by however much 'str' is longer than the matched token.
         start -= (str.length() - std::distance(start, end));
         std::copy(str.begin(), str.end(), start);
         end = start;
      }
   };

   phoenix::function<unputImpl> const unput = unputImpl();

This can then be used like:

   // Match "ident(" as a single throw-away token, unput "(ident " in its place,
   // and discard the original match so the lexer rescans the rewritten text.
   this->self += lex::token_def<lex::omit>("{SYMBOL}\\(")
        [
           unput(_start, _end, "(" + construct<string>(_start, _end - 1) + " "),
           _pass = lex::pass_flags::pass_ignore
        ];

If the unputted string is longer than the matched token, it will overwrite some of the previously parsed input. The thing you need to take care of is to make sure the input string has enough empty space at the very beginning to handle the case where unput() is called for the very first matched token.
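
For example, here is a minimal sketch of that padding. The helper name and the slack size are made up for illustration, and it assumes your lexer skips or ignores leading whitespace:

   #include <string>

   // Hypothetical helper: prepend some slack so unput() can move 'start'
   // backwards even when it fires on the very first matched token. The slack
   // needed depends on how much longer the unputted text can get than the
   // token it replaces.
   std::string pad_for_unput(std::string const& raw, std::size_t slack = 8)
   {
       return std::string(slack, ' ') + raw;
   }

   // Usage (sketch): tokenize a padded, mutable copy instead of the raw input,
   // since unput() writes into the buffer through the iterators.
   // std::string input = pad_for_unput(raw_input);
   // std::string::iterator first = input.begin(), last = input.end();
   // lex::tokenize(first, last, my_lexer);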