Parsing identifiers but excluding reserved words in Boost Spirit

Question

I have a Spirit grammar containing:

small %= char_("a-z");
large %= char_("A-Z");
digit %= char_("0-9");
symbol %= char_("!#$%&*+./<=>?@\\^|~:") | char_('-');
special %= char_("(),;[]`{}");
graphic %= small | large | symbol | digit | special | char_('"') | char_('\'');

dashes %= lit("--")>>*lit("-");

varsym %= ((symbol-lit(':'))>>*symbol)-reservedop-dashes;
reservedop %= string("..") | string(":") | string("::") | string("=") | string("\\") | string("|") | string("<-") | string("->") | string("@") | string("~") | string("=>");

Spirit doesn't require a separate lexer and parser (see "What are the Benefits of Using a Lexer?"), and I've followed this practice by defining the first six rules as qi::rule<Iterator, char()> and the last three rules as qi::rule<Iterator, std::string()>. Note that these rules therefore have no whitespace skipper.

Also, note that I'm trying to parse things as varsym, and not as reservedop. I'm only using reservedop to exclude things in the varsym rule.

The exclusion of reserved words in varsym doesn't work, though. == should be a valid varsym, but it's ignored because it begins with =, which is a reservedop.

The answer to another question suggested defining something like

    reservedop_ %= reservedop >> !symbol

and then using that. I'm not sure this works, though, and it certainly doesn't seem very elegant.

What is the right way to do this in Boost Spirit?

In case it is not immediately obvious, this is a portion of a grammar for parsing Haskell that is a roughly literal translation of the rules in the Haskell Report. — BenRI

sehe · Accepted Answer · 2012-11-13T08:27:53

It seems to me you are confusing the lexing and parsing phases.

The exclusion of reserved words in varsym doesn't work though. == should be a valid varsym but it's ignored because it begins with = which is a reservedop.

That statement doesn't make much sense with the code shown, because you never show how you use the rules:

 rule1 = varsym | reservedop; // would parse "==" as varsym
 rule2 = reservedop | varsym; // would parse "==" as reservedop

Look at:

- qi::graph, qi::alpha, qi::alnum, qi::lower, qi::upper etc.: http://www.boost.org/doc/libs/1_52_0/libs/spirit/doc/html/spirit/qi/reference/char/char_class.html
- the keyword and distinct parsers in the Spirit Repository: http://www.boost.org/doc/libs/1_52_0/libs/spirit/repository/doc/html/index.html

If you want to work with tokens defined from 'regular expressions' as your code seems to suggest, look at using Spirit with a Lex-based tokenizer:

http://www.boost.org/doc/libs/1_52_0/libs/spirit/doc/html/spirit/lex.html
