
I'm going through the Boost Spirit (and Boost Fusion) tutorials (version 1.48.0). I've been playing with the toy employee example. The link to the source is here:

http://www.boost.org/doc/libs/1_48_0/libs/spirit/example/qi/employee.cpp

Here is the example's grammar:

employee_parser() : employee_parser::base_type(start)
    {
        using qi::int_;
        using qi::lit;
        using qi::double_;
        using qi::lexeme;
        using ascii::char_;

        quoted_string %= lexeme['"' >> +(char_ - '"') >> '"'];

        start %=
            lit("employee")
            >> '{'
            >>  int_ >> ','
            >>  quoted_string >> ','
            >>  quoted_string >> ','
            >>  double_
            >>  '}'
            ;
    }

    qi::rule<Iterator, std::string(), ascii::space_type> quoted_string;
    qi::rule<Iterator, employee(), ascii::space_type> start;

My modification removes the handling of the quotes and just parses any characters between the delimiters, assigning them to the struct the parser is mapped to.

        //quoted_string %= lexeme['"' >> +(char_ - '"') >> '"'];
        start %=
            lit("employee")
            >> '{'
            >>  int_ >> ','
            >>  +(char_) >> ','
            >>  +(char_) >> ','
            >>  double_
            >>  '}'
            ;

My assumption is that +(char_) matches every character until a comma is reached. However, compiling and running it with the following input results in a parse failure.

./employee
employee{10,my,name,20.0}
-------------------------
Parsing failed
-------------------------

I'm also trying to write a similar parser that automatically converts the fields to the appropriate types of my struct. I'm sure I'm missing something fundamental about how to define the correct grammar for an input string like the one above, so any help is greatly appreciated!

Thanks!


1 Answer


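+(char_) matches one or more characters of any kind, so it also consumes the commas (and everything else up to the end of the input). The + operator is greedy and does not give characters back, so by the time the parser reaches >> ',' there is nothing left to match and the parse fails.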

You should write +(char_ - ',') instead, using the difference operator -:

//...
>>  int_ >> ','     
>>  +(char_ - ',') >> ','     
>>  +(char_ - ',') >> ','     
>>  double_
//...

The parser +(char_ - ',') consumes every character until a comma is reached. It then moves on to >> ',', consumes the comma, and continues with the next +(char_ - ',') up to the following comma, and so on.

You can find more about this operator here: http://www.boost.org/doc/libs/1_48_0/libs/spirit/doc/html/spirit/qi/reference/operator/difference.html

OR

If you want to parse names that contain only letters, you can also write a parser that accepts only letters:

//...
>>  int_ >> ','     
>>  +(char_("a-zA-Z")) >> ','     
>>  +(char_("a-zA-Z")) >> ','     
>>  double_
//...