
The input for the parser is similar to this example:

#include <string>
#include <vector>

struct Word {
    Word(std::string txt, int val)
    :text(txt)
    ,value(val)
    {}

    std::string text;
    int value;
};

int main()
{
    std::vector<Word> input;

    input.push_back(Word("This", 10));
    input.push_back(Word("is", 73));
    input.push_back(Word("the", 5));
    input.push_back(Word("input", 32));
}

The grammar for the parser is written against the text member of each Word and can look like this:

qi::rule<Iterator, int()> word = qi::string("This") | 
                                 qi::string("is") | 
                                 qi::string("the") | 
                                 qi::string("input"); 
qi::rule<Iterator, std::vector<int>()> start = +word;

Parsing the std::vector<Word> input should result in a vector containing the corresponding integer values. For this example it would be

[10,73,5,32]
  1. Is this even possible with boost::spirit or should I take a different approach?

If this could be a reasonable solution:

  1. How can one implement an Iterator for this, and what does it look like?
  2. What should the semantic actions look like to create the corresponding synthesized attribute or do I need some other spirit "magic"?

I hope I have provided enough information for this, let me know if not.


EDIT:

It looks like my question was not specific enough, since I tried to keep it as general as possible. Sehe's solution should work for what I described, but I have the following limitations:

  1. A Word can occur multiple times with different integer values; there is no correlation between a Word's text and its integer value.
  2. The "text" (in this example "This is the input") needs to be parsed anyway to complete another task. I have already written everything to do so and it would be really easy for me to add what I need, if only I could access the Integer value from inside the semantic actions somehow.
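To make the two limitations concrete, here is a minimal sketch in plain C++ (no Spirit involved) of the behaviour being asked for: each Word's text is checked against the grammar's alternatives, and the value is taken from the Word itself rather than from a text-to-value table. The helper name collect_values and the allowed set are assumptions made for illustration only; they are not part of the question's code.

```cpp
#include <set>
#include <string>
#include <vector>

struct Word {
    Word(std::string txt, int val) : text(txt), value(val) {}
    std::string text;
    int value;
};

// Hypothetical helper: mimics "+word" over a sequence of Words.
// Succeeds only if there is at least one Word and every text is in `allowed`.
// The values come from the Words themselves, so the same text may carry
// different values at different positions.
bool collect_values(const std::vector<Word>& input,
                    const std::set<std::string>& allowed,
                    std::vector<int>& out)
{
    std::vector<int> result;
    for (std::vector<Word>::const_iterator it = input.begin(); it != input.end(); ++it) {
        if (allowed.count(it->text) == 0)
            return false;               // unknown word: the "parse" fails
        result.push_back(it->value);    // value travels with the token
    }
    if (result.empty())
        return false;                   // "+" requires at least one match
    out = result;
    return true;
}
```

With the input from the example above and the four allowed texts, this yields the values in input order, and a repeated "is" with two different values is handled without conflict.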

1 Answer


This appears superficially more related to lexing (a.k.a. tokenizing or scanning). See Boost Spirit Lex.

With Spirit Qi "magic", use symbols:

#include <boost/spirit/include/qi.hpp>

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

// Maps each known word to its integer value.
struct tokens : qi::symbols<char, int>
{
    tokens() {
        // Keys are lowercase because the table is used inside qi::no_case
        add
            ("this",  10)
            ("is",    73)
            ("the",   5)
            ("input", 32);
    }
};

int main() {
    std::string const input("This is the input");

    std::vector<int> parsed;
    std::string::const_iterator f = input.begin(), l = input.end();
    // One or more symbols, separated by whitespace, matched case-insensitively
    bool ok = qi::phrase_parse(f, l, qi::no_case[ +tokens() ], qi::space, parsed);

    if (ok)
        std::cout << "Parse success: ";
    else
        std::cout << "Parse failed: ";

    std::copy(parsed.begin(), parsed.end(), std::ostream_iterator<int>(std::cout, " "));

    if (f!=l)
        std::cout << "\nRemaining input: '" << std::string(f,l) << "'\n";
}

Prints:

Parse success: 10 73 5 32 

See also qi::no_case and qi::symbols:

When symbols is used for case-insensitive parsing (in a no_case directive), added symbol strings should be in lowercase. Symbol strings containing one or more uppercase characters will not match any input when symbols is used in a no_case directive.
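The rule quoted above can be illustrated with a small analogy in plain C++. This is not Spirit's implementation; it only assumes, as the documentation implies, that the lookup lowercases the input while comparing against the keys exactly as stored. The names to_lower and lookup_no_case are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Lowercase a copy of the string (ASCII-oriented, via std::tolower).
std::string to_lower(std::string s)
{
    for (std::size_t i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
    return s;
}

// Hypothetical case-insensitive lookup: only the input is lowercased,
// the stored keys are compared as they were added.
bool lookup_no_case(const std::map<std::string, int>& table,
                    const std::string& input, int& value)
{
    std::map<std::string, int>::const_iterator it = table.find(to_lower(input));
    if (it == table.end())
        return false;
    value = it->second;
    return true;
}
```

Under this model a key stored as "this" is found for "This", "THIS" or "this", whereas a key stored as "This" can never be found, because the lowercased input never contains an uppercase 'T'.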