I am trying to parse expressions of the form "1.0 + 2.0 + 3.0 + ..." into an AST. I have the following AST node for binary operations (complete, minimal code example is at the end):
struct binop_t
{
expression_t lhs, rhs;
};
I want to use the "BOOST_FUSION_ADAPT_STRUCT" macro to allow this struct to be synthesized by a boost:spirit::qi::rule:
BOOST_FUSION_ADAPT_STRUCT
(
client::ast::binop_t,
(client::ast::expression_t, lhs)
(client::ast::expression_t, rhs)
)
In other words, the binary operation AST node (binop_t) requires two operands - the left-hand side (lhs) and right-hand side (rhs) expressions that should be operated on. I am able to parse expressions of the form "1.0+(2.0+(3.0+4.0))" into this AST node by using the following qi::grammar:
qi::rule<Iterator, ast::literal_t(), ascii::space_type> literal;
qi::rule<Iterator, ast::binop_t(), ascii::space_type> binop;
qi::rule<Iterator, ast::expression_t(), ascii::space_type> primary_expr;
qi::rule<Iterator, ast::expression_t(), ascii::space_type> expr;
expr = binop.alias();
binop = primary_expr > qi::lit('+') > primary_expr;
primary_expr = (qi::lit('(') > expr > qi::lit(')'))
| literal
;
literal = qi::double_;
However, I am struggling to understand how to modify this grammar so that it can parse such expressions without the use of parenthesis (e.g. "1+2+3+4+...").
I have looked over the "calc4.cpp" Boost Spirit example, and noticed that it only uses the following AST node for Binary Operations (such as add):
struct operation
{
optoken operator_;
operand operand_;
};
The difference between this example and what I am trying to do is that the example defines the grammar for synthesizing binary operations node purely in terms of a list of unary operations. The list of unary operations is synthesized into an AST node called "program":
struct program
{
operand first;
std::list<operation> rest;
};
This whole thing in synthesized using the following grammar in the example:
qi::rule<Iterator, ast::program(), ascii::space_type> expression;
qi::rule<Iterator, ast::program(), ascii::space_type> term;
qi::rule<Iterator, ast::operand(), ascii::space_type> factor;
expression =
term
>> *( (char_('+') >> term)
| (char_('-') >> term)
)
;
term =
factor
>> *( (char_('*') >> factor)
| (char_('/') >> factor)
)
;
factor =
uint_
| '(' >> expression >> ')'
| (char_('-') >> factor)
| (char_('+') >> factor)
;
In this grammar, the "expression" rule produces a "program" which is a list of operations". We can see from from the grammar rule for "expression" that it uses the Kleene star in the grammar:
*((char_('+') >> term)
This is how the grammar is able to parse chains of associative binary operations, such as "1+2+3+4+...". The attribute of this grammar is list, which matches the definition of the "program" AST node. The calculator "eval" function then simply iterates over the list of operations in "program," applying the operations to operands from left to right:
int operator()(program const& x) const
{
int state = boost::apply_visitor(*this, x.first);
BOOST_FOREACH(operation const& oper, x.rest)
{
state = (*this)(oper, state);
}
return state;
}
I have also looked over the "mini-c" Boost Spirit example, and it has a very similar AST design, where there is no binary operator AST node (only a single "operator" node that accepts a single operand).
Following is the complete, minimal code example for the program that I've implemented by so far. To recap, my question is: how can I modify this program so that it is able to synthesize a tree of binop_t AST nodes from an expression like "1+2+3+4+..." without this use of parentheses in the input text:
#include <boost/variant.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <string>
#include <exception>
using boost::variant;
using boost::recursive_wrapper;
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;
namespace client { namespace ast {
struct literal_t;
struct binop_t;
typedef variant< recursive_wrapper<literal_t>,
recursive_wrapper<binop_t>
> expression_t;
struct literal_t
{
double value;
};
struct binop_t
{
expression_t lhs, rhs;
};
}} // ns
BOOST_FUSION_ADAPT_STRUCT
(
client::ast::literal_t,
(double, value)
)
BOOST_FUSION_ADAPT_STRUCT
(
client::ast::binop_t,
(client::ast::expression_t, lhs)
(client::ast::expression_t, rhs)
)
namespace client {
template <typename Iterator>
struct grammar_t : qi::grammar<Iterator, ast::expression_t(), ascii::space_type>
{
qi::rule<Iterator, ast::literal_t(), ascii::space_type> literal;
qi::rule<Iterator, ast::binop_t(), ascii::space_type> binop;
qi::rule<Iterator, ast::expression_t(), ascii::space_type> primary_expr;
qi::rule<Iterator, ast::expression_t(), ascii::space_type> expr;
grammar_t() : grammar_t::base_type(expr)
{
expr = binop.alias();
binop = primary_expr > qi::lit('+') > primary_expr;
primary_expr = (qi::lit('(') > expr > qi::lit(')'))
| literal;
literal = qi::double_;
expr.name("expr");
binop.name("binop");
literal.name("literal");
qi::debug(expr);
qi::debug(binop);
qi::debug(literal);
}
};
} // ns
int main()
{
try
{
string input = "0.1 + 1.2 ";
std::string::const_iterator begin = input.begin();
std::string::const_iterator end = input.end();
typedef std::string::const_iterator iterator_type;
client::grammar_t<iterator_type> g;
client::ast::expression_t ast;
bool status;
status = qi::phrase_parse(begin, end, g, ascii::space, ast);
EXPECT_TRUE(status);
EXPECT_TRUE(begin == end);
} catch (std::exception& e)
{
cout << e.what() << endl;
}
}