7
votes

We use the Shunting-Yard algorithm to evaluate expressions. We can validate the expression by simply applying the algorithm. It fails if there are missing operands, miss-matched parenthesis, and other things. The Shunting-Yard algorithm however has a larger supported syntax than just human readable infix. For example,

1 + 2
+ 1 2
1 2 +

are all acceptable ways to provide '1+2' as input to the Shunting-Yard algorithm. '+ 1 2' and '1 2 +' are not valid infix, but the standard Shunting-Yard algorithm can handle them. The algorithm does not really care about the order, it applies operators by order of precedence grabbing the 'nearest' operands.

We would like to restrict our input to valid human readable infix. I am looking for a way to either modify the Shunting-Yard algorithm to fail with non-valid infix or provide an infix validation prior to using Shunting-Yard.

Is anyone aware of any published techniques to do this? We must support both basic operator, custom operators, brackets, and functions (with multiple arguments). I haven't seen anything that works with more than the basic operators online.

Thanks

2
You could just use a infix operator parser instead. This of course won't take advantage of your already existing Shunting Yard parser, but it will work.ryanyuyu
@rici Maybe, I'm looking into the other question and answers now.denver
@denver: I was going to answer the question directly, but I remembered that I'd already answered it, so I just refer you to my answer. The state machine I suggest is also the answer to the question "how do I handle unary minus in a shunting-yard algorithm", so you might already have something similar implemented. The linked answer also tries to distinguish () used for grouping from () used for function calls; you can ignore that part if it is not useful to you, but it's not actually more complicated than the unary minus issue.rici
@rici It does appear the state machine (2 states, expecting operator and expecting operand) is the solution to my problem. Within the if statement for each type of token we support (constant, variable, function, binary operator, unary operator, open parenthesis, close parenthesis, and argument separator) we basically throw an exception if we are not in an expected state when reading the token, we then set the state to what we expect next.denver

2 Answers

8
votes

The solution to my problem was to enhance the algorithm posted on Wikipedia with the state machine recommended by Rici. I am posting the pseudo code here because it may be of use to others.

Support two states, ExpectOperand and ExpectOperator.

Set State to ExpectOperand
While there are tokens to read:
    If token is a constant (number)
        Error if state is not ExpectOperand.
        Push token to output queue.
        Set state to ExpectOperator.
    If token is a variable.
        Error if state is not ExpectOperand.
        Push token to output queue.
        Set state to ExpectOperator.
    If token is an argument separator (a comma).
        Error if state is not ExpectOperator.
        Until the top of the operator stack is a left parenthesis  (don't pop the left parenthesis).
            Push the top token of the stack to the output queue.
            If no left parenthesis is encountered then error.  Either the separator was misplaced or the parentheses were mismatched.
        Set state to ExpectOperand.
    If token is a unary operator.
        Error if the state is not ExpectOperand.
        Push the token to the operator stack.
        Set the state to ExpectOperand.
    If the token is a binary operator.
        Error if the state is not ExpectOperator.
        While there is an operator token at the top of the operator stack and either the current token is left-associative and of lower then or equal precedence to the operator on the stack, or the current token is right associative and of lower precedence than the operator on the stack.
            Pop the operator from the operator stack and push it onto the output queue.
        Push the current operator onto the operator stack.
        Set the state to ExpectOperand. 
    If the token is a Function.
        Error if the state is not ExpectOperand.  
        Push the token onto the operator stack.
        Set the state to ExpectOperand.
    If the token is a open parentheses.
        Error if the state is not ExpectOperand.
        Push the token onto the operator stack.
        Set the state to ExpectOperand.
    If the token is a close parentheses.
         Error if the state is not ExpectOperator.
         Until the token at the top of the operator stack is a left parenthesis.
             Pop the token off of the operator stack and push it onto the output queue.
         Pop the left parenthesis off of the operator stack and discard.
         If the token at the top of the operator stack is a function then pop it and push it onto the output queue.
         Set the state to ExpectOperator.
At this point you have processed all the input tokens.
While there are tokens on the operator stack.
    Pop the next token from the operator stack and push it onto the output queue.
    If a parenthesis is encountered then error.  There are mismatched parenthesis.

You can easily differentiate between unary and binary operators (I'm specifically speaking about the negative prefix and subtraction operator) by looking at the previous token. If there is no previous token, the previous token is an open parenthesis, or the previous token is an operator then you have encountered a unary prefix operator, else you have encountered the binary operator.

3
votes

A nice discussion on Shunting Yard algorithms is http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm The algorithm presented there uses the key idea of the operator stack but has some grammar to know what should be expected next. It has two main functions E() which expects an expression and P() which is expecting either a prefix operator, a variable, a number, brackets and functions. Prefix operators always bind tighter than binary operators so you want to deal this any of then first.

If we say P stands for some prefix sequence and B is a binary operator then any expression will be of the form

P B P B P

i.e. you are either expecting a a prefix sequence or a binary operator. Formally the grammar is

E -> P (B P)*

and P will be

P -> -P | variable | constant | etc.

This translates to psudocode as

E() {
    P()
    while next token is a binary op:
         read next op
         push onto stack and do the shunting yard logic
         P()
    if any tokens remain report error
    pop remaining operators off the stack
}

P() {
    if next token is constant or variable:
         add to output
    else if next token is unary minus: 
         push uminus onto operator stack
         P()
}

You can expand this to handle other unary operators, functions, brackets, suffix operators.