0
votes

I'm building this grammar for a simple programming language (already solved previous ambiguity issues: Can't figure out why Bison is throwing "Rules useless in parser due to conflicts").

This is my complete grammar: http://pastebin.com/yBHLSP0z
And this is the output file from Bison: http://pastebin.com/eAma3gWy
(sorry, they're in Spanish, but I think they're pretty self-explanatory)

The thing is, I'm still getting one shift/reduce error at state 107 (I'm translating it):

state 107

31 factor: ID .
48 concatenacion: ID . OPERADOR_SUMA ID
49              | ID . OPERADOR_SUMA literal_string

OPERADOR_SUMA  shift and go to state 140
OPERADOR_SUMA  [reduce using rule 31 (factor)]
$default       reduce using rule 31 (factor)


Now, state 107 is called from state 70:

estado 70

   45 asignacion: ID OPERADOR_ASIGNACION . concatenacion
   46           | ID OPERADOR_ASIGNACION . expresion
   47           | ID OPERADOR_ASIGNACION . literal_string

    OPERADOR_RESTA   desplazar e ir al estado 55
    PARENTESIS_ABRE  desplazar e ir al estado 56
    COMILLA          desplazar e ir al estado 67
    ID               desplazar e ir al estado 107

    expresion       ir al estado 108
    termino         ir al estado 61
    factor          ir al estado 62
    concatenacion   ir al estado 109
    literal_string  ir al estado 110
    literal_real    ir al estado 63
    literal_entero  ir al estado 64
    signo           ir al estado 65

What I think is happening (please correct me if I'm wrong) is that when it finds a rule for "asignacion" like this:

asignacion: ID OPERADOR_ASIGNACION concatenacion | ID OPERADOR_ASIGNACION expresion

it sees that from "expresion" it can get an ID token (expresion > termino > factor > ID), making a ID OPERADOR_ASIGNACION ID:

expresion:  
        expresion OPERADOR_SUMA termino
        | expresion OPERADOR_RESTA termino
        | termino
        ;


termino:
        termino OPERADOR_MULTIPLICACION factor
        | termino OPERADOR_DIVISION factor
        | factor
        ;


factor:     
        ID
        | literal_entero
        | literal_real
        | PARENTESIS_ABRE expresion PARENTESIS_CIERRA
        ;

Now, when it reaches an ID OPERADOR_ASIGNACION concatenacion and looks at the rules for "concatenacion", it gets:

concatenacion:
        ID OPERADOR_SUMA ID 
        | ID OPERADOR_SUMA literal_string 
        | literal_string OPERADOR_SUMA ID 
        | literal_string OPERADOR_SUMA literal_string
        ;

Two of them begin with "ID". So if any of those two rules are selected, it gets to a state where it can obtain a ID OPERADOR_ASIGNACION ID, only that with the "concatenacion" rules, it needs to find a "OPERADOR_SUMA" token afterwards. But I believe it's choking as soon as it sees that from both "concatenacion" and "expresion" can form the ID OPERADOR_ASIGNACION ID expression.
If this is not exactly what's going on, I'd like to know what is then the problem.
And, if I'm correct as where the error happens, I really don't know how to solve it.
Please help :)

Thanks!

1

1 Answers

2
votes

The problem arises from:

asignacion
    :   ID OPERADOR_ASIGNACION concatenacion
    |   ID OPERADOR_ASIGNACION expresion
    ;

and the selected alternatives:

expresion
    :   expresion OPERADOR_SUMA termino
    ;

termino
    :   factor
    ;

factor
    :   ID
    ;

concatenacion
    :   ID OPERADOR_SUMA ID
    ;

Which means that when your parser encounters:

x = y + z

it cannot tell whether it is processing the first or second alternative for asignacion.

That's the easy part. How to fix? The simplest fix (if it works, which I've not tested) would be to remove the concatenacion rule I showed, and in the expresion rule, recognize when you are dealing with a concatenacion vs an expresion since they are syntactically identical:

ID OPERADOR_SIGNACION ID OPERADOR_SUM ID

You'd look at the types of the two operands of the expresion, and if they are both string types, then you'd assume it was a concatenacion, otherwise an expresion.

You might want to review the whole of the concatenacion rule, though. You'd need to let strings through the factor rule, I think, so you'd add another alternative to factor:

factor
   :   literal_string
   ;

It would mean you'd have to reject literal strings in other rules, though, so more semantic checking. An alternative is to introduce a separate operator other than + to mean 'string concatenation'. SQL uses ||; some languages have used ,; you could use another token altogether, such as @. There are many options once you break away from +. Can you even just use 'two adjacent string expressions mean the concatenation operation', with no operator between them?

If none of that works, then get back to me.