Recursive Descent Parser Should Error on Repetitive Letter Terminals

Question

Primary question: How can I update my recursive descent parser for propositional logic (written in JavaScript) so that strings like "p~" and "pp" will return an "Invalid" message?

I am very new to HTML, JavaScript, and parsing. I wanted to see if I would be able to make a simple webpage that could parse simple formulae from propositional logic. Here is the grammar:

<formula> ::= <unaryconnective> <formula>
            | "(" <formula> <binaryconnective> <formula> ")"
            | <proposition>

<unaryconnective> ::= "~"

<binaryconnective> ::= "&"

<proposition> ::= "p"
                | "q"
                | "r"

I'm new to writing grammars like this, so hopefully this grammar makes sense. So far as I understand what Wikipedia had on left recursive grammars, I don't believe this grammar is left recursive, nor does it appear ambiguous.

I then tried to create a simple webpage that would allow someone to enter a formula into a textbox, click a "Validate" button, and return a simple message saying the formula was valid or not. I tried to write a recursive descent parser that could do the parsing. Here is what I came up with based on what Wikipedia, Stack Overflow, and other resources I found had on the topic (jsfiddle):

<!DOCTYPE html>
<html lang='en'>
  <head>
    <meta charset='UTF-8'>
    <title>Logic App</title>

    <script type="text/javascript">

    var sentence;
    var len;
    var i;
    var sym;

    function validate() {
      var result;

      sentence = document.getElementById('formulaentry').value;
      len = sentence.length;
      i = -1;

      if (sentence == "") {
        document.getElementById('formulaentry').value = "There's no formula!";
      } else {
        nextSym();
        result = formula();
        if(result == 0) {
          document.getElementById('formulaentry').value = "Invalid";
        } else {
        document.getElementById('formulaentry').value = "Valid";
        }
      }
    }

    function nextSym() {
      i++;
      if (i <= (len-1)) {
        sym = sentence.charAt(i);
      } else {
        sym = "";
      }
      //console.log("Current Sym:" + sym);
    }

    function accept(s) {
      if (sym == s) {
        nextSym();
        return 1;
      }
      return 0;
    }

    function expect(s) {
      if (accept(s)) {
        return 1;
      }
      return 0;
    }

    function formula() {
      if (unaryconn(sym)) {
        nextSym();
        if (formula() == 0) return 0;
        return 1;
      } else if (accept("(")) {
        if (formula() == 0) return 0;
        if (binaryconn(sym) == 0) return 0;
        nextSym();
        if (formula() == 0) return 0;
        if (!expect(")")) return 0;
        return 1;
      } else if (proposition(sym)) {
        nextSym();
        return 1;
      } else {
        return 0;
      }
    }

    function unaryconn(s) {
      if (s == "~") {
        return 1;
      } else {
        return 0;
      }

    }

    function binaryconn(s) {
      if (s == "&") {
        return 1;
      } else {
        return 0;
      }
    }

    function proposition(s) {
      if (s == "p" || s == "q" || s == "r") {
        return 1;
      } else {
        return 0;
      }
    }
    </script>
  </head>

  <body>
    <h1>Logic App</h1>

    <h2>Check if formula is well-formed:</h2>
    <p>Enter a formula into the text box and click "Validate" to see if it is a
    wff.</p>

    <form id='frmValidateFormula'>
      <p>
        <input
          type='text'
          id='formulaentry'
          placeholder='Input formula here'>
      </p>
      <p>
        <input
          type='button'
          id='btnValidate'
          value='Validate'
          onclick='validate()'>
      </p>
    </form>
  </body>
</html>

The parser seems to mostly work. It correctly parses the following strings as valid formula:

p
~p
(p&p)
~(p&p)
(~(p&~~p)&(~p&~p))

However, it incorrectly returns "Valid" for these strings when it should return "Invalid":

p~
pp
~p~
p&
p(
p)
ppqqpqpq

It looks like the parser doesn't check the whole string in these cases, only the characters leading up to the first letter and the letter itself, and so considers it fine. I tried adding some kind of validation in formula() in the "else if (proposition(sym)" section to make sure the character following the letter is a valid one, but that wasn't working because the valid characters change depending on what came before it. Changing my grammar may help. I don't really understand what I should be considering when creating these grammars other than that left recursion will cause trouble for a recursive descent parser. I looked at several questions on Stack Overflow concerning recursive descent parsers, but none seemed to help me with my problem.

How can I update my parser so that such strings return "Invalid" as the result? I'm not necessarily looking for a full answer, just some pointers on things to consider or resources to look at. If you also know of a good resource on what to think about when constructing grammars (especially with reference to parsers), that would be fantastic.

Note: My parser currently doesn't handle whitespace. I'm fine with that for now, since I'm mostly concerned with getting the other parsing correct before updating things to handle whitespace.

Unknown Unknown · Accepted Answer · 2019-03-07T20:14:32

I haven't studied your code too thoroughly, but here's my impression:

Your parser is checking to see that the first group of characters it sees form a valid formula, and then stops. If garbage comes after that, doesn't matter, your parser is still happy because it found its valid formula.

I see about two ways to handle it in your grammar:

Require that the formula end with some "end of stream" metacharacter
Add a new rule that matches a sequence of formulas. E.g.

<document> ::= <formula> |
               <formula> <document>

(Of course, this is left-recursive, but you should be able to resolve this without too much trouble.)

Also:

} else if (proposition(sym)) {
    nextSym();
}

I find it suspicious that this branch doesn't return anything.

Recursive Descent Parser Should Error on Repetitive Letter Terminals

1 Answers