I want to parse structured (nested) CSS code in PHP, using a lexer and parser generated by the PEAR packages PHP_LexerGenerator and PHP_ParserGenerator. My goal is to parse files like this:

selector, selector2 {
    prop: value;
    prop2 /*comment */ :
       value;

    subselector {
        prop: value;
        subsub { prop: value; }
    }
}

This is all fine as long as I don't have pseudo-classes. A pseudo-class appends : and a CSS name ([a-z][a-z0-9]*) to an element, as in a.menu:visited. Being somewhat lazy, I gave the parser no list of valid pseudo-classes, so it accepts any name.
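To see why a pseudo-class and a property definition are hard to tell apart, it helps to look at the token stream a lexer produces. The following is a minimal Python sketch of a regex-based lexer for this CSS subset, not the output of PHP_LexerGenerator; the token kinds (NAME, COLON, SEMICOLON, and so on) and the `tokenize`/`kinds` helpers are hypothetical names chosen for illustration.

```python
import re

# Hypothetical token kinds; the names mirror typical grammar terminals.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),    # /* ... */ comments, skipped
    ("WS", r"\s+"),               # whitespace, skipped
    ("NAME", r"[a-z][a-z0-9]*"),  # element, class, property and pseudo-class names
    ("COLON", r":"),
    ("SEMICOLON", r";"),
    ("DOT", r"\."),
    ("HASH", r"#"),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC), re.DOTALL
)


def tokenize(text):
    """Return (kind, value) pairs, skipping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup not in ("WS", "COMMENT"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def kinds(text):
    """Token kinds only, which is what the parser's decisions depend on."""
    return [kind for kind, _ in tokenize(text)]


print(kinds("a.menu:visited"))  # ['NAME', 'DOT', 'NAME', 'COLON', 'NAME']
print(kinds("prop: value;"))    # ['NAME', 'COLON', 'NAME', 'SEMICOLON']
print(kinds("test:visited {"))  # ['NAME', 'COLON', 'NAME', 'LBRACE']
```

The last two lines show the difficulty: `prop: value;` and `test:visited {` begin with the same three token kinds (NAME, COLON, NAME), and only the fourth token distinguishes them.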

My grammar (ignoring all the special cases and whitespace) looks like this:

document   ::= (<rule>)*

rule       ::= <selector> '{' (<content>)* '}'

content    ::= <rule>
content    ::= <definition>

definition ::= <name> ':' <name> ';'

//             h1     .class.class2#id    :visited
<selector> ::= <name> (('.'|'#') <name>)* (':' <name>)?
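Lemon, of which PHP_ParserGenerator is a PHP port, builds LALR(1) tables and never backtracks. For comparison, here is a minimal Python sketch of a hand-written recursive-descent parser for the grammar above that does backtrack, but only between the two alternatives of <content> (an ordered choice, as in a PEG): it tries <definition> first and, if that fails, retries the same tokens as a <rule>. The lexer, the tuple node shapes, and all function names are assumptions for illustration, and comma-separated selectors are not handled.

```python
import re

# Hypothetical lexer: NAME tokens plus single-character punctuation;
# whitespace and /* */ comments are skipped. Commas are not supported.
TOKEN_RE = re.compile(
    r"/\*.*?\*/|\s+|(?P<NAME>[a-z][a-z0-9]*)|(?P<PUNCT>[:;.#{}])", re.DOTALL
)


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.group("NAME"):
            tokens.append(("NAME", m.group()))
        elif m.group("PUNCT"):
            tokens.append((m.group(), m.group()))  # kind is the character itself
        pos = m.end()
    return tokens


class Parser:
    """Each method takes a token position and returns (node, next_pos), or None
    on failure. Positions are passed in explicitly, so a failed attempt leaves
    nothing to undo and the caller can retry from the same position."""

    def __init__(self, tokens):
        self.tokens = tokens

    def kind(self, pos):
        return self.tokens[pos][0] if pos < len(self.tokens) else "EOF"

    def definition(self, pos):
        # definition ::= <name> ':' <name> ';'
        expected = ("NAME", ":", "NAME", ";")
        if any(self.kind(pos + i) != k for i, k in enumerate(expected)):
            return None
        return ("definition", self.tokens[pos][1], self.tokens[pos + 2][1]), pos + 4

    def selector(self, pos):
        # <selector> ::= <name> (('.'|'#') <name>)* (':' <name>)?
        if self.kind(pos) != "NAME":
            return None
        text, pos = self.tokens[pos][1], pos + 1
        while self.kind(pos) in (".", "#") and self.kind(pos + 1) == "NAME":
            text += self.tokens[pos][1] + self.tokens[pos + 1][1]
            pos += 2
        if self.kind(pos) == ":" and self.kind(pos + 1) == "NAME":
            text += ":" + self.tokens[pos + 1][1]
            pos += 2
        return text, pos

    def rule(self, pos):
        # rule ::= <selector> '{' (<content>)* '}'
        hit = self.selector(pos)
        if hit is None:
            return None
        sel, pos = hit
        if self.kind(pos) != "{":
            return None
        pos += 1
        children = []
        while (hit := self.content(pos)) is not None:
            child, pos = hit
            children.append(child)
        if self.kind(pos) != "}":
            return None
        return ("rule", sel, children), pos + 1

    def content(self, pos):
        # content ::= <definition> | <rule>
        # Ordered choice: try <definition>; on failure, retry the SAME position as <rule>.
        return self.definition(pos) or self.rule(pos)

    def document(self):
        # document ::= (<rule>)*
        rules, pos = [], 0
        while pos < len(self.tokens):
            hit = self.rule(pos)
            if hit is None:
                raise SyntaxError(f"no rule matches at token {pos}: {self.tokens[pos]}")
            node, pos = hit
            rules.append(node)
        return rules


def parse(text):
    return Parser(tokenize(text)).document()


EXAMPLE = """
h1 {
    test:visited {
        simple: case;
    }
}
"""
print(parse(EXAMPLE))
# [('rule', 'h1', [('rule', 'test:visited', [('definition', 'simple', 'case')])])]
```

Here `test:visited {` first fails as a <definition> when `{` appears where `;` was expected, so the parser falls back to <rule> at the same position, while `simple: case;` succeeds as a <definition>. This backtracking is exactly what an LALR(1) parser does not do.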

Now, when I try to parse the following:

h1 {
    test:visited {
        simple: case;
    }
}

The parser complains that it expected a <name> to follow the double colon. So it tries to read simple: as a <selector> (the syntax highlighting on Stack Overflow shows the same reading).

Is it my mistake that the parser cannot backtrack far enough to try the <definition> rule, or is Lemon just not powerful enough to express this? If the latter, what can I do to get a working parser for this grammar?

Your grammar won't handle the select1, select2 { ... } notation. There's no rule that handles the 'comma-separated list of selectors'. (Jonathan Leffler)
Your question mentions 'double colon', but there's no '::' in the example inputs, nor is there anything that handles a double colon in the grammar. (Jonathan Leffler)

Answer


Your question asks about PHP_ParserGenerator and PHP_LexerGenerator. The parser generator code is marked as 'not maintained', which bodes ill.

The syntax you are using for the grammar is not valid Lemon input, so you need to clarify why you think the parser generator should accept it. You mention a problem with 'expected a <name> to follow the double colon', but neither your grammar nor your sample input has a double colon, which makes it hard to help you.

I think this Lemon grammar is equivalent to the one you showed:

document        ::= rule_list.
rule_list       ::= .
rule_list       ::= rule_list rule.
rule            ::= selector LBRACE content_list RBRACE.
content_list    ::= .
content_list    ::= content_list content.
content         ::= rule.
content         ::= definition.
definition      ::= NAME COLON NAME SEMICOLON.
selector        ::= NAME opt_dothashlist opt_colonname.
opt_dothashlist ::= .
opt_dothashlist ::= dot_or_hash NAME.
dot_or_hash     ::= DOT.
dot_or_hash     ::= HASH.
opt_colonname   ::= COLON NAME.

However, when it is compiled, Lemon reports 1 parsing conflict, and the output file shows:

State 2:
          definition ::= NAME * COLON NAME SEMICOLON
          selector ::= NAME * opt_dothashlist opt_colonname
     (10) opt_dothashlist ::= *
          opt_dothashlist ::= * dot_or_hash NAME
          dot_or_hash ::= * DOT
          dot_or_hash ::= * HASH

                         COLON shift  10
                         COLON reduce 10  ** Parsing conflict **
                           DOT shift  13
                          HASH shift  12
               opt_dothashlist shift  5
                   dot_or_hash shift  7

This means the parser cannot decide what to do with a colon: it might begin the 'opt_colonname' part of a 'selector', or it might be part of a 'definition':

name1:name4 : name2:name3 ;

Did you mean to allow syntax such as that? Nominally, according to the grammar, that should be valid, but

name1:name4;

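should also be valid. After reading the first NAME, the parser sees COLON and then NAME in both cases; only the third token (SEMICOLON for a definition, LBRACE for a selector) decides. So I think three lookahead tokens are needed to disambiguate these, and your grammar is not LALR(1) but LALR(3).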
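The lookahead count can be checked with a small worked example. The two token sequences below are written by hand from the inputs `simple: case;` and `test:visited {`, using the terminal names of the Lemon grammar above; the helper `lookahead_needed` is a hypothetical function for illustration, not part of Lemon.

```python
# Token kinds for the two inputs, written by hand using the terminal
# names of the Lemon grammar above.
DEFINITION = ["NAME", "COLON", "NAME", "SEMICOLON"]    # simple: case;
PSEUDO_SELECTOR = ["NAME", "COLON", "NAME", "LBRACE"]  # test:visited {


def lookahead_needed(a, b, consumed):
    """Count how many tokens after the first `consumed` ones a parser must
    examine before the two sequences can be told apart."""
    for index, (left, right) in enumerate(zip(a, b)):
        if left != right:
            if index < consumed:
                raise ValueError("sequences already differ in the consumed prefix")
            return index - consumed + 1
    raise ValueError("one sequence is a prefix of the other")


# In State 2 the parser has consumed one NAME and must choose between
# shifting COLON (definition) and reducing the empty opt_dothashlist (selector).
print(lookahead_needed(DEFINITION, PSEUDO_SELECTOR, consumed=1))  # 3
```

An LALR(1) parser sees only the first of those three tokens (COLON), which is the same in both cases, hence the shift/reduce conflict reported for State 2.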

Review your definition of 'selector' in particular.