0
votes
grammar neq;

WS
    :   [ \t\r\n]+ -> skip
    ;


root: aGTb | bGTa | eq;


aGTb: 'a'
        |  eq  aGTb
        |  aGTb  eq
        |  aGTb  aGTb
;

bGTa: 'b'
        |  eq  bGTa
        |  bGTa  eq
        |  bGTa  bGTa
;
        
eq: 'a'  'b'
  |  'b'  'a'
  |  'a'  eq  'b'
  |  'b'  eq  'a'
  |  eq  eq
;

This version is too slow, however, tests:

  • Random strings length 20 literals: Avg time =14 ms, Min time =0, Max time =78

  • Length 40: Avg time =32, Min time =0, Max time =231

  • Length 60: Avg time =83, Min time =16, Max time =416

  • Length 80: Avg time =365, Min time =37, Max time =11935

  • Length 100: Avg time =1442, Min time =67, Max time =25317

  • Length 120: Avg time =21861, Min time =135, Max time =924769

The input for the above 924 sec parse time:

a b b a a a b b a a b a b a a a a b a a a a b b b a a a a b a a b 
a a a b b b a a a b b a b b b a a b b a b a b a b a b a a a a b b 
b a a b b b b a a b a b a b a a a b a b b b b b a a a b a a a b b 
a b a b a a b b b a a b b a b a a a a a a 

Can the grammar be rewritten to faster version? (Be careful, changing say eq into

eq: 'a'  'b'
  |  'b'  'a'
  |  eq  'a'  'b'
  |  eq  'b'  'a'
  |  'a'  eq  'b'
  |  'b'  eq  'a'
  |  'a'  'b'  eq
  |  'b'  'a'  eq
;

is not equivalent rule rewriting, as it will not recognize a a b b b b a a).

What target language are you using?Bart Kiers
root : ('a' | 'b')+;.kaby76
@kaby76 Remove the eq alternative from the root, to make the language to recognize non equal letter counts only.Tegiri Nenashi
No, for example, the Python 2 and 3 targets/runtimes are a magnitude slower than faster ones (I've seen differences as large as 20 times as slow!). But Java is a fast runtime.Bart Kiers
It won't matter what target you choose: all targets run very slowly due to backtracking. LL(k) is O(n^(k+1)), which means a very slow parse for k> "a few" (note, n=3 here, k = 1000). AdaptivePredict() memoization will add nothing here. Even the fastest target, Cpp Release, it does not terminate after many minutes for the "1000a+b's" example. You can rewrite the rules using predicates and kleene-ops for an efficient grammar, e.g., eq: ('a'|'b')+ { /* validate # of children % 2 == 0, # of children > 1, equal # of 'a's and 'b's */ }?;. Is this a CS class question?kaby76