3
votes

I'm using Xtext 2.4 and want to define a SQL-like syntax. What confuses me is which constructs should be terminal rules, which should be datatype rules, and which should be parser rules. So far, the part of my grammar related to MyTerm is:

Model:
    (terms += MyTerm ';')*
;

MyTerm:
    constant=MyConstant | variable?='?'| collection_literal=CollectionLiteral 
;

MyConstant
    : string=STRING 
    | number=MyNumber
    | date=MYDATE 
    | uuid=UUID 
    | boolean=MYBOOLEAN
    | hex=BLOB
;

MyNumber:
    int=SIGNINT | float=SIGNFLOAT
;


SIGNINT returns ecore::EInt:
    '-'? INT
;


SIGNFLOAT returns ecore::EFloat:
    '-'? INT '.' INT
;

CollectionLiteral:
    => MapLiteral | SetLiteral | ListLiteral
;

MapLiteral:
    '{' {MapLiteral} (entries+=MapEntry (',' entries+=MapEntry)* )? '}'
;

MapEntry:
    key=MyTerm ':' value=MyTerm
;

SetLiteral:
    '{' {SetLiteral} (values+=MyTerm (',' values+=MyTerm)* )+ '}'
;

ListLiteral:
    '[' {ListLiteral} ( values+=MyTerm (',' values+=MyTerm)* )? ']'
;

terminal MYDATE:
  '0'..'9' '0'..'9' '0'..'9' '0'..'9' '-'
  '0'..'9' '0'..'9' '-'
  '0'..'9' '0'..'9'
;

terminal HEX:
    'a'..'h'|'A'..'H'|'0'..'9'
;   

terminal UUID:
    HEX HEX HEX HEX HEX HEX HEX HEX '-'
    HEX HEX HEX HEX '-'
    HEX HEX HEX HEX '-'
    HEX HEX HEX HEX '-'
    HEX HEX HEX HEX HEX HEX HEX HEX HEX HEX HEX HEX
;

terminal BLOB:
    '0' ('x'|'X') HEX+
;

terminal MYBOOLEAN returns ecore::EBoolean:
    'true' | 'false' | 'TRUE' | 'FALSE'
;

Few questions:

  • How do I define an integer with a sign? If I define another terminal rule, terminal SIGNINT: '-'? '0'..'9'+;, ANTLR complains that INT becomes unreachable. So I defined it as a datatype rule instead, SIGNINT: '-'? INT;. Is this the correct way to do it?

  • How do I define a float with a sign? I did the same as for the signed integer, SIGNFLOAT: '-'? INT '.' INT;, but I'm not sure whether this is correct either.

  • How do I define a date rule? I want to use a parser rule so that the year, month, and day are stored in separate fields, but when I define it as MyDate: year=INT '-' month=INT '-' date=INT; ANTLR complains: Decision can match input such as "RULE_INT '-' RULE_INT '-' RULE_INT" using multiple alternatives: 2, 3 As a result, alternative(s) 3 were disabled for that input

  • I also have some other rules, such as the following:

RelationCompare:
    name=ID compare=COMPARE term=MyTerm
;

but a=4 is not a valid RelationCompare, because a and 4 are each lexed as HEX. I noticed this because the rule works if I change the input to j=44. In this post it is said that terminal rules defined earlier shadow those defined later. However, if I redefine terminal ID in my grammar, whether I put it before or after terminal HEX, ANTLR complains: The following token definitions can never be matched because prior tokens match the same input: RULE_HEX,RULE_MYBOOLEAN. The same problem occurs with k=0x00b: k=0xaab is valid, but k=0x00b is not.

Any suggestions?


2 Answers

3
votes

How do you define an integer with sign?

  • Treat it as two separate tokens '-' and INT, and use a parser rule instead of a lexer rule.

How do you define a float with sign?

  • Treat it as two separate tokens '-' and FLOAT, and use a parser rule instead of a lexer rule.

How do you define a date rule?

  • Treat it as five separate tokens and use a parser rule instead of a lexer rule.

I don't know the answer to the last question since this is in Xtext as opposed to just ANTLR.
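As a rough illustration of the three suggestions above, here is a minimal Xtext sketch. The rule names SignedInt, SignedFloat and MyDate are hypothetical, INT is assumed to come from org.eclipse.xtext.common.Terminals, and the ecore alias is assumed to be imported as in the question. It is not a tested drop-in for the asker's grammar: the date rule in particular may still clash with other alternatives, as the warning quoted in the question suggests.

```xtext
// Datatype rule: '-' is a separate keyword token, INT is the standard terminal.
SignedInt returns ecore::EInt:
    '-'? INT
;

// Datatype rule for a signed decimal number built from the tokens INT '.' INT.
SignedFloat returns ecore::EFloat:
    '-'? INT '.' INT
;

// Parser rule: five tokens (INT '-' INT '-' INT), each part stored in its own feature.
MyDate:
    year=INT '-' month=INT '-' day=INT
;
```

Because the sign and the digits are separate tokens here, hidden whitespace between them (for example "- 5") would also be accepted by the parser; a custom value converter may be needed if that text has to be turned into a number reliably.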

3
votes

Later I found the original ANTLR grammar for the language I want to support, so I simply translated it into an Xtext grammar. Here is how I define the basic types:

terminal fragment A: 'a'|'A';  
   ... 
terminal fragment Z: 'z'|'Z';

terminal fragment DIGIT: '0'..'9';
terminal fragment LETTER: ('a'..'z'|'A'..'Z');
terminal fragment HEX: ('a'..'f'|'A'..'F'|'0'..'9'); 

terminal fragment EXPONENT: E ('+'|'-')? DIGIT+;
terminal INTEGER returns ecore::EInt: '-'? DIGIT+;
terminal FLOAT returns ecore::EFloat: INTEGER EXPONENT | INTEGER '.' DIGIT* EXPONENT?;

terminal BOOLEAN: T R U E | F A L S E;

In the original grammar, a date is treated as a string.

About rule names (ANTLR grammar => Xtext grammar):

  • parser rule: name starting with a lowercase letter => name starting with an uppercase letter (each rule becomes a Java class)
  • terminal rule: name starting with an uppercase letter => all-uppercase name with the terminal prefix
  • fragment terminal rule: fragment ID => terminal fragment ID

In ANTLR, a list of arguments is defined like this:

functionArgs
  : '(' ')'
  | '(' t1=term ( ',' tn=term )* ')'
;

The corresponding Xtext grammar is:

FunctionArgs
 : '(' ')'
 | '(' ts+=Term  (',' ts+=Term )* ')'
;

For parser rules that take an argument enclosed in [ ]:

properties[PropertyDefinitions props]
  : property[props] (K_AND property[props])*
;

Most of the time, the argument can be moved to the left-hand side of an assignment:

Properties
  : props+=Property (K_AND props+=Property)*
;
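The K_AND terminal used above is not shown in this answer. Assuming it follows the same letter-fragment pattern as BOOLEAN, a case-insensitive keyword terminal might be sketched like this (the definition is an assumption, not copied from the original grammar):

```xtext
// Letter fragments as in the A..Z list above; only the three needed here are repeated.
terminal fragment A: 'a'|'A';
terminal fragment D: 'd'|'D';
terminal fragment N: 'n'|'N';

// Matches "and", "AND", "And", and any other mix of upper and lower case.
terminal K_AND: A N D;
```

If the grammar also has an ID terminal, the two rules overlap on input such as "and", so which one matches depends on the order of the terminal rules.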

Now it's working as expected.