
I want to take the .g files from Apache Hive and build a parser (targeting JavaScript), initially just as a way to validate user-input Hive queries. The files I'm using come from apache-hive-1.0.0-src\ql\src\java\org\apache\hadoop\hive\ql\parse in the Hive tgz: HiveLexer.g, HiveParser.g, FromClauseParser.g, IdentifiersParser.g, SelectClauseParser.g.

I see no indication within the grammar files of which version of ANTLR to use, so I've tried running antlr (from apt-get pccts), antlr3 and antlr4. They all throw errors of some sort, so I have no clue which one to run, or whether I can (or need to) convert the .g files between versions.

The errors I'm getting are as follows:

antlr -Dlanguage=JavaScript HiveParser.g (looks like it doesn't support JS anyway):

warning: invalid option: '-Dlanguage=JavaScript'
HiveParser.g, line 17: syntax error at "grammar" missing { QuotedTerm PassAction ! \< \> : }
HiveParser.g, line 17: syntax error at "HiveParser" missing { QuotedTerm PassAction ! \< \> : }
HiveParser.g, line 17: syntax error at ";" missing Eof
HiveParser.g, line 28: lexical error: invalid token (text was ',')

antlr3 -Dlanguage=JavaScript HiveParser.g:

error(10):  internal error: Exception FromClauseParser.g:302:85: unexpected char: '-'@org.antlr.grammar.v2.ANTLRLexer.nextToken(ANTLRLexer.java:347): unexpected stream error from parsing FromClauseParser.g

error(150):  grammar file FromClauseParser.g has no rules
error(100): FromClauseParser.g:0:0: syntax error: assign.types: <AST>:299:68: unexpected AST node: ->
error(100): FromClauseParser.g:0:0: syntax error: define: <AST>:299:68: unexpected AST node: ->
error(106): SelectClauseParser.g:151:18: reference to undefined rule: tableAllColumns

antlr4 -Dlanguage=JavaScript HiveParser.g:

warning(202): HiveParser.g:30:0: tokens {A; B;} syntax is now tokens {A, B} in ANTLR 4
error(50): HiveParser.g:636:34: syntax error: '->' came as a complete surprise to me while looking for rule element
error(50): HiveParser.g:636:37: syntax error: '^' came as a complete surprise to me
error(50): HiveParser.g:638:50: syntax error: '->' came as a complete surprise to me while looking for rule element
error(50): HiveParser.g:638:53: syntax error: '^' came as a complete surprise to me

The antlr3 error referencing @org.antlr.grammar.v2.ANTLRLexer.nextToken seems suspect. Is it using the v2 lexer instead of v3? If so, maybe v3 is what I should target, but it's somehow not hitting it?

Or is this not an issue with versioning but with invocation? Or does the Hive build provide additional files that are needed?

Tokens such as ^ or -> (in the parser) are indicative of ANTLR v3. -- Lucas Trzesniewski
The .g extension means it is almost certainly v3. In ANTLR v3 a grammar can be target-language dependent (for example, through embedded actions written in the target language), so you cannot simply ask the ANTLR tool to generate JavaScript for you. Extra effort is required to port the grammar file. -- Lex Li
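
The syntax hints mentioned in these comments can be checked mechanically. The following is only a rough sketch: `guess_antlr_version` and its patterns are hypothetical helpers, not part of any ANTLR distribution, and the heuristic assumes that a `-> ^(` tree rewrite or an `output=AST` option points to v3, while a `class ... extends Parser;` header points to v2. A bare `->` is deliberately not treated as a marker, because ANTLR 4 lexer commands such as `-> skip` use it too.

```python
import re

# Hypothetical heuristic, not an official ANTLR tool: guess which ANTLR major
# version a grammar was written for from syntax only one version accepts.
V2_HEADER = re.compile(
    r"^\s*class\s+\w+\s+extends\s+(Parser|Lexer|TreeParser)\b", re.M
)
V3_MARKERS = [
    re.compile(r"->\s*\^\("),             # tree rewrite rule: -> ^(ROOT child)
    re.compile(r"\boutput\s*=\s*AST\b"),  # AST output option used by v3 grammars
]


def guess_antlr_version(grammar_text: str) -> str:
    """Return 'v2', 'v3', or 'unknown' for the given grammar source text."""
    if V2_HEADER.search(grammar_text):
        return "v2"
    if any(pattern.search(grammar_text) for pattern in V3_MARKERS):
        return "v3"
    return "unknown"
```

Reading a grammar file and passing its text to `guess_antlr_version` gives a first hint; an "unknown" result only means none of the assumed markers were found.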

1 Answer


According to the Hive source code, the project uses ANTLR 3.4. Before you start, though, remove the last line from FromClauseParser.g:

//------------------------------------------------------------------------
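
If you would rather script that cleanup than edit the file by hand, something along these lines might do. It is a minimal sketch: `drop_trailing_comment_line` is a hypothetical helper, and it assumes the last non-blank line of the file is the `//` comment shown above. If it is not, the file is left untouched.

```python
from pathlib import Path


def drop_trailing_comment_line(path: str) -> bool:
    """Remove the last non-blank line if it is a `//` comment.

    Returns True if the file was rewritten, False if it was left alone.
    """
    grammar = Path(path)
    lines = grammar.read_text(encoding="utf-8").splitlines(keepends=True)

    # Ignore blank lines at the very end of the file.
    while lines and not lines[-1].strip():
        lines.pop()

    # Only act when the final line really is a line comment.
    if not lines or not lines[-1].lstrip().startswith("//"):
        return False

    lines.pop()
    grammar.write_text("".join(lines), encoding="utf-8")
    return True
```

Call it with the path to your own copy of FromClauseParser.g (the location depends on where you unpacked the Hive sources), then run ANTLR 3.4 on the grammar again.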