
I am trying to execute this SQL insert statement against Hive:

insert into mydb.mytable (k, v) values (3, 'c'), (4, 'd')

If I use DBeaver, this SQL statement works. When I use the PySpark REPL, however, I get the following exception.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/context.py", line 580, in sql
    return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/utils.py", line 51, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"\nUnsupported language features in query: insert into mydb.mytable (k, v) values (3, 'c'), (4, 'd')\nTOK_QUERY 0, 0,30, 0\n  TOK_FROM 0, -1,30, 0\n    TOK_VIRTUAL_TABLE 0, -1,30, 0\n      TOK_VIRTUAL_TABREF 0, -1,-1, 0\n        TOK_ANONYMOUS 0, -1,-1, 0\n      TOK_VALUES_TABLE 1, 15,30, 39\n        TOK_VALUE_ROW 1, 17,22, 39\n          3 1, 18,18, 39\n          'c' 1, 21,21, 42\n        TOK_VALUE_ROW 1, 25,30, 49\n          4 1, 26,26, 49\n          'd' 1, 29,29, 52\n  TOK_INSERT 1, 0,-1, 12\n    TOK_INSERT_INTO 1, 0,13, 12\n      TOK_TAB 1, 4,6, 12\n        TOK_TABNAME 1, 4,6, 12\n          jvang 1, 4,4, 12\n          test1 1, 6,6, 18\n      TOK_TABCOLNAME 1, 9,12, 25\n        k 1, 9,9, 25\n        v 1, 12,12, 28\n    TOK_SELECT 0, -1,-1, 0\n      TOK_SELEXPR 0, -1,-1, 0\n        TOK_ALLCOLREF 0, -1,-1, 0\n\nscala.NotImplementedError: No parse rules for:\n TOK_VIRTUAL_TABLE 0, -1,30, 0\n  TOK_VIRTUAL_TABREF 0, -1,-1, 0\n    TOK_ANONYMOUS 0, -1,-1, 0\n  TOK_VALUES_TABLE 1, 15,30, 39\n    TOK_VALUE_ROW 1, 17,22, 39\n      3 1, 18,18, 39\n      'c' 1, 21,21, 42\n    TOK_VALUE_ROW 1, 25,30, 49\n      4 1, 26,26, 49\n      'd' 1, 29,29, 52\n \norg.apache.spark.sql.hive.HiveQl$.nodeToRelation(HiveQl.scala:1362)\n          ;"

My code is simply the following.

sql = "insert into mydb.mytable (k, v) values (3, 'c'), (4, 'd')"
sqlContext.sql(sql)

Any idea why this is happening? Is there any way I can append rows to an existing Hive table via PySpark? I've seen some examples that use multiple SQL insert statements, but I don't think that approach would perform well; I am essentially trying to do a bulk import (append mode) into an existing Hive table via PySpark.

I am using

  • Spark v1.6.1
  • Python v2.7.12
  • Hive v1.2.1000
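For the bulk-append goal above, one option worth trying is the DataFrame writer instead of SQL: build one DataFrame from all the rows and call DataFrameWriter.insertInto, which PySpark has offered since 1.4 and which appends when overwrite=False. This is a minimal sketch, assuming sqlContext is a HiveContext (so mydb.mytable is visible) and that each tuple lists values in the table's column order, because insertInto matches columns by position rather than by name. append_rows is a hypothetical helper name, not part of PySpark.

```python
def append_rows(sql_context, table_name, rows, column_names):
    """Hypothetical helper: append a batch of rows to an existing Hive table.

    sql_context is assumed to be a HiveContext (for example the sqlContext
    of a PySpark shell built with Hive support).
    """
    # One DataFrame for the whole batch, instead of one INSERT per row.
    df = sql_context.createDataFrame(rows, column_names)
    # insertInto resolves columns by position, not by name, so the tuples
    # must follow the table's column order; overwrite=False appends.
    df.write.insertInto(table_name, overwrite=False)
    return df


# Usage in the PySpark shell (assumption, not executed here):
# append_rows(sqlContext, "mydb.mytable", [(3, "c"), (4, "d")], ["k", "v"])
```

Because the rows are written by a single batch job, this is closer to a bulk import than a series of single-row INSERT statements.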
Spark doesn't support every Hive SQL feature; check the Hive SQL compatibility notes for Spark SQL. - Rahul Sharma
Spark is not a light JDBC client for OLTP queries. It's a Big Data processing framework. Specifically, all SQL queries are rewritten as batch in-memory operations that may produce immutable files. In other words: Spark does not support Hive ACID transactions, because (a) atomic inserts are completely out of Spark's scope and (b) IMHO the Hive ACID stuff is a bizarre contraption that sane people should avoid messing with. - Samson Scharfrichter

1 Answer


I was also getting the error below with Spark 1.6:

java.sql.SQLException: org.apache.spark.sql.AnalysisException: org.apache.spark.sql.hive.HiveQl$.nodeToRelation(HiveQl.scala:1362)

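After searching, I found that Spark 1.6 needs a different syntax: its Hive parser has no rule for the VALUES clause (the question's stack trace says "No parse rules for: TOK_VIRTUAL_TABLE"), so the insert has to be written as INSERT INTO TABLE ... SELECT. After switching to that form, the error went away.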

Example:

insert into table mytable1 select t.* from (select 'Shub',30) t;

Source:

https://html.developreference.com/article/15215006/SPARK+1.6+Insert+into+existing+Hive+table+