
I am facing a problem: I can't access Hive tables from Spark, using spark-submit, while I can with the pyspark shell. Here is the piece of code:

from pyspark.sql import SparkSession, HiveContext

spark = SparkSession \
   .builder \
   .appName("Python Spark SQL Hive integration example") \
   .enableHiveSupport() \
   .getOrCreate()

spark.sql("SHOW TABLES").show()

Here is the result with pyspark (shell):

+--------+-------------+-----------+
|database|    tableName|isTemporary|
+--------+-------------+-----------+
| default|       table1|      false|
| default|       table2|      false|
+--------+-------------+-----------+

Here is the result with spark-submit:

+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+

I tried adding the Spark conf directory to the classpath, passing hive-site.xml with "--files", and using HiveContext instead, all with the same result. I also tried with Scala: same result.
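
For reference, the spark-submit attempt looked roughly like this (the path to hive-site.xml and the script name are placeholders):

spark-submit \
    --files /path/to/hive-site.xml \
    my_script.py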

EDIT: I am not connecting to a remote Hive server; Hive runs on the same machine.


1 Answer


Solution found: I was using some UDFs (user-defined functions) in my .py file. It seems that registering them was creating a context before the Hive-enabled SparkSession, so my queries were not going through the right one. It works fine now.
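
For anyone hitting the same thing, here is a minimal sketch of what I mean, assuming the UDF was previously registered through a separately created context. The UDF itself (to_upper) is just a placeholder; the point is to build the Hive-enabled SparkSession first and register the UDF against it.

from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

# Build the Hive-enabled session first, before anything else creates a context
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .enableHiveSupport() \
    .getOrCreate()

# Placeholder UDF, registered against the Hive-enabled session
spark.udf.register("to_upper", lambda s: s.upper() if s else None, StringType())

spark.sql("SHOW TABLES").show()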