I want to drop a Hive table through Spark SQL.
The cluster has Hadoop 2.6, Hive 2.0, Spark 1.6 and Spark 2.0 installed. I tried the following code in the pyspark shell and in a spark-submit job with both versions:
sqlContext.sql('drop table test')  # Spark 1.6
spark.sql('drop table test')  # Spark 2.0
Both statements work fine in the pyspark shell; I can see from the Hive CLI that the test table no longer exists.
However, when the same code was put in a Python file and submitted to the cluster with spark-submit, it never took effect.
Spark 2.0 even gave the following error:
pyspark.sql.utils.AnalysisException: u"Table to drop '`try`' does not exist;"
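For reference, the submitted file is essentially just the following (a minimal sketch of the Spark 2.0 version; apart from the drop statement everything is boilerplate):

# try_spark_sql.py - minimal sketch of the submitted script (Spark 2.0 version)
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PythonSQL").getOrCreate()

# the same statement that works in the pyspark shell
spark.sql('drop table test')

spark.stop()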
I have copied hive-site.xml into Spark's conf directory.
What would be the correct way to drop a Hive table through Spark SQL?
Update:
I compared the Spark environment between the spark-shell and the job I submitted with the following command:
spark-submit --master yarn --deploy-mode cluster try_spark_sql.py
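To inspect the environment inside the submitted job, I dump the effective configuration to the driver log with something like this (just to illustrate how the comparison was done):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PythonSQL").getOrCreate()

# print every entry of the effective Spark configuration
for key, value in sorted(spark.sparkContext.getConf().getAll()):
    print("{}={}".format(key, value))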
In the spark-shell environment, I can see that spark.sql.catalogImplementation is set to hive.
In the job submitted with the above command, the environment doesn't contain spark.sql.catalogImplementation. I tried setting it with the following code:
spark = SparkSession.builder.appName("PythonSQL").config("spark.sql.catalogImplementation", "hive").getOrCreate()
But it doesn't have any effect on the environment.
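As far as I understand, enableHiveSupport() on the builder is just a shorthand for setting that same option, so I would expect the chain below to behave the same way (I haven't verified whether it makes any difference in cluster mode):

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("PythonSQL") \
    .enableHiveSupport() \
    .getOrCreate()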
One workaround I found is submitting the job in client mode instead of cluster mode. Then the Hive table is dropped successfully.
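That is, the same try_spark_sql.py works when submitted with:

spark-submit --master yarn --deploy-mode client try_spark_sql.py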