I want to drop a Hive table through Spark SQL.
The cluster has Hadoop 2.6, Hive 2.0, Spark 1.6 and Spark 2.0 installed. I tried the following code in the pyspark shell and in a spark-submit job with both versions:
sqlContext.sql('drop table test')  # Spark 1.6
spark.sql('drop table test')  # Spark 2.0
Both statements work fine in the pyspark shell; I can see from the Hive CLI that the test table no longer exists.
However, when the same code was put in a Python file and submitted to the cluster with spark-submit, it never took effect.
Spark 2.0 even gave the following error:
pyspark.sql.utils.AnalysisException: u"Table to drop '`try`' does not exist;"
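For reference, the submitted file is essentially just the following (a minimal sketch of the Spark 2.0 version; apart from the drop statement everything is boilerplate):

# try_spark_sql.py - minimal sketch of the submitted script (Spark 2.0 version)
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PythonSQL").getOrCreate()

# the same statement that works in the pyspark shell
spark.sql('drop table test')

spark.stop()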
I have copied hive-site.xml into Spark's conf directory.
What would be the correct way to drop a Hive table through Spark SQL?
Update:
I compared the Spark environment between the spark-shell and the job I submitted with the following command:
spark-submit --master yarn --deploy-mode cluster try_spark_sql.py
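To inspect the environment inside the submitted job, I dump the effective configuration to the driver log with something like this (just to illustrate how the comparison was done):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PythonSQL").getOrCreate()

# print every entry of the effective Spark configuration
for key, value in sorted(spark.sparkContext.getConf().getAll()):
    print("{}={}".format(key, value))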
In the spark-shell environment, I can see that spark.sql.catalogImplementation is set to hive.
In the job submitted with the above command, the environment doesn't contain spark.sql.catalogImplementation. I tried setting it with the following code:
spark = SparkSession.builder.appName("PythonSQL").config("spark.sql.catalogImplementation", "hive").getOrCreate()
But it doesn't have any effect on the environment.
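As far as I understand, enableHiveSupport() on the builder is just a shorthand for setting that same option, so I would expect the chain below to behave the same way (I haven't verified whether it makes any difference in cluster mode):

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("PythonSQL") \
    .enableHiveSupport() \
    .getOrCreate()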
One workaround I found is submitting the job in client mode instead of cluster mode. Then the Hive table is dropped successfully.
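That is, the same try_spark_sql.py works when submitted with:

spark-submit --master yarn --deploy-mode client try_spark_sql.py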