4
votes

I have to use my local Spark to connect to a remote Hive that requires authentication.

I am able to connect via beeline.

beeline> !connect jdbc:hive2://bigdatamr:10000/default
Connecting to jdbc:hive2://bigdatamr:10000/default
Enter username for jdbc:hive2://bigdatamr:10000/default: myusername
Enter password for jdbc:hive2://bigdatamr:10000/default: ********
Connected to: Apache Hive (version 1.2.0-mapr-1703)
Driver: Hive JDBC (version 1.2.0-mapr-1703)
Transaction isolation: TRANSACTION_REPEATABLE_READ

How can I do the same from Spark? I tried both Thrift and JDBC, but neither works.

My Thrift attempt (I don't know how to pass the authentication):

from pyspark.sql import SparkSession
spark = SparkSession\
    .builder.master("yarn")\
    .appName("my app")\
    .config("hive.metastore.uris", "thrift://bigdatamr:10000")\
    .enableHiveSupport()\
    .getOrCreate()

My JDBC attempt, which throws "Method not supported":

jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:hive2://bigdatamr:10000") \
    .option("dbtable", "default.tmp") \
    .option("user", "myusername") \
    .option("password", "xxxxxxx") \
    .load()

Py4JJavaError: An error occurred while calling o183.load.
: java.sql.SQLException: Method not supported

Do you have access to the Spark configuration on the remote cluster? I mean the directory PATH/TO/SPARK/conf. - user1314742
@user1314742 Yes, I do. - Icarus
Can you locate a file called hive-site.xml? Copy that file to your local Spark conf directory and try running your application again. - user1314742

2 Answers

1
votes

You need to specify the driver you are using in the options of spark.read:

.option("driver", "org.apache.hive.jdbc.HiveDriver")

Also, you have to put the database in the JDBC URL and pass only the table name in the dbtable option. For some reason, setting dbtable to database.table does not work.

It would look like this:

jdbcDF = spark.read \
    .format("jdbc") \
    .option("driver", "org.apache.hive.jdbc.HiveDriver") \
    .option("url", "jdbc:hive2://bigdatamr:10000/default") \
    .option("dbtable", "tmp") \
    .option("user", "myusername") \
    .option("password", "xxxxxxx") \
    .load()
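
If you read several tables this way, the split between URL and dbtable can be wrapped in a small helper. This is only a sketch: hive_jdbc_options is a hypothetical name, not part of Spark or Hive, and it assumes that a table name without a database prefix lives in the "default" database.

```python
def hive_jdbc_options(host, port, qualified_table, user, password):
    """Build JDBC reader options for a HiveServer2 table (hypothetical helper).

    Splits "database.table" so the database goes into the URL and only the
    bare table name goes into the "dbtable" option.
    """
    database, sep, table = qualified_table.partition(".")
    if not sep:
        # No database prefix given; assume Hive's "default" database.
        database, table = "default", qualified_table
    return {
        "driver": "org.apache.hive.jdbc.HiveDriver",
        "url": f"jdbc:hive2://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }


# Host, table and credentials are the placeholders from the question.
options = hive_jdbc_options("bigdatamr", 10000, "default.tmp", "myusername", "xxxxxxx")
print(options["url"])
print(options["dbtable"])
```

With an existing SparkSession, the dictionary can be passed in one go: spark.read.format("jdbc").options(**options).load().
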
1
votes

This looks like a configuration problem.

If you have access to the hive-site.xml file on your server (/PATH/TO/HIVE/hive-site.xml), copy it into your local Spark configuration folder, /PATH/TO/SPARK/conf/, and then run your application again.
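
The copy step could be scripted roughly like this. It is a minimal sketch: install_hive_site is a hypothetical helper name, and it assumes the server's hive-site.xml has already been fetched to a path readable from the local machine (for example with scp). The /PATH/TO/... placeholders above stay placeholders.

```python
import shutil
from pathlib import Path


def install_hive_site(hive_site, spark_conf_dir):
    """Copy a hive-site.xml into Spark's conf directory (hypothetical helper).

    Returns the path of the copied file.
    """
    src = Path(hive_site)
    if not src.is_file():
        raise FileNotFoundError(f"hive-site.xml not found at {src}")
    dest_dir = Path(spark_conf_dir)
    # Create the conf directory if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Spark looks for the file under this exact name.
    return Path(shutil.copy2(src, dest_dir / "hive-site.xml"))
```

Call it with the real locations on your machine, e.g. install_hive_site("/PATH/TO/HIVE/hive-site.xml", "/PATH/TO/SPARK/conf"), and restart the Spark application so the new configuration is picked up.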