Every day I get more and more confused. I am learning to use Spark with Hive, and every tutorial I find on the internet explains the relationship only vaguely. First of all, what does it mean when people say Hive is compatible with Spark? I downloaded a prebuilt Spark, version 2.1.1, and I downloaded Hive 2.1.1. My goal is to access the Hive metastore from Spark, but every time I run a Spark query I get:
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
which, according to this website, means:
If you have a metastore version mismatch, either or both of the last two SQL statements will result in this error message: Error: java.lang.reflect.InvocationTargetException (state=,code=0)
Where I am confused: when people say Hive/Spark compatibility, do they mean the Spark version and the Hive version (which in my case are both 2.1.1, yet I am getting this error), or do they mean the metastore database schema version and the hive-metastore jar version inside the spark/jars folder?
Right now my Hive metastore schema version is 2.1.0, and Spark ships with hive-metastore-1.2.1.spark2.jar. So do I need to change the metastore schema version to 1.2.1? According to this website:
For handling Spark 2.1.0, which is currently shipped with Hive 1.2 jar, users need to use a Hive remote metastore service (hive.metastore.uris), where metastore service is started with hive.metastore.schema.verification as TRUE for any Spark SQL context. This will force the Spark Client to talk to a higher version of the Hive metastore (like Hive 2.1.0), using lower Hive jars (like Hive 1.2), without modifying or altering the existing Hive schema of the metastore database.
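If I understand that quote correctly, the remote-metastore setup it describes would look roughly like this in hive-site.xml (a sketch only; the thrift host and port here are placeholders I made up, not values from my setup):

```xml
<!-- hive-site.xml, also copied into Spark's conf/ directory so Spark SQL
     connects to the remote metastore service instead of a local one -->
<configuration>
  <!-- placeholder host/port: wherever `hive --service metastore` is running -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
  <!-- the setting the quoted advice says must be TRUE -->
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>true</value>
  </property>
</configuration>
```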
I do have hive.metastore.schema.verification set to true and still get the same error. Also, please take your time to check the Spark website, where they say:
spark.sql.hive.metastore.version 1.2.1 (Version of the Hive metastore. Available options are 0.12.0 through 1.2.1.)
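Putting those properties together, my understanding is that a spark-defaults.conf for Spark 2.1.1 (whose built-in Hive client jars are 1.2.1) would look something like the sketch below. This is my assumption of the intended setup, not something I have confirmed works:

```
# spark-defaults.conf (sketch): use the built-in 1.2.1 Hive client jars
# to talk to the remote metastore declared in hive-site.xml
spark.sql.hive.metastore.version   1.2.1
spark.sql.hive.metastore.jars      builtin
spark.sql.catalogImplementation    hive
```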
Wrapping up my question, my goal is to 1) understand the meaning behind "Hive compatible with Spark", and 2) connect to the Hive metastore using Spark.
Please elaborate in your answer, or be kind enough to provide a link where I can find my answers. I am genuinely confused.