I'm trying to build Spark 3.0.0 for my YARN cluster, with Hadoop 2.7.3 and Hive 1.2.1. I downloaded the source and created a runnable distribution with
./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive-1.2 -Phadoop-2.7 -Pyarn
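As a sanity check on what that profile set actually packaged, something like this sketch lists the Hive-related jars in the dist (the path is just my SPARK_HOME from the session below):

# Sketch: list Hive-related jars that the -Phive-1.2 profile packaged into
# the custom dist (path matches the SPARK_HOME used in the REPL session below).
import glob
import os

spark_home = '/home/pmccarthy/custom-spark-3'
for jar in sorted(glob.glob(os.path.join(spark_home, 'jars', '*hive*'))):
    print(os.path.basename(jar))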
We're running Spark 2.4.0 in production, so I copied hive-site.xml, spark-env.sh, and spark-defaults.conf from there.
When I try to create a SparkSession in a plain Python REPL, I get the uninformative error below. How can I debug this? I can run spark-shell and get to a Scala prompt with Hive access, seemingly without error.
Python 3.6.3 (default, Apr 10 2018, 16:07:04)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import sys
>>> os.environ['SPARK_HOME'] = '/home/pmccarthy/custom-spark-3'
>>> sys.path.insert(0,os.path.join(os.environ['SPARK_HOME'],'python','lib','py4j-src.zip'))
>>> sys.path.append(os.path.join(os.environ['SPARK_HOME'],'python'))
>>> import pyspark
>>> from pyspark.sql import SparkSession
>>> spark = (SparkSession.builder.enableHiveSupport().config('spark.master','local').getOrCreate())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pmccarthy/custom-spark-3/python/pyspark/sql/session.py", line 191, in getOrCreate
    session._jsparkSession.sessionState().conf().setConfString(key, value)
  File "/home/pmccarthy/custom-spark-3/python/lib/py4j-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/home/pmccarthy/custom-spark-3/python/pyspark/sql/utils.py", line 137, in deco
    raise_from(converted)
  File "<string>", line 3, in raise_from
pyspark.sql.utils.IllegalArgumentException: <exception str() failed>
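Since str() on the exception fails, one way to dig out the underlying message is to catch it and read its raw fields directly. This is a sketch, assuming the converted exception still carries the desc and stackTrace attributes that pyspark.sql.utils.CapturedException defines in 3.0:

# Sketch: catch the converted exception and inspect its raw fields instead of
# relying on str(), which fails here. Assumes the CapturedException attributes
# (desc, stackTrace) present in pyspark/sql/utils.py for Spark 3.0.
from pyspark.sql import SparkSession
from pyspark.sql.utils import CapturedException

try:
    spark = (SparkSession.builder
             .enableHiveSupport()
             .config('spark.master', 'local')
             .getOrCreate())
except CapturedException as e:
    print(getattr(e, 'desc', None))        # the JVM-side error message
    print(getattr(e, 'stackTrace', None))  # the JVM stack trace, if captured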
Comments:

session._jsparkSession.sessionState().conf().setConfString(key, value) github.com/apache/spark/blob/master/python/pyspark/sql/… - Andrew Seymour

enableHiveSupport() is causing the crashes - if I build a SparkSession without it then it starts up fine. It's still a mystery though, as it crashes the same way whether or not my hive-site.xml has my config or is blank. - Patrick McCarthy

raise e - Andrew Seymour

Using -Phive-1.2 but not including -Phive-thriftserver prevented some necessary jars from being built. After doing that I still didn't have total support, but I was able to connect to my metastore with both .config('spark.sql.hive.metastore.version','1.2.1') and .config('spark.sql.hive.metastore.jars','maven'). It still seems like the build flag implies it should produce a working install for Hive 1.2, though. - Patrick McCarthy
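Putting the resolution from the last comment together: rebuilding with -Phive-thriftserver supplied the missing jars, and pointing the session at Hive 1.2.1 metastore jars made enableHiveSupport() work. A sketch of the resulting builder (the two metastore .config values are taken verbatim from the comment; the rest mirrors the REPL session above):

# Sketch of the working session: pin the Hive metastore version to 1.2.1 and
# let Spark download matching metastore jars from Maven (both config values
# come from the comment thread above).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config('spark.master', 'local')
         .config('spark.sql.hive.metastore.version', '1.2.1')
         .config('spark.sql.hive.metastore.jars', 'maven')
         .enableHiveSupport()
         .getOrCreate())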