I created a Hive container with Docker and created a pokes table, and I get the error below when I run a SELECT query and call show().

The code:

from pyspark.sql import SparkSession

# Connect to the Hive metastore that the Docker container exposes
# on localhost:9083 and create a Hive-enabled session.
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("hive.metastore.uris", "thrift://localhost:9083") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("SELECT * FROM pokes").show()

The error is:

18/04/25 11:58:34 INFO SparkContext: Created broadcast 0 from
Traceback (most recent call last):
  File "/Users/xxxxx/scripts/hive/hive.py", line 12, in <module>
    spark.sql("SELECT * FROM pokes").show()
  File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 336, in show
  File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/pyspark.zip/pyspark/sql/utils.py", line 79, in deco
pyspark.sql.utils.IllegalArgumentException: u'java.net.UnknownHostException: namenode'

Thanks!

Comments:

pissall: No SQLContext()?
vicrab: I think SparkSession manages the SQLContext.
vvg: I think some underlying Docker/Hadoop configuration is broken. It's trying to connect to a host with the domain name namenode.
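
As vvg's comment suggests, it is worth confirming what address the metastore is handing back. A minimal diagnostic sketch, reusing the session from the question: if the Location field shows something like hdfs://namenode/..., the client is simply failing to resolve that hostname.

# Sketch: ask the metastore where the table data lives. A Location such as
# hdfs://namenode/... would explain the UnknownHostException.
spark.sql("DESCRIBE FORMATTED pokes").show(truncate=False)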

2 Answers

1 vote

I don't think the problem is .show() itself, but the execution of the query.

Look in the HDFS config file (hdfs-site.xml) and change the value for the namenode address:

<configuration>
 <property>
     <name>dfs.namenode.http-address</name>
     <value>webhdfs://localhost:50070</value>
 </property>
</configuration>

I used localhost as the address, assuming you are using a standalone configuration; otherwise you need to find out the hostname of the namenode. Also, it might need just hdfs instead of webhdfs as the prefix of the address.
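
If you cannot easily edit hdfs-site.xml inside the container, another option is to override the filesystem address from the Spark side. This is only a sketch under assumptions: hdfs://localhost:8020 stands in for your namenode's actual RPC host and port, and it will not help if the metastore stores fully qualified hdfs://namenode/... paths (in that case, mapping namenode to the container's IP in /etc/hosts is an alternative).

from pyspark.sql import SparkSession

# Sketch: point fs.defaultFS at a resolvable address instead of the
# unresolvable "namenode". localhost:8020 is an assumed RPC endpoint.
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("hive.metastore.uris", "thrift://localhost:9083") \
    .config("spark.hadoop.fs.defaultFS", "hdfs://localhost:8020") \
    .enableHiveSupport() \
    .getOrCreate()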

After that you can try to restart the namenode:

$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/hadoop-daemon.sh stop namenode
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
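
Before re-running the query, a quick reachability check can tell you whether the namenode address resolves at all. A sketch; localhost:50070 matches the HTTP address assumed above.

import socket

# Sketch: raises socket.gaierror if the hostname does not resolve --
# the same root cause as java.net.UnknownHostException on the JVM side.
socket.create_connection(("localhost", 50070), timeout=5).close()
print("namenode address is reachable")
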
0 votes

Maybe the problem is not in your code.

Check the version of the Java JDK you are using. As far as I know, spark.sql().show() is not compatible with JDK 11. If you are using that version, downgrade to JDK 8, and don't forget to configure the environment variables for JDK 8 correctly.
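
To see which JVM your Spark session actually runs on, you can query it through py4j. Note that _jvm is a private handle, so treat this as a one-off diagnostic sketch, not a stable API.

# Sketch: print the Java version of the JVM backing this SparkSession.
print(spark.sparkContext._jvm.java.lang.System.getProperty("java.version"))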