I created a Hive container with Docker and created a pokes table, and I get the error below when I run a SELECT query and call show().

The code:

from pyspark.sql import SparkSession

# Connect to the Hive metastore that the Docker container exposes
# on localhost:9083 and create a Hive-enabled session.
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("hive.metastore.uris", "thrift://localhost:9083") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("SELECT * FROM pokes").show()

The error is:

18/04/25 11:58:34 INFO SparkContext: Created broadcast 0 from
Traceback (most recent call last):
  File "/Users/xxxxx/scripts/hive/hive.py", line 12, in <module>
    spark.sql("SELECT * FROM pokes").show()
  File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 336, in show
  File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/pyspark.zip/pyspark/sql/utils.py", line 79, in deco
pyspark.sql.utils.IllegalArgumentException: u'java.net.UnknownHostException: namenode'

Thanks!

Comments:

pissall: No SQLContext()?
vicrab: I think SparkSession manages the SQLContext.
vvg: I think some underlying Docker/Hadoop configuration is broken. It's trying to connect to a host with the domain name namenode.
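
As vvg's comment suggests, it is worth confirming what address the metastore is handing back. A minimal diagnostic sketch, reusing the session from the question: if the Location field shows something like hdfs://namenode/..., the client is simply failing to resolve that hostname.

# Sketch: ask the metastore where the table data lives. A Location such as
# hdfs://namenode/... would explain the UnknownHostException.
spark.sql("DESCRIBE FORMATTED pokes").show(truncate=False)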

2 Answers

1 vote

I don't think the problem is .show() itself, but the execution of the query.

Look in the HDFS config file (hdfs-site.xml) and change the value for the namenode address:

<configuration>
 <property>
     <name>dfs.namenode.http-address</name>
     <value>webhdfs://localhost:50070</value>
 </property>
</configuration>

I used localhost as the address, assuming you are using a standalone configuration; otherwise you need to find out the hostname of the namenode. Also, it might need just hdfs instead of webhdfs as the prefix of the address.
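
If you cannot easily edit hdfs-site.xml inside the container, another option is to override the filesystem address from the Spark side. This is only a sketch under assumptions: hdfs://localhost:8020 stands in for your namenode's actual RPC host and port, and it will not help if the metastore stores fully qualified hdfs://namenode/... paths (in that case, mapping namenode to the container's IP in /etc/hosts is an alternative).

from pyspark.sql import SparkSession

# Sketch: point fs.defaultFS at a resolvable address instead of the
# unresolvable "namenode". localhost:8020 is an assumed RPC endpoint.
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("hive.metastore.uris", "thrift://localhost:9083") \
    .config("spark.hadoop.fs.defaultFS", "hdfs://localhost:8020") \
    .enableHiveSupport() \
    .getOrCreate()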

After that you can try to restart the namenode:

$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/hadoop-daemon.sh stop namenode
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
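
Before re-running the query, a quick reachability check can tell you whether the namenode address resolves at all. A sketch; localhost:50070 matches the HTTP address assumed above.

import socket

# Sketch: raises socket.gaierror if the hostname does not resolve --
# the same root cause as java.net.UnknownHostException on the JVM side.
socket.create_connection(("localhost", 50070), timeout=5).close()
print("namenode address is reachable")
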
0 votes

Maybe the problem is not in your code.

Check the version of the Java JDK you are using. As far as I know, spark.sql().show() is not compatible with JDK 11. If you are using that version, downgrade to JDK 8, and don't forget to configure the environment variables for JDK 8 correctly.
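
To see which JVM your Spark session actually runs on, you can query it through py4j. Note that _jvm is a private handle, so treat this as a one-off diagnostic sketch, not a stable API.

# Sketch: print the Java version of the JVM backing this SparkSession.
print(spark.sparkContext._jvm.java.lang.System.getProperty("java.version"))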