I am trying to read data from my Postgres database into my Spark setup running on Kubernetes.
After starting a shell with kubectl exec zeppelin-controller-3i97t -it -- pyspark --packages org.postgresql:postgresql:9.4.1209, I attempt to connect to the database:
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # sc is the SparkContext the pyspark shell provides

url = 'jdbc:postgresql://PG_SERVER_IP/db_name'
properties = {
    "user": "PG_USER",
    "password": "PASSWORD",
    "driver": "org.postgresql.Driver"
}
df = sqlContext.read.jdbc(url=url, table='objects', properties=properties)
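For reference, I believe the read above is equivalent to the longer options-based form of the same reader API (just a sketch with the same placeholders; I haven't verified it behaves any differently):

# Presumably equivalent read through the generic DataFrameReader
# options API (same placeholder credentials as above):
df = sqlContext.read \
    .format("jdbc") \
    .option("url", url) \
    .option("dbtable", "objects") \
    .option("user", "PG_USER") \
    .option("password", "PASSWORD") \
    .option("driver", "org.postgresql.Driver") \
    .load()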
But df never gets defined for me to use, and instead I get the Hive-related warnings below. How can I get around this? Why is Hive even involved here? Is there any way to bypass it so that I can just load the data into Spark as an RDD or DataFrame?
WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
    at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
    at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:166)
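For context, all I ultimately need is to end up with a usable DataFrame or RDD, along these lines:

# What I'm hoping to do once the read succeeds:
df.printSchema()  # inspect the imported table's schema
df.show(5)        # peek at a few rows
rdd = df.rdd      # or fall back to the underlying RDD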