I am trying to read data from my Postgres database into my Spark setup running on Kubernetes.
After starting a shell with kubectl exec zeppelin-controller-3i97t -it -- pyspark --packages org.postgresql:postgresql:9.4.1209, I attempt to connect to the database:
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # sc is the SparkContext the pyspark shell provides

url = 'jdbc:postgresql://PG_SERVER_IP/db_name'
properties = {
    "user": "PG_USER",
    "password": "PASSWORD",
    "driver": "org.postgresql.Driver"
}
df = sqlContext.read.jdbc(url=url, table='objects', properties=properties)
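For reference, I believe the read above is equivalent to the longer options-based form of the same reader API (just a sketch with the same placeholders; I haven't verified it behaves any differently):

# Presumably equivalent read through the generic DataFrameReader
# options API (same placeholder credentials as above):
df = sqlContext.read \
    .format("jdbc") \
    .option("url", url) \
    .option("dbtable", "objects") \
    .option("user", "PG_USER") \
    .option("password", "PASSWORD") \
    .option("driver", "org.postgresql.Driver") \
    .load()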
But df never gets defined for me to use, and instead I get the Hive-related warnings below. How can I get around this? Why is Hive even involved here? Is there any way to bypass it so that I can just load the data into Spark as an RDD or DataFrame?
WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
    at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
    at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:166)
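For context, all I ultimately need is to end up with a usable DataFrame or RDD, along these lines:

# What I'm hoping to do once the read succeeds:
df.printSchema()  # inspect the imported table's schema
df.show(5)        # peek at a few rows
rdd = df.rdd      # or fall back to the underlying RDD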