Could someone please help me resolve this issue?

I am trying to read from Greenplum using the Greenplum-Spark Connector. I used the jar greenplum-spark_2.11-1.5.0.jar, which I downloaded from https://network.pivotal.io/products/pivotal-gpdb/

I am trying to access Greenplum from spark-shell, and I load the jar as shown below:

C:\spark-shell --jars C:\jars\greenplum-spark_2.11-1.6.2.jar

scala> val gscReadOptionMap = Map(
      "url" -> "jdbc:postgresql://server-ip:5432/db_name",
      "user" -> "user_id",
      "password" -> "pwd",
      "dbschema" -> "schema_name",
      "dbtable" -> "table_name",
      "driver" -> "org.postgresql.Driver"
)

scala> val gpdf = spark.read.format("greenplum").options(gscReadOptionMap).load()

(or)

scala> val gpdf = spark.read.format("io.pivotal.greenplum.spark.GreenplumRelationProvider").options(gscReadOptionMap).load()

Both result in the error below:

java.lang.IllegalArgumentException: '' does not exist in "schema_name"."table_name" table
  at io.pivotal.greenplum.spark.GreenplumRelationProvider.createRelation(GreenplumRelationProvider.scala:50)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
  ... 49 elided

1 Answer

You are missing the partitionColumn option in your gscReadOptionMap. For example:

val gscOptionMap = Map(
    "url" -> "jdbc:postgresql://gsc-dev/tutorial",
    "user" -> "gpadmin",
    "password" -> "changeme",
    "dbschema" -> "faa",
    "dbtable" -> "otp_c",
    "partitionColumn" -> "airlineid"
)

For more details, please take a look at the documentation.
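
Applied to the map from the question, the fix might look like the sketch below. The column name id_column is hypothetical; replace it with a real column of table_name. As far as I recall, the 1.x connector expects an integer-like column here, but check the documentation for your version. The empty '' in the error message is consistent with the connector looking up a partition column whose name was never supplied. The snippet assumes it is pasted into the same spark-shell session started with --jars, so spark is already defined.

```scala
// Same options as in the question, plus the missing partitionColumn.
val gscReadOptionMap = Map(
  "url" -> "jdbc:postgresql://server-ip:5432/db_name",
  "user" -> "user_id",
  "password" -> "pwd",
  "dbschema" -> "schema_name",
  "dbtable" -> "table_name",
  "driver" -> "org.postgresql.Driver",
  // Hypothetical column name: must be an existing column of table_name.
  "partitionColumn" -> "id_column"
)

// `spark` is the SparkSession that spark-shell creates at startup.
val gpdf = spark.read.format("greenplum").options(gscReadOptionMap).load()

// Print the inferred schema to confirm that the table was resolved.
gpdf.printSchema()
```

If the same error still appears with a non-empty name between the quotes, the column name most likely does not match a column of the table.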