I would like to write the contents of parquet files to PostgreSQL. I am using Spark, and to write the data I am using the Spark DataFrame's write.jdbc function. Everything works fine for parquet column types like long, decimal, or text. The problem is with complex types like Map. I would like to store the Map as JSON in my PostgreSQL table, and since I know that PostgreSQL can cast the text datatype into json (using the cast operation), I am dumping the map to a JSON string before writing.
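For reference, this is roughly how I serialize the map column before writing; the column name "value" is a placeholder for illustration:

import json
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Serialize the MapType column to a JSON string (column name "value" is a placeholder)
map_to_json = udf(lambda m: json.dumps(m) if m is not None else None, StringType())
data_frame = data_frame.withColumn("value", map_to_json(data_frame["value"]))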
But the Spark program complains that we are trying to insert a "character varying" value into a column of type "json", which makes it clear that PostgreSQL is not converting "character varying" into JSON automatically.
I went ahead, logged into my database, and manually tried to insert a JSON string into a JSON-typed column of a table, and it worked.
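The manual test was roughly equivalent to the following sketch (the table name "my_table" and the connection details are placeholders; I ran the raw SQL in psql, not through psycopg2):

import psycopg2

# Connect with placeholder credentials
conn = psycopg2.connect(host="host_name", dbname="db_name", user="user", password="password")
cur = conn.cursor()
# The target column "value" is of type json; a plain string literal is accepted here
cur.execute("""INSERT INTO my_table (value) VALUES ('{"key": 1}')""")
conn.commit()
cur.close()
conn.close()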
My question is: why is my Spark program complaining about the cast?
I am using Spark 1.6.1, PostgreSQL 4.3, and the PostgreSQL JDBC driver 42.1.1.
Here is the code snippet:
# Write the DataFrame to PostgreSQL over JDBC
url = "jdbc:postgresql://host_name:host_port/db_name"
data_frame.write.jdbc(url, table_name, properties={"user": user, "password": password})
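For context, after the JSON dump the map column is a plain string in the DataFrame schema, which presumably is why the driver sends it as "character varying". A quick check (the printed schema is illustrative, assuming an id column plus the serialized map):

data_frame.printSchema()
# root
#  |-- id: long (nullable = true)
#  |-- value: string (nullable = true)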
Error stack trace:
Hint: You will need to rewrite or cast the expression.
Position: 66
Call getNextException to see other errors in the batch.
at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:148)
at org.postgresql.core.ResultHandlerDelegate.handleError(ResultHandlerDelegate.java:50)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2190)
at org.postgresql.core.v3.QueryExecutorImpl.flushIfDeadlockRisk(QueryExecutorImpl.java:1325)
at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1350)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:458)
at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:791)
at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1547)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:215)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:277)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:276)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: org.postgresql.util.PSQLException: ERROR: column "value" is of type json but expression is of type character varying
Hint: You will need to rewrite or cast the expression.
Position: 66
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2476)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2189)
... 18 more
Comment from Roberto Congiu: The output of data_frame.show() would also greatly help to give you an answer.