
I'm trying to truncate a table in SAP HANA and insert data into it using Spark Scala. However, when I run the script, the SAP HANA table gets dropped (instead of truncated) and the write fails with the JDBC error below.

As documented, I'm using .mode(SaveMode.Overwrite).option("truncate", "true") to keep the table from being dropped, but it is not behaving as described in the documentation.

Can anyone help me with the code?

Code:

import spark.implicits._
import org.apache.spark.sql.SaveMode

// JDBC connection details for SAP HANA
val url = "jdbc:sap://xx.xxx.xx:xx"
val prop = new java.util.Properties
prop.setProperty("driver", "com.sap.db.jdbc.Driver")
prop.setProperty("user", "MYSCHEMA")
prop.setProperty("password", "XXXXX")

// Read the source table and take a single row
val query1 = spark.read.jdbc(url, "MYSCHEMA.PH_SPARK_TRUNCATE", prop)
query1.createOrReplaceTempView("query1_view")
val query1out = spark.sql("select * from query1_view limit 1")

// Write back with Overwrite + truncate, expecting a TRUNCATE instead of a DROP
query1out.write.format("jdbc")
  .mode(SaveMode.Overwrite)
  .option("truncate", "true")
  .option("driver", "com.sap.db.jdbc.Driver")
  .option("url", "jdbc:sap://xx.xxx.xx:xx")
  .option("dbtable", "MYSCHEMA.PH_SPARK_TRUNCATE")
  .option("user", "MYSCHEMA")
  .option("password", "XXXXX")
  .save()

Error:

com.sap.db.jdbc.exceptions.JDBCDriverException: SAP DBTech JDBC: [264] (at 39): invalid datatype: TEXT type is not supported for row table: line 1 col 40 (at pos 39)
  at com.sap.db.jdbc.exceptions.SQLExceptionSapDB._newInstance(SQLExceptionSapDB.java:191)
  at com.sap.db.jdbc.exceptions.SQLExceptionSapDB.newInstance(SQLExceptionSapDB.java:42)
  at com.sap.db.jdbc.packet.HReplyPacket._buildExceptionChain(HReplyPacket.java:973)
  at com.sap.db.jdbc.packet.HReplyPacket.getSQLExceptionChain(HReplyPacket.java:158)
  at com.sap.db.jdbc.packet.HPartInfo.getSQLExceptionChain(HPartInfo.java:40)
  at com.sap.db.jdbc.ConnectionSapDB._receive(ConnectionSapDB.java:3115)
  at com.sap.db.jdbc.ConnectionSapDB.exchange(ConnectionSapDB.java:1518)
  at com.sap.db.jdbc.StatementSapDB._executeDirect(StatementSapDB.java:1419)
  at com.sap.db.jdbc.StatementSapDB._execute(StatementSapDB.java:1398)
  at com.sap.db.jdbc.StatementSapDB._execute(StatementSapDB.java:1383)
  at com.sap.db.jdbc.StatementSapDB._executeUpdate(StatementSapDB.java:1371)
  at com.sap.db.jdbc.StatementSapDB.executeUpdate(StatementSapDB.java:174)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:805)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:77)
  at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:471)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
  ... 72 elided

1 Answer


The error message indicates that a row store table was about to be created with a column of data type TEXT. That is not supported in HANA: TEXT is only available in the column store.
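You can reproduce this outside of Spark with plain JDBC. Here is a minimal sketch (the table name PH_TEXT_TEST is made up for illustration; the connection details are taken from your question):

import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:sap://xx.xxx.xx:xx", "MYSCHEMA", "XXXXX")
val stmt = conn.createStatement()

// Fails with SQL error 264 when the default table type is ROW,
// because TEXT is only supported in the column store:
// stmt.executeUpdate("CREATE TABLE MYSCHEMA.PH_TEXT_TEST (C1 TEXT)")

// Succeeds, because the table type is stated explicitly:
stmt.executeUpdate("CREATE COLUMN TABLE MYSCHEMA.PH_TEXT_TEST (C1 TEXT)")

stmt.close()
conn.close()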

From the stack trace (JdbcUtils.createTable), it looks like Spark is trying to create a table to store the result of the query. I guess that the generated CREATE TABLE statement does not specify the table type, which means HANA falls back to its configured default table type. Up to and including HANA 2 SPS 04, that default is ROW.

Since a row store table is rarely the better choice, the recommended fix here is to change the default table type parameter to COLUMN.
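As a sketch, assuming you have the required privilege (INIFILE ADMIN), the parameter (default_table_type in the sql section of indexserver.ini) can be changed from any SQL console or, as here, over JDBC:

val conn = java.sql.DriverManager.getConnection("jdbc:sap://xx.xxx.xx:xx", "MYSCHEMA", "XXXXX")
conn.createStatement().executeUpdate(
  "ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM') " +
  "SET ('sql', 'default_table_type') = 'COLUMN' WITH RECONFIGURE")
conn.close()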

I don't know if or how that affects Spark's behaviour with respect to truncating and dropping tables, but it should get rid of the error message you posted.
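If changing a system-wide parameter is not an option, one alternative sketch is to take the table life cycle out of Spark's hands entirely: create MYSCHEMA.PH_SPARK_TRUNCATE once as a column table, truncate it yourself over JDBC, and then write with SaveMode.Append so Spark only issues INSERTs and never drops or recreates the table:

import java.sql.DriverManager
import org.apache.spark.sql.SaveMode

// Truncate the pre-created column table manually instead of letting Spark drop it
val conn = DriverManager.getConnection("jdbc:sap://xx.xxx.xx:xx", "MYSCHEMA", "XXXXX")
try {
  conn.createStatement().executeUpdate("TRUNCATE TABLE MYSCHEMA.PH_SPARK_TRUNCATE")
} finally {
  conn.close()
}

// Append into the now-empty table; no CREATE TABLE statement is generated
query1out.write.format("jdbc")
  .mode(SaveMode.Append)
  .option("driver", "com.sap.db.jdbc.Driver")
  .option("url", "jdbc:sap://xx.xxx.xx:xx")
  .option("dbtable", "MYSCHEMA.PH_SPARK_TRUNCATE")
  .option("user", "MYSCHEMA")
  .option("password", "XXXXX")
  .save()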