
It seems that the Hortonworks Hive Warehouse Connector, up to v1.0.0, doesn't support schema updates. I try to use

hive
  .createTable(tableName)
  .ifNotExists()
  .column(name, type)
  .create()

when the table already exists but with a different schema, and nothing happens. Then I try to write a DataFrame with a different schema:

dataFrame
  .write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .option("table", tableName)
  .save()

and again nothing happens. I expect an AnalysisException to be thrown, as Spark does with its own tables.
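
For comparison, here is a minimal sketch of the behaviour I expect from Spark's native writer (the table name native_table is just an example, and dataFrame is the same DataFrame as above): appending a DataFrame with a mismatched schema to an existing Hive table fails fast.

import org.apache.spark.sql.AnalysisException

// native_table is a hypothetical Hive table created earlier with a different column set
try {
  dataFrame
    .write
    .mode("append")
    .saveAsTable("native_table")
} catch {
  case e: AnalysisException =>
    // Spark's own writer compares the DataFrame schema with the table schema
    // and rejects the write when they differ
    println(s"rejected: ${e.getMessage}")
}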


1 Answer


I found out that, to create a table, HWC generates a query ...

CREATE TABLE IF NOT EXISTS schema.table  (SERIALKEY DECIMAL(38,10),SOME STRING,SOME_OTHER STRING,...) STORED AS ORC

It is executed only if the table does not exist, and the schemas are not compared. Then a statement is issued to load the data ...

LOAD DATA INPATH '/tmp/20190222040853-6ab51b3c-a459-41df-9739-38bf5efb8da1' INTO TABLE schema.table

which, according to the documentation ...

NO verification of data against the schema is performed by the load command.

As a result, if the schema changes, HWC still writes the DataFrame into the Hive warehouse, while the existing table keeps its old schema and receives the mismatched data without any exception being raised.
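
As a workaround, the schemas can be compared manually before writing. This is only a minimal sketch under a few assumptions: hive is an already-built HiveWarehouseSession, hive.table(<name>) returns the existing table as a DataFrame, and a "match" means identical column names and types in the same order (nullability and metadata are ignored).

import org.apache.spark.sql.types.StructType

val tableName = "schema.table"  // fully qualified name of the target table

// Read the current table definition through HWC
val existingSchema: StructType = hive.table(tableName).schema

// Reduce both schemas to (name, type) pairs; Hive column names are case-insensitive
def shape(s: StructType) = s.fields.map(f => (f.name.toLowerCase, f.dataType)).toSeq

// Fail early instead of letting LOAD DATA silently put mismatched files into the table
require(shape(existingSchema) == shape(dataFrame.schema),
  s"DataFrame schema does not match existing table $tableName")

dataFrame
  .write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .option("table", tableName)
  .save()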