We are trying to create a bucketed, transaction-enabled Hive table in ORC format using the statement below:

create table orctablecheck (id int, name string)
clustered by (id) into 3 buckets
stored as orc
TBLPROPERTIES ('transactional'='true');
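
For reference, transactional tables also require the Hive transaction manager to be enabled; the standard session settings (shown with illustrative values) are:

SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;
-- on Hive 1.x, bucketing enforcement must also be enabled explicitly:
SET hive.enforce.bucketing=true;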

The table is created successfully in Hive, and it is visible in Beeline, in the Metastore, and in Spark SQL (which we have configured to run on top of Hive JDBC).
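
For reference, a Beeline connection to such a Thrift server looks like this (host and port are placeholders):

!connect jdbc:hive2://localhost:10000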

We are now inserting data into this table via Hive. However, after insertion the data does not show up in Spark SQL; it only appears correctly in Hive.
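
For illustration, the insert is of the following shape (run in the Hive shell; the values are hypothetical):

INSERT INTO TABLE orctablecheck VALUES (1, 'a'), (2, 'b');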

The data only becomes visible in Spark SQL after we restart the Thrift server.
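
For comparison, Spark SQL offers a metadata refresh that does not require a restart (a sketch using the table above, run on the Spark side; it may not help if the issue is unread ACID delta files):

REFRESH TABLE orctablecheck;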

2 Answers

Is the transactional attribute set on your table? I have observed that Hive's transactional storage structure does not work with Spark yet. You can confirm this by looking at the transactional attribute in the output of the command below in the Hive console:

desc extended <tablename>;
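
Alternatively, the property can be checked directly (using the table name from the question):

SHOW TBLPROPERTIES orctablecheck("transactional");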

If you need to access the transactional table from Spark, consider running a major compaction and then try accessing the table again:

ALTER TABLE <tablename> COMPACT 'major';
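
Compaction runs asynchronously; you can watch its progress with:

SHOW COMPACTIONS;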

I created a transactional table in Hive and stored data in it using Spark (records 1, 2, 3) and Hive (record 4), roughly as sketched below.
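
(Table name and values here are hypothetical; the Spark inserts were issued through Spark SQL.)

-- from the Spark side:
INSERT INTO TABLE t VALUES (1, 'a'), (2, 'b'), (3, 'c');
-- from the Hive shell:
INSERT INTO TABLE t VALUES (4, 'd');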

After a major compaction:

  • I can see all 4 records in Hive (using Beeline)
  • I can see only records 1, 2, and 3 in Spark (using spark-shell)
  • I am unable to update records 1, 2, and 3 in Hive
  • updating record 4 in Hive works fine (see the sketch of the updates below)
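
For reference, the updates were of this shape (a sketch; table and column names are assumed from the schema in the question):

UPDATE t SET name = 'x' WHERE id = 4;  -- succeeds
UPDATE t SET name = 'x' WHERE id = 1;  -- fails for the Spark-written records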