4 votes

How to read an ORC transactional Hive table in Spark?

I am facing an issue while reading an ORC transactional table through Spark: I get the schema of the Hive table, but I am not able to read the actual data.

See the complete scenario:

hive> create table default.Hello(id int, name string) clustered by
      (id) into 2 buckets STORED AS ORC
      TBLPROPERTIES ('transactional'='true');

hive> insert into default.hello values(10,'abc');

Now I am trying to access the Hive ORC data from Spark SQL, but it shows only the schema:

spark.sql("select * from hello").show()

Output: only the column headers (id, name) are printed; no rows come back.


3 Answers

2 votes

Yes, as a workaround we can use compaction, but when the job is a micro-batch, compaction won't help, so I decided to use a JDBC call. Please refer to my answer for this issue on my Git page: https://github.com/Gowthamsb12/Spark/blob/master/Spark_ACID
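In outline, the workaround reads the table through HiveServer2 over JDBC, so that Hive itself resolves the ACID delta files before handing rows to Spark. A minimal sketch in the spark-shell (the HiveServer2 host, user, and password are placeholders for your environment, and the hive-jdbc driver must be on the classpath; see the linked Git page for the full approach):

// Read the ACID table through HiveServer2 instead of reading the
// ORC files directly, so Hive merges the delta files for us.
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://<hiveserver2-host>:10000/default")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("dbtable", "default.hello")
  .option("user", "<user>")
  .option("password", "<password>")
  .load()

jdbcDF.show()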

1 vote

Spark (as of version 2.3) is not fully compliant with Hive transactional tables. The workaround is to run a compaction on the table after any transaction:

ALTER TABLE Hello COMPACT 'major';

This compaction should make the data visible. (Compaction runs asynchronously, so the data appears only after it has finished.)
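As a quick check, assuming the major compaction above has completed (SHOW COMPACTIONS in the Hive shell reports when the request reaches the 'succeeded' state), the original query should now return the inserted row:

// After the major compaction completes, Spark can read the base files:
spark.sql("select * from default.hello").show()
// should now print the row (10, abc) instead of an empty table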

-1 votes

You would need to add an action at the end to force it to run the query:

spark.sql("Select * From Hello").show()

(The default here is to show 20 rows)

or

spark.sql("Select * From Hello").take(2)

to see 2 rows of output data.

These are just examples of actions that can be taken on a DataFrame.
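For reference, a few other common actions, any of which forces Spark to actually execute the query (a short sketch in the spark-shell):

val df = spark.sql("select * from hello")
df.count()     // number of rows
df.first()     // the first Row
df.collect()   // all rows as an Array[Row]; avoid on large tables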