I am using PutHive3Streaming to load avro data from Nifi to Hive. For a sample, I am sending 10 MB data Json data to Nifi, converting it to Avro (reducing the size to 118 KB) and using PutHive3Streaming to write to a managed hive table. However, I see that the data is not compressed at hive.
hdfs dfs -du -h -s /user/hive/warehouse/my_table*
32.1 M /user/hive/warehouse/my_table (<-- replication factor 3)
At the table level, I have:
STORED AS ORC
TBLPROPERTIES (
'orc.compress'='ZLIB',
'orc.compression.strategy'='SPEED',
'orc.create.index'='true',
'orc.encoding.strategy'='SPEED',
'transactional'='true');
and I have also enabled:
hive.exec.dynamic.partition=true
hive.optimize.sort.dynamic.partition=true
hive.exec.dynamic.partition.mode=nonstrict
hive.optimize.sort.dynamic.partition=true
avro.output.codec=zlib
hive.exec.compress.intermediate=true;
hive.exec.compress.output=true;
It looks like despite this, compression is not enabled in Hive. Any pointers to enable this?