I created a Hive table after setting the following properties at the Hive command prompt:

SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress=true;

Create table and insert statements:

create external table dept_comp1(id bigint, code string, name string) LOCATION '/users/JOBDATA/comp';
insert overwrite table dept_comp select * from src__1;

When I go to the location /users/JOBDATA/comp, I find a file named 000000_0.deflate.

I am not sure whether this is the compressed file, although when I download it, it is unreadable. If it is, why does it not have an .lzo extension?

If it is not, where can I find the .lzo file?

Lastly, how can I decompress it using Java? Thanks.

1 Answer

You can use SnappyCodec compression if you want to save disk space on HDFS. Some compressed formats, such as .bz2, are splittable. You can select the codec by setting Hive properties like these:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
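
On the Java part of the question: as far as I know, the .deflate extension is what Hadoop's DefaultCodec writes, so the file is most likely zlib/deflate data rather than LZO, which would suggest the LzopCodec setting did not take effect. Under that assumption, a local copy of the file could be decompressed with the standard java.util.zip classes. The following is only a minimal sketch: the class name DeflateFileReader and the file paths are made up for illustration, and it will not work if the file turns out to be in another format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, not part of Hive or Hadoop.
public class DeflateFileReader {

    // Reads a zlib/deflate stream and writes the decompressed bytes to out.
    // Assumes the input is in zlib format, as a .deflate file is expected to be.
    public static void inflate(InputStream compressed, OutputStream out) throws IOException {
        try (InputStream in = new InflaterInputStream(compressed)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: DeflateFileReader <input.deflate> <output>");
            return;
        }
        // Both paths are local files, e.g. a copy of 000000_0.deflate
        // downloaded from /users/JOBDATA/comp.
        Path source = Paths.get(args[0]);
        Path target = Paths.get(args[1]);
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            inflate(in, out);
        }
    }
}
```

For example, after copying the file out of HDFS, running `java DeflateFileReader 000000_0.deflate 000000_0.txt` should produce the table rows as plain text if the assumption about the format holds.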