1
votes

I need to store data as a SequenceFile with block compression. Below is the table, which is stored as a SequenceFile.

create table lip_data_quality
(
  buyer_id bigint,
  total_chkout bigint,
  total_errpds bigint
)
partitioned by (dt string)
row format delimited fields terminated by '\t'
stored as sequencefile
location '/apps/hdmi-technology/b_apdpds/lip-data-quality';

I am getting the data into the above table in compressed form by setting these properties:

set mapred.output.compress=true;
set mapred.output.compression.type=BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec;

So my question is: is that all I need to enable BLOCK compression with a SequenceFile, or is there anything else I need to do? I was following this article: Hadoop

Any suggestion will be appreciated.

Update:

I load the data into the above table by putting everything in a .hql file and running that file from the shell prompt, changing the partition date each time I run it.

set mapred.output.compress=true;
set mapred.output.compression.type=BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec;

insert overwrite table lip_data_quality partition (dt='20120712') 
SELECT query here which will give the output for the above table.
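
Since the partition date changes on every run, the .hql text could also be generated rather than edited by hand. This is only a minimal sketch: `HQL_TEMPLATE` and `render_load_script` are hypothetical names, and the SELECT statement is passed in as a plain string because the real query is not shown here.

```python
from string import Template

# Same settings and insert statement as above, with the partition date
# and the SELECT query left as placeholders.
HQL_TEMPLATE = Template("""\
set mapred.output.compress=true;
set mapred.output.compression.type=BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec;

insert overwrite table lip_data_quality partition (dt='$dt')
$select_query;
""")


def render_load_script(dt, select_query):
    """Return the HQL text for one partition date (hypothetical helper)."""
    # Assumption: partition dates always look like YYYYMMDD, e.g. 20120712.
    if len(dt) != 8 or not dt.isdigit():
        raise ValueError("dt must look like YYYYMMDD, e.g. 20120712")
    # Drop a trailing semicolon so the template can add exactly one.
    query = select_query.strip().rstrip(";")
    return HQL_TEMPLATE.substitute(dt=dt, select_query=query)
```

The returned text can be written to a file and run the same way as the hand-edited .hql file.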
2
How are you loading data into the above table? - Paul M
I have updated my question with the details that answer your question. Let me know if you need any more details. - arsenal

2 Answers

1
votes

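That should be fine then. You can also verify it by looking at the files on HDFS. After your load there should be a partition directory named dt=20120712 under the table's location, i.e. /apps/hdmi-technology/b_apdpds/lip-data-quality/dt=20120712 (the table was created with an explicit location, so it is not under /user/hive/warehouse). If you run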

hadoop fs -cat

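on one of the files in that folder, you should be able to see the file header, which gives you basic information about the file: it starts with SEQ, followed by the key and value class names and, for a compressed file, the codec class name.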
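
The header is partly binary, so the compression flags are hard to read by eye. Below is a rough sketch of decoding the header fields from the first bytes of one of those files. It assumes the usual SequenceFile layout (magic `SEQ`, a version byte, the key and value class names, a "compressed" flag, a "block compressed" flag, then the codec class name) and only handles version 5 or later; `read_text` and `read_sequencefile_header` are hypothetical helper names, not part of any Hadoop tool.

```python
def read_text(stream):
    """Read one length-prefixed UTF-8 string from the header.

    Assumption: the length fits in a single-byte VInt (0..127), which
    holds for ordinary Java class names.
    """
    length = stream.read(1)[0]
    if length > 127:
        raise ValueError("multi-byte VInt lengths are not handled in this sketch")
    return stream.read(length).decode("utf-8")


def read_sequencefile_header(stream):
    """Parse the leading header fields of a SequenceFile (version 5+)."""
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile: missing SEQ magic")
    version = stream.read(1)[0]
    if version < 5:
        raise ValueError("this sketch only handles version 5 or later")
    key_class = read_text(stream)
    value_class = read_text(stream)
    # Two one-byte booleans: is the file compressed, and is it BLOCK compressed.
    compressed = stream.read(1) != b"\x00"
    block_compressed = stream.read(1) != b"\x00"
    # The codec class name is only present when the file is compressed.
    codec = read_text(stream) if compressed else None
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }
```

To try it, save the first few hundred bytes of a file from the partition directory to a local file, open that file in binary mode, and pass it to `read_sequencefile_header`. For the table in the question, `block_compressed` should come back as `True` and `codec` should be the class set in `mapred.output.compression.codec`.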

0
votes

Set the properties below before submitting the job.

  • setProperty(job, "mapred.output.compress", "true");
  • setProperty(job, "mapred.output.compression.type", "BLOCK");
  • setProperty(job, "mapred.output.compression.codec", "org.apache.hadoop.io.compress.DefaultCodec");

Instead of DefaultCodec, you can also use org.apache.hadoop.io.compress.LzoCodec.