Sequence File with Block Compression

Question

I need to enable block compression for data stored as a SequenceFile. Below is the table, which is stored as a SequenceFile.

create table lip_data_quality
(
  buyer_id bigint,
  total_chkout bigint,
  total_errpds bigint
)
partitioned by (dt string)
row format delimited fields terminated by '\t'
stored as sequencefile
location '/apps/hdmi-technology/b_apdpds/lip-data-quality';

The data in this table is written in compressed form once I set these properties:

set mapred.output.compress=true;
set mapred.output.compression.type=BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec;

So my question is: is that all I need to enable BLOCK compression with a SequenceFile, or is there anything else I need to do? I was following this Hadoop article.

Any suggestions would be appreciated.

Update:

I load the data into the above table by putting everything in a .hql file and running that file from the shell command prompt. I change the partition date each time I run the hql file below.

set mapred.output.compress=true;
set mapred.output.compression.type=BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec;

insert overwrite table lip_data_quality partition (dt='20120712')
-- SELECT query here that produces the output for the above table

I have updated my question with the details that specifically answer your question. let me know if you need any more details. — arsenal

Paul M · Accepted Answer · 2012-08-04T18:10:04

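That should be fine then. You can also verify it by looking at the files on HDFS. After your load, there should be a partition directory named dt=20120712 in HDFS. By default it would be /user/hive/warehouse/lip_data_quality/dt=20120712, but since your table specifies a location, look under /apps/hdmi-technology/b_apdpds/lip-data-quality instead. If you run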

hadoop fs -cat

on one of the files in that folder, you should be able to see the file header, which gives you basic information about the file, such as the key and value classes and the compression codec.
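
If reading the raw header by eye is awkward, here is a minimal Python sketch that decodes it. It assumes the SequenceFile header layout as I understand it for format version 5 and later: the bytes SEQ, a version byte, the key and value class names, a compression flag, a block-compression flag, and then the codec class name when compression is on. The function names are my own, not part of Hadoop or Hive, and the sketch only handles class names shorter than 128 bytes, whose length fits in a single byte.

```python
import io
import struct


def _read_text(stream):
    """Read one length-prefixed UTF-8 string (hypothetical helper).

    Assumption: the length is below 128, so Hadoop's variable-length
    integer occupies exactly one byte.
    """
    (length,) = struct.unpack("b", stream.read(1))
    if length < 0:
        raise ValueError("multi-byte lengths are not handled in this sketch")
    return stream.read(length).decode("utf-8")


def read_sequencefile_header(data):
    """Decode the leading fields of a SequenceFile header from raw bytes."""
    stream = io.BytesIO(data)
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile: missing SEQ magic bytes")
    version = stream.read(1)[0]
    if version < 5:
        # Older versions lack the codec class name; not covered here.
        raise ValueError("this sketch assumes format version 5 or later")
    header = {
        "version": version,
        "key_class": _read_text(stream),
        "value_class": _read_text(stream),
        "compressed": stream.read(1) != b"\x00",
        "block_compressed": stream.read(1) != b"\x00",
    }
    # The codec class name is only written when compression is enabled.
    header["codec"] = _read_text(stream) if header["compressed"] else None
    return header
```

To use it, save the first few hundred bytes of one of the partition files locally (for example by piping hadoop fs -cat through head -c 512 into a file), read that file in binary mode, and pass the bytes to read_sequencefile_header. For your table you would hope to see block_compressed set to True and codec set to the LzoCodec class you configured.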
