In Hadoop I was just playing with these two formats to evaluate the performance of Hive queries. I found that queries on the table stored as TEXTFILE return results faster than the same queries on the table stored as SEQUENCEFILE. But shouldn't it be the other way around? Also, FYI, I first loaded the data into the TEXTFILE table and then copied it into the SEQUENCEFILE table.

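-- Text-backed table; fields are separated by '~'
create table text(acid int, value string, id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '~' STORED AS TEXTFILE;

-- Same schema, stored as a SequenceFile
create table seq(acid int, value string, id int) STORED AS SEQUENCEFILE;

-- Load the raw file (path elided) into the text table, replacing any existing data
load data local inpath '-----' overwrite into table text;

-- Copy every row from the text table into the SequenceFile table
insert into table seq select * from text;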

TEXTFILE table:      Time taken: 36.284 seconds
SEQUENCEFILE table:  Time taken: 42.446 seconds

TEXTFILE table:      Time taken: 22.547 seconds
SEQUENCEFILE table:  Time taken: 25.547 seconds
How did you benchmark? Can you show us some code? Did you turn off the auto-compression in sequence files? - Thomas Jungblut
@ThomasJungblut I have pasted the code for the seq and text tables. Also, I first load the data into the text table since I don't have binary data, then I load it into the seq table from the text table. - Naresh
Have you used BLOCK compression with the sequence files? - alexeipab
No, I am not using BLOCK compression. - Naresh

1 Answer

Which one is faster depends on many factors. The advantage of sequence files is that you can compress them and they will still be splittable, whereas compressed text files are no longer splittable (unless you're using LZO).
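The splittable-compression advantage only applies if the SequenceFile table is actually written compressed, and the asker said they were not using BLOCK compression. Below is a minimal HiveQL sketch of enabling it before repopulating the table. It assumes the classic MapReduce property names (`mapred.output.compression.*`); newer Hadoop releases use `mapreduce.output.fileoutputformat.compress.type` and `mapreduce.output.fileoutputformat.compress.codec` as the equivalents. The table name `seq_block` is hypothetical, and `DefaultCodec` is picked only because it needs no native libraries; Snappy or LZO are common choices where they are installed.

```sql
-- Turn on compression for files written by Hive queries.
SET hive.exec.compress.output=true;
-- Compress SequenceFile output in blocks of records rather than record by record.
SET mapred.output.compression.type=BLOCK;
-- DefaultCodec (deflate) works without native libraries; swap in another codec if available.
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec;

-- Hypothetical table name; same schema as the seq table above.
CREATE TABLE seq_block (acid INT, value STRING, id INT) STORED AS SEQUENCEFILE;

-- Rewrite the rows from the text table; with the settings above the
-- output files are block-compressed SequenceFiles.
INSERT OVERWRITE TABLE seq_block SELECT * FROM text;
```

Re-running the same queries against `seq_block` and `text` would then compare compressed SequenceFiles with plain text. Whether that comes out faster still depends on the data, the codec, and whether the job is I/O-bound or CPU-bound.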