
I have a very large Hadoop sequence file in HDFS. What is the best way to fetch data from it, i.e., select records and so on?

Can it be done with Hive? How can I create a Hive table from a sequence file?

Thanks

Have you looked into external tables? - Olaf

1 Answer


If you need 'quick' access to the data, you should consider loading it into a datastore of some sort (a database, or a NoSQL store such as HBase or Accumulo).

Another option (if you can rewrite your data) is to use a MapFile. A MapFile stores the records sorted by key alongside an index of keys, so a lookup can jump close to the right record instead of scanning the whole file.
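To show why an index helps, here is a minimal Python sketch of the general idea: records sorted by key, plus a sparse index that maps every Nth key to its byte offset. This is not the Hadoop MapFile API or its on-disk format; the names `write_sorted`, `lookup`, and `INDEX_INTERVAL`, and the length-prefixed record layout, are all made up for illustration.

```python
import bisect
import io
import struct

# Hypothetical setting: index every 4th key (illustration only).
INDEX_INTERVAL = 4


def write_sorted(records, interval=INDEX_INTERVAL):
    """Serialize (key, value) byte pairs in key order.

    Returns the data bytes and a sparse index of (key, offset) pairs,
    one entry for every `interval`-th record.
    """
    data = io.BytesIO()
    index = []
    for i, (key, value) in enumerate(sorted(records)):
        if i % interval == 0:
            index.append((key, data.tell()))
        # Assumed record layout: key length, value length, key, value.
        data.write(struct.pack(">II", len(key), len(value)))
        data.write(key)
        data.write(value)
    return data.getvalue(), index


def lookup(data, index, key):
    """Return the value stored for `key`, or None if it is absent."""
    keys = [k for k, _ in index]
    # Find the last indexed key that is <= the key we want.
    pos = bisect.bisect_right(keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first record
    offset = index[pos][1]
    # Scan forward from the indexed offset; at most `interval` records
    # are read before the key is found or passed.
    while offset < len(data):
        klen, vlen = struct.unpack_from(">II", data, offset)
        offset += 8
        current = data[offset:offset + klen]
        offset += klen
        if current == key:
            return data[offset:offset + vlen]
        if current > key:
            return None  # passed the spot where the key would be
        offset += vlen
    return None


if __name__ == "__main__":
    records = [(f"key{i:03d}".encode(), f"value{i}".encode()) for i in range(20)]
    data, index = write_sorted(records)
    print(lookup(data, index, b"key013"))
```

A smaller interval means a bigger index but shorter scans; a full scan of the file is the limiting case with no index at all.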

Otherwise, if you want to use Hive, there's a thread on the Hive mailing list about this exact subject: