Hadoop & Hive as warehouse: daily data deliveries

Question

I am evaluating the combination of Hadoop & Hive (& Impala) as a replacement for a large data warehouse. I have already set up a version, and read performance is great.

Can somebody give me a hint as to which concept should be used for daily data deliveries to a table? I have a table in Hive based on a file I put into HDFS. But now I have new transactional data coming in on a daily basis. How do I add it to the table in Hive? Inserts are not possible. HDFS cannot append. So what's the general concept I need to follow?

Any advice or direction to documentation is appreciated.

Best regards!

Balaswamy Vaddeman · Accepted Answer · 2013-04-21T03:11:26

"Inserts are not possible"

Inserts are possible: for example, you can create a new table and insert the data from the new table into the old table.

But the simpler solution is to load the file's data into the Hive table with the command below.

load data inpath '/filepath' [overwrite] into table tablename;

If you specify overwrite, the existing data is replaced with the new data; otherwise the new data is appended.

You can also put the command in a shell script and schedule it.
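As a rough illustration of such a script, here is a minimal sketch. The table name `transactions`, the `/incoming/transactions_<date>.csv` path layout, the `build_load_stmt` helper, and the `RUN_LOAD` switch are all assumptions made up for this example; only the `load data inpath` statement and the `hive -e` option for running a statement from the command line come from Hive itself.

```shell
#!/bin/sh
# Build the HiveQL statement that loads one file into a table.
# Arguments: HDFS path, table name, optional literal "overwrite".
build_load_stmt() {
    path="$1"
    table="$2"
    mode="$3"
    if [ "$mode" = "overwrite" ]; then
        # Replaces whatever the table currently holds.
        printf "load data inpath '%s' overwrite into table %s;\n" "$path" "$table"
    else
        # Appends the file to the existing data.
        printf "load data inpath '%s' into table %s;\n" "$path" "$table"
    fi
}

# Hypothetical layout: one delivery file per day under /incoming.
# The load only runs when RUN_LOAD=1, so the script can be dry-run safely.
if [ "${RUN_LOAD:-0}" = "1" ]; then
    day=$(date +%Y-%m-%d)
    hive -e "$(build_load_stmt "/incoming/transactions_${day}.csv" "transactions")"
fi
```

Running it once a day with `RUN_LOAD=1` (from a scheduler such as cron, for instance) would append that day's file to the table. Note that `load data inpath` moves the file from its HDFS location into the table's directory, so the source path is empty afterwards.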
