1
votes

I am evaluating the combination of Hadoop and Hive (and Impala) as a replacement for a large data warehouse. I have already set up an installation, and read performance is great.

Can somebody give me a hint on what approach to use for daily data deliveries to a table? I have a table in Hive based on a file I put into HDFS, but new transactional data now arrives every day. How do I add it to the table in Hive? Inserts are not possible, and HDFS cannot append. So what is the general concept I need to follow?

Any advice or direction to documentation is appreciated.

Best regards!


2 Answers

2
votes
 Inserts are not possible

Inserts are possible. For example, you can load the new data into a new table and then insert from that new table into the old one.
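A minimal sketch of that new-table approach, assuming the existing table is called `transactions` and the day's file sits in `/data/incoming/2013-01-15`. The table names, columns, path, and comma delimiter are all hypothetical, and `INSERT INTO` needs Hive 0.8 or later:

```sql
-- Hypothetical staging table defined over the directory that holds today's file.
-- EXTERNAL means that dropping the table leaves the files in HDFS untouched.
CREATE EXTERNAL TABLE transactions_staging (
  txn_id BIGINT,
  amount DOUBLE,
  txn_ts STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/incoming/2013-01-15';

-- Append the staged rows to the existing table (assumed to have the same columns).
INSERT INTO TABLE transactions
SELECT txn_id, amount, txn_ts
FROM transactions_staging;

-- Remove the staging definition once the rows have been copied.
DROP TABLE transactions_staging;
```

The `SELECT` is also the natural place to filter or convert the incoming rows before they reach the main table.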

A simpler solution is to load the file's data directly into the Hive table with the command below.

LOAD DATA INPATH '/filepath' [OVERWRITE] INTO TABLE tablename;

If you use OVERWRITE, the existing data is replaced with the new data; otherwise the new data is appended.
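To make the two variants concrete, here is a short sketch. The table name `transactions` and both file paths are hypothetical placeholders:

```sql
-- Append: the file is moved into the table's directory next to the existing files.
LOAD DATA INPATH '/data/incoming/transactions_2013-01-15.csv'
INTO TABLE transactions;

-- Replace: the table's existing contents are removed before the new file is moved in.
LOAD DATA INPATH '/data/incoming/transactions_full.csv'
OVERWRITE INTO TABLE transactions;
```

Note that `LOAD DATA INPATH` moves the HDFS file rather than copying it, and it does not parse or validate the contents, so the file has to match the table's storage format already.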

You can even schedule the daily load by putting the command in a shell script and running that script from a scheduler such as cron.

3
votes

Hive allows data to be appended to a table - the underlying implementation of how this happens in HDFS doesn't matter. There are a number of things you can do to append data:

  1. INSERT - You can just append rows to an existing table.
  2. INSERT OVERWRITE - If you have to process data, you can perform an INSERT OVERWRITE to re-write a table or partition.
  3. LOAD DATA - You can use this to bulk insert data into a table and, optionally, use the OVERWRITE keyword to wipe out any existing data.
  4. Partition your data.
  5. Load data into a new table and swap the partition in.

Partitioning is great if you know you're going to be performing date-based searches, and it gives you the ability to use options 1, 2, and 3 at either the table or the partition level.
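A rough sketch of how a daily delivery could look with a date-partitioned table. Everything named here is an assumption made for illustration: the table `transactions_by_day`, its columns, the partition column `dt`, the paths, and a staging table `transactions_staging` with the same columns. The three statements after the `CREATE TABLE` are alternatives, not steps to run in sequence:

```sql
-- Hypothetical table with one partition per delivery date.
CREATE TABLE transactions_by_day (
  txn_id BIGINT,
  amount DOUBLE,
  txn_ts STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Alternative A (LOAD DATA at partition level): move the day's file into its own partition.
LOAD DATA INPATH '/data/incoming/transactions_2013-01-15.csv'
INTO TABLE transactions_by_day PARTITION (dt = '2013-01-15');

-- Alternative B (INSERT OVERWRITE at partition level): rebuild one day from
-- the staging table without touching any other partition.
INSERT OVERWRITE TABLE transactions_by_day PARTITION (dt = '2013-01-15')
SELECT txn_id, amount, txn_ts
FROM transactions_staging;

-- Alternative C: register a directory that already holds the day's files as a partition.
ALTER TABLE transactions_by_day
ADD IF NOT EXISTS PARTITION (dt = '2013-01-16')
LOCATION '/data/incoming/2013-01-16';
```

A query that filters on the partition column, for example `SELECT COUNT(*) FROM transactions_by_day WHERE dt = '2013-01-15';`, only needs to read that day's directory, which is what makes date-based searches cheap.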