
How to load incremental data into a partitioned Hive table

I have a table "users" with the following columns, and I have created a Hive partition based on the created_on field:

id bigint,
name string,
created_on string(yyyy-MM-dd),
updated_on string
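
In Hive terms, the table looks roughly like this (note that the partition column is declared in PARTITIONED BY, not in the column list):

CREATE TABLE users (
  id BIGINT,
  name STRING,
  updated_on STRING
)
PARTITIONED BY (created_on STRING);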

I have created a Sqoop job to import incrementally based on the last-modified date:

sqoop job --create users -- import --connect jdbc:mysql://<ip>/product --driver com.mysql.jdbc.Driver --username <> -P --table users --incremental lastmodified --check-column updated_on --last-value "2016-11-15" --hive-table users --hive-import --hive-partition-key created_on --hive-partition-value "2016-11-15" -m 1

If you look at the job above, it fetches rows based on the last-modified value (updated_on) but writes them all into the static partition created_on="2016-11-15". So a row created on, say, 2016-11-10 but updated on 2016-11-15 ends up in the wrong partition.

Is there any workaround for this issue?


1 Answer


You are loading into a partition keyed on one column (created_on) while selecting rows based on a different column (updated_on). Those two simply do not match.

The solution is to make the rows you load line up with the partition you write to.

So if you want to write all records with created_on equal to 2016-11-15 into that partition, make sure you load exactly those records. (In that case you probably cannot use the standard incremental functionality.) Both approaches are sketched below.
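
For example, a one-off load of exactly the 2016-11-15 partition could look roughly like this (a sketch based on the job above, not a tested command; --where restricts the load to rows with that created_on date, and --columns is assumed here to keep the partition column out of the data columns):

sqoop import --connect jdbc:mysql://<ip>/product --driver com.mysql.jdbc.Driver --username <> -P --table users --where "created_on = '2016-11-15'" --columns "id,name,updated_on" --hive-import --hive-table users --hive-partition-key created_on --hive-partition-value "2016-11-15" -m 1

You would run one such import per created_on value you need to load.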
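Alternatively, if you do want to keep the incremental lastmodified import, a common pattern is to land the new rows in an unpartitioned staging table (users_stage below is a hypothetical name) and let Hive redistribute them into the correct partitions with dynamic partitioning, roughly:

-- users_stage is a hypothetical staging table loaded by the incremental Sqoop job
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- the partition column must come last in the SELECT list
INSERT INTO TABLE users PARTITION (created_on)
SELECT id, name, updated_on, created_on FROM users_stage;

This way each row lands in the partition matching its own created_on value, regardless of when it was updated.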