
How to load incremental data into a partitioned Hive table

I have a table "users" with the following columns, and I have created a Hive partition based on the created_on field:

id bigint,
name string,
created_on string(yyyy-MM-dd),
updated_on string
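
In Hive terms, the table looks roughly like this (note that the partition column is declared in PARTITIONED BY, not in the column list):

CREATE TABLE users (
  id BIGINT,
  name STRING,
  updated_on STRING
)
PARTITIONED BY (created_on STRING);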

I have created a Sqoop job to import incrementally based on the last-modified date:

sqoop job --create users -- import --connect jdbc:mysql://<ip>/product --driver com.mysql.jdbc.Driver --username <> -P --table users --incremental lastmodified --check-column updated_on --last-value "2016-11-15" --hive-table users --hive-import --hive-partition-key created_on --hive-partition-value "2016-11-15" -m 1

If you look at the job above, it fetches rows based on the last-modified value (updated_on) but writes them all into the static partition created_on="2016-11-15". So a row created on, say, 2016-11-10 but updated on 2016-11-15 ends up in the wrong partition.

Is there any workaround for this issue?


1 Answer


You are loading into a partition keyed on one column (created_on) while selecting rows based on a different column (updated_on). Those two simply do not match.

The solution is to make the rows you load line up with the partition you write to.

So if you want to write all records with created_on equal to 2016-11-15 into that partition, make sure you load exactly those records. (In that case you probably cannot use the standard incremental functionality.) Both approaches are sketched below.
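
For example, a one-off load of exactly the 2016-11-15 partition could look roughly like this (a sketch based on the job above, not a tested command; --where restricts the load to rows with that created_on date, and --columns is assumed here to keep the partition column out of the data columns):

sqoop import --connect jdbc:mysql://<ip>/product --driver com.mysql.jdbc.Driver --username <> -P --table users --where "created_on = '2016-11-15'" --columns "id,name,updated_on" --hive-import --hive-table users --hive-partition-key created_on --hive-partition-value "2016-11-15" -m 1

You would run one such import per created_on value you need to load.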
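Alternatively, if you do want to keep the incremental lastmodified import, a common pattern is to land the new rows in an unpartitioned staging table (users_stage below is a hypothetical name) and let Hive redistribute them into the correct partitions with dynamic partitioning, roughly:

-- users_stage is a hypothetical staging table loaded by the incremental Sqoop job
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- the partition column must come last in the SELECT list
INSERT INTO TABLE users PARTITION (created_on)
SELECT id, name, updated_on, created_on FROM users_stage;

This way each row lands in the partition matching its own created_on value, regardless of when it was updated.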