I'm working on a Sqoop incremental job to load data from MySQL into HDFS. Below are the scenarios.
Scenario 1: The following records are inserted into the sample table in MySQL.
select * from sample;
+-----+--------+--------+---------------------+
| id  | policy | salary | updated_time        |
+-----+--------+--------+---------------------+
| 100 |      1 |   4567 | 2017-08-02 01:58:28 |
| 200 |      2 |   3456 | 2017-08-02 01:58:29 |
| 300 |      3 |   2345 | 2017-08-02 01:58:29 |
+-----+--------+--------+---------------------+
Below is the structure of the sample table in MySQL:
create table sample (
  id int not null primary key,
  policy int,
  salary int,
  updated_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
I'm trying to import this into HDFS by creating a Sqoop job as below:
sqoop job --create incjob -- import \
  --connect jdbc:mysql://localhost/retail_db \
  --username root -P \
  --table sample \
  --merge-key id \
  --split-by id \
  --target-dir /user/cloudera \
  --append \
  --incremental lastmodified \
  --check-column updated_time \
  -m 1
After executing the Sqoop job, these are the records in HDFS:
$ hadoop fs -cat /user/cloudera/par*
100,1,4567,2017-08-02 01:58:28.0
200,2,3456,2017-08-02 01:58:29.0
300,3,2345,2017-08-02 01:58:29.0
Scenario 2: After inserting a few new records and updating some existing records, the sample table looks like this:
select * from sample;
+-----+--------+--------+---------------------+
| id  | policy | salary | updated_time        |
+-----+--------+--------+---------------------+
| 100 |      6 |   5638 | 2017-08-02 02:01:09 |
| 200 |      2 |   7654 | 2017-08-02 02:01:10 |
| 300 |      3 |   2345 | 2017-08-02 01:58:29 |
| 400 |      4 |   1234 | 2017-08-02 02:01:17 |
| 500 |      5 |   6543 | 2017-08-02 02:01:18 |
+-----+--------+--------+---------------------+
After running the same Sqoop job again, these are the records in HDFS:
$ hadoop fs -cat /user/cloudera/par*
100,1,4567,2017-08-02 01:58:28.0
200,2,3456,2017-08-02 01:58:29.0
300,3,2345,2017-08-02 01:58:29.0
100,6,5638,2017-08-02 02:01:09.0
200,2,7654,2017-08-02 02:01:10.0
400,4,1234,2017-08-02 02:01:17.0
500,5,6543,2017-08-02 02:01:18.0
Here the records updated in MySQL are appended as new rows in HDFS instead of replacing the existing ones, even though I used both --merge-key and --append in my Sqoop job configuration. Could anyone help me resolve this issue?
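To make the result I expected concrete, here is a small Python sketch (my own illustration, not part of Sqoop) of the per-key merge I had in mind: for each id, keep only the row with the latest updated_time, so the final output has exactly one row per key.

```python
# Hypothetical illustration of the merge-by-key behaviour I expected.
# Input: the rows exactly as they appear in /user/cloudera/par* after the
# second run (old and new versions of the same id appended together).
rows = [
    "100,1,4567,2017-08-02 01:58:28.0",
    "200,2,3456,2017-08-02 01:58:29.0",
    "300,3,2345,2017-08-02 01:58:29.0",
    "100,6,5638,2017-08-02 02:01:09.0",
    "200,2,7654,2017-08-02 02:01:10.0",
    "400,4,1234,2017-08-02 02:01:17.0",
    "500,5,6543,2017-08-02 02:01:18.0",
]

latest = {}
for row in rows:
    fields = row.split(",")
    rec_id, updated_time = fields[0], fields[3]
    # Timestamps share one fixed format, so string comparison orders them
    # correctly; keep the most recent row for each id.
    if rec_id not in latest or updated_time > latest[rec_id].split(",")[3]:
        latest[rec_id] = row

merged = sorted(latest.values(), key=lambda r: int(r.split(",")[0]))
for row in merged:
    print(row)
```

This prints five rows (ids 100 through 500), with ids 100 and 200 carrying their updated policy and salary values, which is the state I expected the HDFS target directory to hold after the second run.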