0
votes

I am trying to run sqoop import with --incremental lastmodified, for updated and inserted records. Following is the table :

mysql> describe inc_lastmod_test;
+-------------------+-------------+------+-----+-------------------+-----------------------------+
| Field             | Type        | Null | Key | Default           | Extra                       |
+-------------------+-------------+------+-----+-------------------+-----------------------------+
| id                | int(11)     | NO   | PRI | 0                 |                             |
| value             | varchar(20) | YES  |     | NULL              |                             |
| last_updated_date | timestamp   | NO   | MUL | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-------------------+-------------+------+-----+-------------------+-----------------------------+

And the dataset in the table is :

mysql> select * from inc_lastmod_test;
+----+----------+---------------------+
| id | value    | last_updated_date   |
+----+----------+---------------------+
|  1 | first    | 2016-01-01 00:00:00 |
|  2 | second   | 2016-06-04 06:56:49 |
|  3 | newthird | 2016-06-04 07:06:40 |
|  4 | fourth   | 2016-01-04 00:00:00 |
|  5 | fifth    | 2016-01-05 00:00:00 |
|  6 | sixsth   | 2016-01-06 00:00:00 |
|  7 | seventh  | 2016-01-07 00:00:00 |
|  8 | eighth   | 2016-06-04 07:04:58 |
+----+----------+---------------------+

I previously imported table into HDFS which is in following state :

[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/inc_lastmod_test/part*
1,first,2016-01-01 00:00:00.0
2,second,2016-06-04 06:56:49.0
3,third,2016-01-03 00:00:00.0
4,fourth,2016-01-04 00:00:00.0
5,fifth,2016-01-05 00:00:00.0
6,sixsth,2016-01-06 00:00:00.0
7,seventh,2016-01-07 00:00:00.0

Based on the current state of data in mysql table, i expect to have 1)new record for id=8 2)updated record for id=3 (with changed value) in HDFS dir

However, it only seems to have inserted all the records and not updating existing records after running following command :

    [cloudera@quickstart ~]$ sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table inc_lastmod_test \
--append \
--incremental lastmodified \
--check-column last_updated_date \
--last-value "2016-01-08 00:00:00"

Please tell me where I am going wrong?

2
Can anyone please help with this? - Prashant Chopra
Unfortunately, this question never got answered. - Prashant Chopra

2 Answers

0
votes
  sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table inc_lastmod_test \
--append \
--incremental lastmodified \
--check-column last_updated_date \
--last-value "2016-01-08 00:00:00

Incremental updates doesn't modify the existing file (/user/cloudera/inc_lastmod_test/part*). It fetches all the records which got modified on and after the given date and write them to new file.

0
votes

Try this:

   [cloudera@quickstart ~]$ sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table inc_lastmod_test \
--append \
--incremental lastmodified \
--check-column last_updated_date \
--last-value "2016-01-08 00:00:00.0"