
Can someone help me with the exact difference between the append and lastmodified modes in Sqoop incremental import?

What is the need for lastmodified when the same thing can be done with append mode (using --check-column on a timestamp column)? It works the same way and imports the updated as well as the inserted records.


1 Answer

  1. Mode: append can be used when the check column is one whose last value you can easily track, such as an auto-incrementing id; it imports only the rows whose value is greater than --last-value, i.e. newly inserted rows.
  2. Mode: lastmodified can be used for a timestamp column. It may be hard to remember the last-modified timestamp; if you do know it, you can go ahead with the first approach, keeping in mind that append only appends the changed rows, while lastmodified with --merge-key also merges updated rows into the data already in HDFS. (One way to avoid tracking the value by hand is a saved Sqoop job; see the sketch after this list.)
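
Since the hard part is remembering the last value or timestamp between runs, a common workaround is a saved Sqoop job: the job stores the last imported value in its metastore and reuses it automatically on every execution. A minimal sketch, reusing the quickstart connection details from the examples in this answer (the job name incr_emp is made up for illustration):

sqoop job --create incr_emp \
-- import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table emp \
--split-by id \
--check-column id \
--incremental append \
--last-value 0 \
--warehouse-dir /user/sqoop/ \
--as-textfile

# Each execution imports only the rows added since the previous run.
# (sqoop job --exec may prompt for the database password unless the
# metastore is configured to store it.)
sqoop job --exec incr_emp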

Mode: Append

mysql> describe emp;
+--------+--------------+------+-----+---------+-------+
| Field  | Type         | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+-------+
| id     | int(11)      | YES  |     | NULL    |       |
| name   | varchar(100) | YES  |     | NULL    |       |
| deg    | varchar(100) | YES  |     | NULL    |       |
| salary | int(11)      | YES  |     | NULL    |       |
| dep    | varchar(10)  | YES  |     | NULL    |       |
+--------+--------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

Import the data using the following sqoop command:

sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
-m 3 \
--table emp \
--split-by id \
--columns id,name,deg \
--warehouse-dir /user/sqoop/ \
--delete-target-dir \
--as-textfile

HDFS output

[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/
drwxr-xr-x   - cloudera supergroup          0 2017-12-02 13:14 /user/sqoop/emp
-rw-r--r--   1 cloudera supergroup          0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
-rw-r--r--   1 cloudera supergroup         70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
-rw-r--r--   1 cloudera supergroup         64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
-rw-r--r--   1 cloudera supergroup         86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002
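
Because the import listed only --columns id,name,deg, each part file should contain just those three fields. A quick spot check (output omitted here):

hadoop fs -cat /user/sqoop/emp/part-m-00000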


mysql> select * from emp;
+------+---------+--------------+--------+------+
| id   | name    | deg          | salary | dep  |
+------+---------+--------------+--------+------+
| 1201 | gopal   | manager      |  50000 | tp   |
| 1202 | manisha | Proof reader |  50000 | TP   |
| 1203 | php dev | TECH WRITER  |  50000 | AC   |
| 1204 | Nilesh  | Domino dev   |  70000 | AF   |
| 1205 | Vinayak | Java dev     |  50000 | IT   |
| 1206 | Amish   | Cog dev      |  60000 | IT   |
| 1207 | Jatin   | Oracel dev   |  40001 | IT   |
| 1208 | Viren   | Java dev     |  70004 | IT   |
| 1209 | Ashish  | Oracel dev   |  40001 | IT   |
| 1210 | Satish  | Java dev     |  70004 | IT   |
+------+---------+--------------+--------+------+
10 rows in set (0.00 sec)

Insert a new record into the table.

mysql> insert into emp values(1211,'Jag', 'be', 20000, 'IT');
Query OK, 1 row affected (0.03 sec)

mysql> select * from emp;
+------+---------+--------------+--------+------+
| id   | name    | deg          | salary | dep  |
+------+---------+--------------+--------+------+
| 1201 | gopal   | manager      |  50000 | tp   |
| 1202 | manisha | Proof reader |  50000 | TP   |
| 1203 | php dev | TECH WRITER  |  50000 | AC   |
| 1204 | Nilesh  | Domino dev   |  70000 | AF   |
| 1205 | Vinayak | Java dev     |  50000 | IT   |
| 1206 | Amish   | Cog dev      |  60000 | IT   |
| 1207 | Jatin   | Oracel dev   |  40001 | IT   |
| 1208 | Viren   | Java dev     |  70004 | IT   |
| 1209 | Ashish  | Oracel dev   |  40001 | IT   |
| 1210 | Satish  | Java dev     |  70004 | IT   |
| 1211 | Jag     | be           |  20000 | IT   |
+------+---------+--------------+--------+------+
11 rows in set (0.00 sec)

Command for incremental import

sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table emp \
--split-by id \
--check-column id \
--incremental append \
--last-value 1210 \
--warehouse-dir /user/sqoop/ \
--as-textfile

After importing

[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/emp
-rw-r--r--   1 cloudera supergroup          0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
-rw-r--r--   1 cloudera supergroup         70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
-rw-r--r--   1 cloudera supergroup         64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
-rw-r--r--   1 cloudera supergroup         86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002
-rw-r--r--   1 cloudera cloudera           21 2017-12-02 13:48 /user/sqoop/emp/part-m-00003
[cloudera@quickstart lib]$ hadoop fs -cat /user/sqoop/emp/part-m-00003
1211,Jag,be,20000,IT
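
Only the newly inserted row (id 1211) was appended as a new part file; the existing part files were left untouched. For the next run you would simply move --last-value forward to the highest id already imported. A sketch of that follow-up command, assuming nothing else about the table changes:

sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table emp \
--split-by id \
--check-column id \
--incremental append \
--last-value 1211 \
--warehouse-dir /user/sqoop/ \
--as-textfile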

Mode: lastmodified

mysql> describe orders;
+-------------------+-------------+------+-----+---------+----------------+
| Field             | Type        | Null | Key | Default | Extra          |
+-------------------+-------------+------+-----+---------+----------------+
| order_id          | int(11)     | NO   | PRI | NULL    | auto_increment |
| order_date        | datetime    | NO   |     | NULL    |                |
| order_customer_id | int(11)     | NO   |     | NULL    |                |
| order_status      | varchar(45) | NO   |     | NULL    |                |
+-------------------+-------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

Import the orders table into HDFS:

sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--split-by order_id  \
--target-dir /user/sqoop/orders \
--as-textfile

After import

[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders
-rw-r--r--   1 cloudera supergroup          0 2017-12-02 16:01 /user/sqoop/orders/_SUCCESS
-rw-r--r--   1 cloudera supergroup     741597 2017-12-02 16:01 /user/sqoop/orders/part-m-00000
-rw-r--r--   1 cloudera supergroup     753022 2017-12-02 16:01 /user/sqoop/orders/part-m-00001
-rw-r--r--   1 cloudera supergroup     752368 2017-12-02 16:01 /user/sqoop/orders/part-m-00002
-rw-r--r--   1 cloudera supergroup     752940 2017-12-02 16:01 /user/sqoop/orders/part-m-00003

Update order data

mysql> select * from orders where order_id=10;
+----------+---------------------+-------------------+-----------------+
| order_id | order_date          | order_customer_id | order_status    |
+----------+---------------------+-------------------+-----------------+
|       10 | 2013-07-25 00:00:00 |              5648 | PENDING_PAYMENT |
+----------+---------------------+-------------------+-----------------+
1 row in set (0.00 sec)

mysql> update orders set order_status='CLOSED', order_date=now() where order_id=10;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from orders where order_id=10;
+----------+---------------------+-------------------+--------------+
| order_id | order_date          | order_customer_id | order_status |
+----------+---------------------+-------------------+--------------+
|       10 | 2017-12-02 16:19:23 |              5648 | CLOSED       |
+----------+---------------------+-------------------+--------------+
1 row in set (0.00 sec)

Import the changed data, this time in lastmodified mode with a merge key:

sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--split-by order_id  \
--check-column order_date \
--merge-key order_id \
--incremental lastmodified \
--target-dir /user/sqoop/orders1 \
--as-textfile

Output

[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders1
-rw-r--r--   1 cloudera cloudera          0 2017-12-02 16:07 /user/sqoop/orders1/_SUCCESS
-rw-r--r--   1 cloudera cloudera    2999918 2017-12-02 16:07 /user/sqoop/orders1/part-r-00000
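
Because of the merge step, the result is reducer output (part-r-*) rather than plain mapper output. To confirm that order 10 carries the new CLOSED status in the merged output, a quick check along these lines (the timestamp formatting inside the file may differ slightly):

hadoop fs -cat /user/sqoop/orders1/part-r-00000 | grep '^10,'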

Note: if we reuse the same target directory (orders) as the earlier import, Sqoop deletes the old part files and writes a new merged part file in their place.
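
Also note that the command above omits --last-value, so it re-reads the whole table and merges it. For a true incremental run you would pass the timestamp of the previous import as the lower bound; a sketch, using the time of the earlier orders import purely as an illustrative value:

sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--split-by order_id \
--check-column order_date \
--incremental lastmodified \
--last-value "2017-12-02 16:01:00" \
--merge-key order_id \
--target-dir /user/sqoop/orders1 \
--as-textfile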