- Mode: append can be used when the check column only ever grows (for example an auto-increment id) and you know the last value already imported.
- Mode: lastmodified can be used with a timestamp column and also captures updated rows. It may be hard to keep track of the last-modified timestamp; if you do know it, you can go ahead with the first approach.
Mode: append
mysql> describe emp;
+--------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| name | varchar(100) | YES | | NULL | |
| deg | varchar(100) | YES | | NULL | |
| salary | int(11) | YES | | NULL | |
| dep | varchar(10) | YES | | NULL | |
+--------+--------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
Import the data with the following Sqoop command.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
-m 3 \
--table emp \
--split-by id \
--columns id,name,deg \
--warehouse-dir /user/sqoop/ \
--delete-target-dir \
--as-textfile
HDFS output
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/
drwxr-xr-x - cloudera supergroup 0 2017-12-02 13:14 /user/sqoop/emp
-rw-r--r-- 1 cloudera supergroup 0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
-rw-r--r-- 1 cloudera supergroup 64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
-rw-r--r-- 1 cloudera supergroup 86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002
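The three part-m files come from the three mappers requested with -m 3. Sqoop finds the bounds of the split column with SELECT MIN(id), MAX(id) and gives each mapper a contiguous slice of ids. A rough local sketch of that arithmetic (Sqoop's exact boundary math may differ slightly from this):

```shell
# Sketch: divide the id range 1201..1210 among 3 mappers,
# mimicking what Sqoop does for --split-by id with -m 3.
MIN=1201; MAX=1210; MAPPERS=3
awk -v min="$MIN" -v max="$MAX" -v m="$MAPPERS" 'BEGIN {
  n = max - min + 1                       # total number of ids in range
  for (i = 0; i < m; i++) {
    lo = min + int(i * n / m)             # first id of this slice
    hi = min + int((i + 1) * n / m) - 1   # last id of this slice
    printf "mapper %d: id BETWEEN %d AND %d\n", i, lo, hi
  }
}'
```

With 10 ids over 3 mappers the last slice gets 4 rows, which matches the last part file above being the largest.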
mysql> select * from emp;
+------+---------+--------------+--------+------+
| id | name | deg | salary | dep |
+------+---------+--------------+--------+------+
| 1201 | gopal | manager | 50000 | tp |
| 1202 | manisha | Proof reader | 50000 | TP |
| 1203 | php dev | TECH WRITER | 50000 | AC |
| 1204 | Nilesh | Domino dev | 70000 | AF |
| 1205 | Vinayak | Java dev | 50000 | IT |
| 1206 | Amish | Cog dev | 60000 | IT |
| 1207 | Jatin | Oracel dev | 40001 | IT |
| 1208 | Viren | Java dev | 70004 | IT |
| 1209 | Ashish | Oracel dev | 40001 | IT |
| 1210 | Satish | Java dev | 70004 | IT |
+------+---------+--------------+--------+------+
10 rows in set (0.00 sec)
Insert a new record into the table.
mysql> insert into emp values(1211,'Jag', 'be', 20000, 'IT');
Query OK, 1 row affected (0.03 sec)
mysql> select * from emp;
+------+---------+--------------+--------+------+
| id | name | deg | salary | dep |
+------+---------+--------------+--------+------+
| 1201 | gopal | manager | 50000 | tp |
| 1202 | manisha | Proof reader | 50000 | TP |
| 1203 | php dev | TECH WRITER | 50000 | AC |
| 1204 | Nilesh | Domino dev | 70000 | AF |
| 1205 | Vinayak | Java dev | 50000 | IT |
| 1206 | Amish | Cog dev | 60000 | IT |
| 1207 | Jatin | Oracel dev | 40001 | IT |
| 1208 | Viren | Java dev | 70004 | IT |
| 1209 | Ashish | Oracel dev | 40001 | IT |
| 1210 | Satish | Java dev | 70004 | IT |
| 1211 | Jag | be | 20000 | IT |
+------+---------+--------------+--------+------+
11 rows in set (0.00 sec)
Command for the incremental append import
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table emp \
--split-by id \
--check-column id \
--incremental append \
--last-value 1210 \
--warehouse-dir /user/sqoop/ \
--as-textfile
After importing
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/emp
-rw-r--r-- 1 cloudera supergroup 0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
-rw-r--r-- 1 cloudera supergroup 64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
-rw-r--r-- 1 cloudera supergroup 86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002
-rw-r--r-- 1 cloudera cloudera 21 2017-12-02 13:48 /user/sqoop/emp/part-m-00003
[cloudera@quickstart lib]$ hadoop fs -cat /user/sqoop/emp/part-m-00003
1211,Jag,be,20000,IT
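Under the hood, append mode adds a predicate equivalent to WHERE id > 1210 to the generated query, so only rows beyond --last-value are fetched. A minimal local simulation of that filter (the file names here are made up for illustration; the rows are taken from the emp table above):

```shell
# Simulate Sqoop append mode: keep only rows whose id (field 1)
# exceeds the recorded --last-value. Local files for illustration only.
cat > emp.csv <<'EOF'
1210,Satish,Java dev,70004,IT
1211,Jag,be,20000,IT
EOF
LAST_VALUE=1210
awk -F',' -v last="$LAST_VALUE" '$1+0 > last+0' emp.csv > delta.csv
cat delta.csv   # only the 1211 row survives the filter
```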
Mode: lastmodified
mysql> describe orders;
+-------------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+-------------+------+-----+---------+----------------+
| order_id | int(11) | NO | PRI | NULL | auto_increment |
| order_date | datetime | NO | | NULL | |
| order_customer_id | int(11) | NO | | NULL | |
| order_status | varchar(45) | NO | | NULL | |
+-------------------+-------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
Import the orders table into HDFS.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--split-by order_id \
--target-dir /user/sqoop/orders \
--as-textfile
After import
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders
-rw-r--r-- 1 cloudera supergroup 0 2017-12-02 16:01 /user/sqoop/orders/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 741597 2017-12-02 16:01 /user/sqoop/orders/part-m-00000
-rw-r--r-- 1 cloudera supergroup 753022 2017-12-02 16:01 /user/sqoop/orders/part-m-00001
-rw-r--r-- 1 cloudera supergroup 752368 2017-12-02 16:01 /user/sqoop/orders/part-m-00002
-rw-r--r-- 1 cloudera supergroup 752940 2017-12-02 16:01 /user/sqoop/orders/part-m-00003
Update an order's status and timestamp.
mysql> select * from orders where order_id=10;
+----------+---------------------+-------------------+-----------------+
| order_id | order_date | order_customer_id | order_status |
+----------+---------------------+-------------------+-----------------+
| 10 | 2013-07-25 00:00:00 | 5648 | PENDING_PAYMENT |
+----------+---------------------+-------------------+-----------------+
1 row in set (0.00 sec)
mysql> update orders set order_status='CLOSED', order_date=now() where order_id=10;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from orders where order_id=10;
+----------+---------------------+-------------------+--------------+
| order_id | order_date | order_customer_id | order_status |
+----------+---------------------+-------------------+--------------+
| 10 | 2017-12-02 16:19:23 | 5648 | CLOSED |
+----------+---------------------+-------------------+--------------+
1 row in set (0.00 sec)
Import the changed data with an incremental lastmodified import. Note that --last-value (the timestamp of the previous import) is omitted below, so the whole table is fetched and merged on order_id; in a routine incremental run you would supply it.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--split-by order_id \
--check-column order_date \
--merge-key order_id \
--incremental lastmodified \
--target-dir /user/sqoop/orders1 \
--as-textfile
Output
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders1
-rw-r--r-- 1 cloudera cloudera 0 2017-12-02 16:07 /user/sqoop/orders1/_SUCCESS
-rw-r--r-- 1 cloudera cloudera 2999918 2017-12-02 16:07 /user/sqoop/orders1/part-r-00000
Note: A fresh target directory (orders1) was used here. If we reuse the same directory (/user/sqoop/orders) as the earlier import, Sqoop's merge job deletes the old part-m files and writes a single merged part-r file (the reduce output of the merge job) in their place.
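Conceptually, the merge keyed on --merge-key order_id keeps, for each key, the record from the newer data set, so updated rows replace their old versions. A sketch of that behavior on two small local files (file names and the second sample row are made up for illustration):

```shell
# Simulate the --merge-key merge: rows from the delta (read last)
# overwrite base rows that share the same order_id (field 1).
cat > base.csv <<'EOF'
10,2013-07-25 00:00:00,5648,PENDING_PAYMENT
11,2013-07-25 00:00:00,918,PAID
EOF
cat > delta.csv <<'EOF'
10,2017-12-02 16:19:23,5648,CLOSED
EOF
awk -F',' '{rec[$1]=$0} END {for (k in rec) print rec[k]}' \
  base.csv delta.csv | sort -t',' -k1,1n > merged.csv
cat merged.csv   # order 10 now shows CLOSED; order 11 is unchanged
```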