Hadoop-Sqoop import without an integer value using split-by

Question

I am importing data from MemSQL to HDFS using Sqoop. My source table in MemSQL doesn't have any integer column, so I created a new table that includes a new column 'test' alongside the existing columns.

The following is the query:

sqoop import --connect jdbc:mysql://XXXXXXXXX:3306/db_name --username XXXX --password XXXXX --query "select closed,extract_date,open,close,cast(floor(rand()*1000000 as int) as test from table_name where \$CONDITIONS" --target-dir /user/XXXX --split-by test;

This query gave me the following error:

com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'as int) as test from table_name where (1 = 0)' at line 1

I tried it another way as well:

sqoop import --connect jdbc:mysql://XXXXX:3306/XXXX --username XXXX --password XXXX --query "select closed,extract_date,open,close,ceiling(rand()*1000000) as test from table_name where \$CONDITIONS" --target-dir /user/dfsdlf --split-by test;

With this query the job runs, but no data is transferred. It says the split-by column is of float type and must strictly be changed to an integer type.

Please help me change the split-by column from float type to integer type.

KKG KKG · Accepted Answer · 2018-06-15T21:22:34

The problem most likely comes from using an alias as the --split-by parameter. If you need that particular expression in the query, run 'select closed,extract_date,open,close,ceiling(rand()*1000000) from table_name' in the console, note the column name the console shows for the expression, and pass that name to --split-by as 'complete_column_name_from_console' (here it would be --split-by 'ceiling(rand()*1000000)').
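
As a rough sketch of that suggestion, the second command from the question could be rewritten with the alias dropped and the bare expression passed to --split-by. The host, database, credentials and target directory are the question's placeholders, the function name run_import is only illustrative, and this has not been run against a real MemSQL instance.

```shell
# The unaliased expression, reused in the SELECT list and in --split-by.
SPLIT_EXPR='ceiling(rand()*1000000)'

# The backslash keeps $CONDITIONS literal so Sqoop can substitute it itself.
QUERY="select closed,extract_date,open,close,${SPLIT_EXPR} from table_name where \$CONDITIONS"

# Hypothetical wrapper; all connection values are placeholders from the question.
run_import() {
  sqoop import \
    --connect "jdbc:mysql://XXXXX:3306/XXXX" \
    --username XXXX \
    --password XXXX \
    --query "$QUERY" \
    --target-dir /user/dfsdlf \
    --split-by "$SPLIT_EXPR"
}
```

Calling run_import starts the job. Separately, the first attempt in the question has an unbalanced parenthesis after rand()*1000000, and MySQL's CAST takes SIGNED or UNSIGNED as the integer target type rather than int, so an expression such as cast(floor(rand()*1000000) as signed) may be worth trying if an integer-typed column is still required.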
