1 vote

I use Sqoop to import data from MySQL into Hadoop in CSV form. It works well when I use the --table argument. However, when I use the --query argument, only the first column is imported; the other columns are missing.

Here is my command.

sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/sqoop \
--username root \
--password root \
--query ' select age, job from person where $CONDITIONS ' \
--bindir /tmp/sqoop-hduser/compile \
--fields-terminated-by ',' \
--target-dir /Users/hduser/hadoop_data/onedaydata -m1 

In the CSV file, only the age column shows up. Does anyone know how to solve this? Thanks.

I would add more details about your case (database DDL, data, version of Sqoop, ...) - Jarek Jarcec Cecho

3 Answers

1 vote

Read this documentation from the Sqoop User Guide. When you use $CONDITIONS, you must specify the splitting column.

Sqoop can also import the result set of an arbitrary SQL query. Instead of using the --table, --columns and --where arguments, you can specify a SQL statement with the --query argument. When importing a free-form query, you must specify a destination directory with --target-dir.

If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop.

Your query must include the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by.

For example:

$ sqoop import \
  --query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
  --split-by a.id --target-dir /user/foo/joinresults

Alternately, the query can be executed once and imported serially, by specifying a single map task with -m 1:

$ sqoop import \
  --query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
  -m 1 --target-dir /user/foo/joinresults
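
Applied to the command from the question, a minimal sketch might look like the following. The splitting column id is only an assumption here; replace it with a real key column from your person table, or keep -m 1 if no suitable column exists.

sqoop import \
  --connect jdbc:mysql://127.0.0.1:3306/sqoop \
  --username root \
  --password root \
  --query 'SELECT age, job FROM person WHERE $CONDITIONS' \
  --split-by id \
  --fields-terminated-by ',' \
  --target-dir /Users/hduser/hadoop_data/onedaydata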
1 vote

Try this:

sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/sqoop \
--username root \
--password root \
--columns "First_Column" \
--bindir /tmp/sqoop-hduser/compile \
--fields-terminated-by ',' \
--target-dir /Users/hduser/hadoop_data/onedaydata -m1 
0 votes

Whenever you use the --query parameter, you need to specify the --split-by parameter with the column that should be used for slicing your data into multiple parallel tasks. The other required parameter is --target-dir, which specifies the directory on HDFS where your data should be stored.

Solution: Try adding the --split-by argument to your Sqoop command and see if the issue is resolved.
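
For example, a hedged sketch of a parallel import with both parameters (the splitting column id and the mapper count of 4 are placeholders, not values taken from the question):

sqoop import \
  --connect jdbc:mysql://127.0.0.1:3306/sqoop \
  --username root \
  --password root \
  --query 'SELECT age, job FROM person WHERE $CONDITIONS' \
  --split-by id \
  --target-dir /Users/hduser/hadoop_data/onedaydata \
  -m 4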