1 vote

I use Sqoop to import data from MySQL into Hadoop in CSV form. It works well when I use the --table argument. However, when I use the --query argument, only the first column is imported; the other columns are missing.

Here is my command.

sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/sqoop \
--username root \
--password root \
--query ' select age, job from person where $CONDITIONS ' \
--bindir /tmp/sqoop-hduser/compile \
--fields-terminated-by ',' \
--target-dir /Users/hduser/hadoop_data/onedaydata -m1 

In the CSV file, only the age column shows up. Does anyone know how to solve this? Thanks.

I would add more details about your case (database DDL, data, version of Sqoop, ...) - Jarek Jarcec Cecho

3 Answers

1 vote

Read this documentation from the Sqoop User Guide. When you use $CONDITIONS, you must specify the splitting column.

Sqoop can also import the result set of an arbitrary SQL query. Instead of using the --table, --columns and --where arguments, you can specify a SQL statement with the --query argument. When importing a free-form query, you must specify a destination directory with --target-dir.

If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop.

Your query must include the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by.

For example:

$ sqoop import \
  --query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
  --split-by a.id --target-dir /user/foo/joinresults

Alternately, the query can be executed once and imported serially, by specifying a single map task with -m 1:

$ sqoop import \
  --query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
  -m 1 --target-dir /user/foo/joinresults
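
Applied to the command from the question, a minimal sketch might look like the following. The splitting column id is only an assumption here; replace it with a real key column from your person table, or keep -m 1 if no suitable column exists.

sqoop import \
  --connect jdbc:mysql://127.0.0.1:3306/sqoop \
  --username root \
  --password root \
  --query 'SELECT age, job FROM person WHERE $CONDITIONS' \
  --split-by id \
  --fields-terminated-by ',' \
  --target-dir /Users/hduser/hadoop_data/onedaydata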
1 vote

Try this:

sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/sqoop \
--username root \
--password root \
--columns "First_Column" \
--bindir /tmp/sqoop-hduser/compile \
--fields-terminated-by ',' \
--target-dir /Users/hduser/hadoop_data/onedaydata -m1 
0 votes

Whenever you use the --query parameter, you need to specify the --split-by parameter with the column that should be used for slicing your data into multiple parallel tasks. The other required parameter is --target-dir, which specifies the directory on HDFS where your data should be stored.

Solution: Try adding the --split-by argument to your Sqoop command and see if the issue is resolved.
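
For example, a hedged sketch of a parallel import with both parameters (the splitting column id and the mapper count of 4 are placeholders, not values taken from the question):

sqoop import \
  --connect jdbc:mysql://127.0.0.1:3306/sqoop \
  --username root \
  --password root \
  --query 'SELECT age, job FROM person WHERE $CONDITIONS' \
  --split-by id \
  --target-dir /Users/hduser/hadoop_data/onedaydata \
  -m 4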