I am trying to export data from HDFS to SQL Server. The original table has over 500 columns, and every time I execute the Sqoop export job it gets stuck showing the map phase completed at 100%. I created the two dummy tables shown below to narrow down where exactly the problem lies. The only difference between Table 1 and Table 2 is that the latter has one additional column, [col14] [varchar](5).
First, I ran the export job for Table 1, which has 13 columns (all varchar(5)). The job completed successfully and exported all 3 records to SQL Server.
Next, I executed the export job for Table 2, which has 14 columns. I didn't see any error messages or exceptions, but the job hangs forever after the map phase reaches 100%. SQL Server Activity Monitor shows that a process is created, but it never receives any data or prepared statements from Hadoop.
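To check what that process is doing on the SQL Server side, a query against the session/request DMVs shows its state and current wait type. This is only a diagnostic sketch; the program_name filter is an assumption and may need adjusting to match the actual JDBC connection.

-- Diagnostic sketch: list user sessions and what (if anything) they are waiting on.
-- The program_name filter is an assumption; the JDBC driver may report a different name.
SELECT s.session_id,
       s.host_name,
       s.program_name,
       s.status AS session_status,
       r.command,
       r.wait_type,
       r.wait_time
FROM sys.dm_exec_sessions AS s
LEFT JOIN sys.dm_exec_requests AS r
       ON r.session_id = s.session_id
WHERE s.is_user_process = 1
  AND s.program_name LIKE '%JDBC%';

If the session shows up but has no active request, the server is simply sitting idle waiting on the client, which would match what Activity Monitor reports.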
Does this problem exist only with SQL Server? Is there a limit on the number of columns that can be exported to SQL Server? Do I need to change any configuration in my cluster? Please advise.
Configuration
Hadoop Version – Cloudera 2.6.0-CDH-5.5.2 | Sqoop Version – 1.4.6 | SQL Server Version – 2008 R2
6 Node Cluster - 1 NN & 5DN | Map Task - 2 GB/1vCPU | Reduce Task - 2GB/1vCPU
Table 1
CREATE TABLE [dbo].[tbldummy1]
(
[col1] [varchar](5) NOT NULL,
[col2] [varchar](5) NULL,
[col3] [varchar](5) NULL,
[col4] [varchar](5) NULL,
[col5] [varchar](5) NULL,
[col6] [varchar](5) NULL,
[col7] [varchar](5) NULL,
[col8] [varchar](5) NULL,
[col9] [varchar](5) NULL,
[col10] [varchar](5) NULL,
[col11] [varchar](5) NULL,
[col12] [varchar](5) NULL,
[col13] [varchar](5) NULL,
CONSTRAINT [PK_dummy1] PRIMARY KEY ([col1] ASC))
Sqoop Command for Table 1
sqoop export \
--connect "jdbc:sqlserver://x.x.x.x:port;database=xxxxxxx" \
--username xxxxxx --password yyyyyy \
--table tbldummy1 \
--export-dir /user/hue/Out1 \
--input-fields-terminated-by '|' \
-m 1 \
--verbose
Input Data for Table 1
aa|01|02|03|04|05|06|07|08|09|10|11|12
bb|01|02|03|04|05|06|07|08|09|10|11|12
cc|01|02|03|04|05|06|07|08|09|10|11|12
Table 2
CREATE TABLE [dbo].[tbldummy2](
[col1] [varchar](5) NOT NULL,
[col2] [varchar](5) NULL,
[col3] [varchar](5) NULL,
[col4] [varchar](5) NULL,
[col5] [varchar](5) NULL,
[col6] [varchar](5) NULL,
[col7] [varchar](5) NULL,
[col8] [varchar](5) NULL,
[col9] [varchar](5) NULL,
[col10] [varchar](5) NULL,
[col11] [varchar](5) NULL,
[col12] [varchar](5) NULL,
[col13] [varchar](5) NULL,
[col14] [varchar](5) NULL,
CONSTRAINT [PK_dummy2] PRIMARY KEY ([col1] ASC))
Sqoop Command for Table 2
sqoop export \
--connect "jdbc:sqlserver://x.x.x.x:port;database=xxxxxxx" \
--username xxxxxx --password yyyyyy \
--table tbldummy2 \
--export-dir /user/hue/Out2 \
--input-fields-terminated-by '|' \
-m 1 \
--verbose
Input data for Table 2
aa|01|02|03|04|05|06|07|08|09|10|11|12|13
bb|01|02|03|04|05|06|07|08|09|10|11|12|13
cc|01|02|03|04|05|06|07|08|09|10|11|12|13
Console logs for Table 2
16/03/16 23:35:01 INFO mapreduce.Job: Running job: job_1458150283440_0028
16/03/16 23:35:07 INFO mapreduce.Job: Job job_1458150283440_0028 running in uber mode : false
16/03/16 23:35:07 INFO mapreduce.Job: map 0% reduce 0%
16/03/16 23:35:18 INFO mapreduce.Job: map 100% reduce 0%