2 votes

I am trying to export data from HDFS to SQL Server. The original table has over 500 columns, and every time I execute the Sqoop export job it gets stuck once MapReduce shows 100% completion. I created two dummy tables, shown below, to find out where exactly the problem lies. The only difference between Table1 and Table2 is that the latter has one additional column [col14 varchar(5)].

First, I ran the export job for Table1, which has 13 columns [datatype varchar(5)]. The job completed successfully and exported all 3 records to SQL Server.

Next, I executed the export job for Table2, which has 14 columns. When I ran this job I didn't see any error messages or exceptions, but it hangs forever after the map phase reaches 100%. SQL Server Activity Monitor shows that a process is created, however it does not receive any data/prepared statements from Hadoop.
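For reference, the same information Activity Monitor shows can be pulled with a generic query against the standard DMVs (nothing Sqoop-specific; filter on whatever host/program name Activity Monitor reports for the Sqoop connection):

-- Show user sessions and what their current request is waiting on
-- (adjust the WHERE clause to match the host/program name of the Sqoop connection)
SELECT s.session_id, s.host_name, s.program_name, s.status,
       r.command, r.wait_type, r.blocking_session_id
FROM sys.dm_exec_sessions AS s
LEFT JOIN sys.dm_exec_requests AS r ON r.session_id = s.session_id
WHERE s.is_user_process = 1;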

Does this problem exist only with SQL Server? Is there any limitation on the number of columns that can be exported to SQL Server? Do I need to make any configuration changes in my cluster? Please advise.

Configuration

Hadoop Version – Cloudera 2.6.0-CDH-5.5.2 | Sqoop Version – 1.4.6 | SQL Server Version – 2008 R2

6 Node Cluster - 1 NN & 5 DN | Map Task - 2 GB/1 vCPU | Reduce Task - 2 GB/1 vCPU

Table1

CREATE TABLE [dbo].[tbldummy1]
(
[col1] [varchar] (5) NOT NULL,
[col2] [varchar](5) NULL,
[col3] [varchar](5) NULL,
[col4] [varchar](5) NULL,
[col5] [varchar](5) NULL,
[col6] [varchar](5) NULL,
[col7] [varchar](5) NULL,
[col8] [varchar](5) NULL,
[col9] [varchar](5) NULL,
[col10] [varchar](5) NULL,
[col11] [varchar](5) NULL,
[col12] [varchar](5) NULL,
[col13] [varchar](5) NULL,
CONSTRAINT [PK_dummy1] PRIMARY KEY ([col1] ASC))

Sqoop Command for Table1

sqoop export \
--connect "jdbc:sqlserver://x.x.x.x:port;database=xxxxxxx" \
--username xxxxxx --password yyyyyy \
--table tbldummy1 \
--export-dir /user/hue/Out1 \
--input-fields-terminated-by '|' \
-m 1 \
--verbose

Input Data for Table 1

aa|01|02|03|04|05|06|07|08|09|10|11|12
bb|01|02|03|04|05|06|07|08|09|10|11|12
cc|01|02|03|04|05|06|07|08|09|10|11|12

Table 2

CREATE TABLE [dbo].[tbldummy2](
    [col1] [varchar] (5) NOT NULL,
    [col2] [varchar](5) NULL,
    [col3] [varchar](5) NULL,
    [col4] [varchar](5) NULL,
    [col5] [varchar](5) NULL,
    [col6] [varchar](5) NULL,
    [col7] [varchar](5) NULL,
    [col8] [varchar](5) NULL,
    [col9] [varchar](5) NULL,
    [col10] [varchar](5) NULL,
    [col11] [varchar](5) NULL,
    [col12] [varchar](5) NULL,
    [col13] [varchar](5) NULL,
    [col14] [varchar](5) NULL,
 CONSTRAINT [PK_dummy2] PRIMARY KEY ([col1] ASC))

Sqoop Command for Table 2

sqoop export \
--connect "jdbc:sqlserver://x.x.x.x:port;database=xxxxxxx" \
--username xxxxxx --password yyyyyy \
--table tbldummy2 \
--export-dir /user/hue/Out2 \
--input-fields-terminated-by '|' \
-m 1 \
--verbose

Input data for Table 2

aa|01|02|03|04|05|06|07|08|09|10|11|12|13
bb|01|02|03|04|05|06|07|08|09|10|11|12|13
cc|01|02|03|04|05|06|07|08|09|10|11|12|13

Console logs for Table 2

16/03/16 23:35:01 INFO mapreduce.Job: Running job: job_1458150283440_0028
16/03/16 23:35:07 INFO mapreduce.Job: Job job_1458150283440_0028 running in uber mode : false
16/03/16 23:35:07 INFO mapreduce.Job:  map 0% reduce 0%
16/03/16 23:35:18 INFO mapreduce.Job:  map 100% reduce 0%

3 Answers

2 votes

We faced the same issue on our end - a sqoop export to a table in SQL Server reached 100% and then just hung until the 10-minute timeout was reached, after which the job failed. In our investigation we found that the reason for this was, in fact, a composite primary key violation on the SQL Server side, which we had no visibility into from the Hadoop cluster end. Once we resolved this PK violation, the sqoop export completed successfully.
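In your case the dummy tables have a single-column key (col1), but the same check applies: make sure the key values in the export files are unique and not already present in the target table. As a rough sketch, assuming the '|'-delimited files under /user/hue/Out2 from your question, something like this lists key values that occur more than once in the export data:

# Print col1 values that appear more than once in the export directory
# (path, delimiter and key position taken from the question; adjust as needed)
hdfs dfs -cat /user/hue/Out2/* | cut -d'|' -f1 | sort | uniq -d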

I would also like to point out that access privileges were not the issue; we verified this by successfully running an insert via sqoop eval, which completed with no issues.

As the next step on your end, I would recommend that you first test your write access privileges by running sqoop eval. Once you have confirmed that you are able to insert records into your target table via sqoop eval, go ahead and list all the constraints that your target table in SQL Server enforces, then add the appropriate logic in your data lake to prevent such records from being exported to SQL Server. If you can make sure that the data you are exporting does not violate any constraints on the SQL Server side, your sqoop export issue should be resolved. Let us know if this does not resolve the issue you are facing.
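For example, a quick write test with sqoop eval could look like the sketch below (connection details are the placeholders from your question; remember to delete the test row afterwards):

# Insert a single test row to confirm write privileges on the target table
sqoop eval \
--connect "jdbc:sqlserver://x.x.x.x:port;database=xxxxxxx" \
--username xxxxxx --password yyyyyy \
--query "INSERT INTO tbldummy2 (col1, col2) VALUES ('zz', '01')"

# Clean up the test row
sqoop eval \
--connect "jdbc:sqlserver://x.x.x.x:port;database=xxxxxxx" \
--username xxxxxx --password yyyyyy \
--query "DELETE FROM tbldummy2 WHERE col1 = 'zz'"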

0 votes

It seems that you have a problem with the privileges of your xxxxxx user in your xxxxxxx database. During an export operation, after the map phase the job tries to execute the insert/update statements, and if the user does not have the necessary permissions it can get stuck. Try assigning the db_datawriter role to your user. Another option, if it is possible, is to run the operation under the sa account just to confirm whether that is the cause.
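For example, on SQL Server 2008 R2 that role can be granted like this (run by an administrator; database and user names are the placeholders from the question):

USE xxxxxxx;
-- Add the Sqoop user to the built-in write role
EXEC sp_addrolemember 'db_datawriter', 'xxxxxx';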

0 votes

Your console output is not showing much of a stack trace; to understand the error, I would suggest checking the YARN logs for the failed containers.
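For example, using the application id that corresponds to the job id shown in your console output:

# Fetch the aggregated container logs for the hung job
yarn logs -applicationId application_1458150283440_0028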

Before you check for issues on the SQL Server side, try the tweaks I've made to your sqoop job below; I'm fairly sure they will solve the issue you are facing.

# Changes made:
# 1. Increase the number of mappers to 8 or 10 for a faster process.
# 2. Column mapping - give the column names of your SQL Server table in the order that matches the fields in your file.

sqoop export \
--connect "jdbc:sqlserver://x.x.x.x:port;database=xxxxxxx" \
--username xxxxxx --password yyyyyy \
--table tbldummy2 \
--export-dir /user/hue/Out2 \
--input-fields-terminated-by '|' \
-m <increase to a higher number depending on your cluster> \
--columns "col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12,col13,col14" \
--verbose