
I'm using Sqoop to import data from SQL Server into Hive and then later export that data out of Hive into another SQL Server. The Sqoop import works fine and converts the VARCHAR/NVARCHAR data types into String.
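For reference, the import command is roughly of this shape (the connection string, credentials, and table/database names below are just placeholders, not my actual values):

# NOTE: server, database, credentials, and table names are placeholders
sqoop import \
  --connect "jdbc:sqlserver://sqlserver-host:1433;databaseName=SourceDb" \
  --username sqoop_user \
  --password-file /user/sqoop/.password \
  --table PROVIDERS \
  --hive-import \
  --hive-table staging.providers \
  --null-string '\\N' \
  --null-non-string '\\N' \
  -m 4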

My question is: what is the best column type to define on the target table, given that Hive currently holds the data as String? I originally defined most of my columns on the target table as VARCHAR(100) and it has been working, but now the export fails for some strings and I get:

SQL State: 22001, error code: 8152

"java.sql.BatchUpdateException: String or binary data would be truncated."

Sample row that failed:

"HEALTH SITE PROVIDERS LLC"|" "|"3435673"|"UHGID0000547777"|"906225"|"\\N"|"\\N"|"\\N"

Clearly this data has far fewer than 100 characters in each column (columns are delimited by |), so I'm confused as to how Hive/Sqoop is converting this string, or whether it does any conversion at all during the export.

I was thinking of defining my columns in the target table as NVARCHAR(max), but is that a bit extreme? I also need to index some of the columns, and NVARCHAR(max) columns can't be used as index keys in SQL Server.

Regards,


1 Answer


Since your data is mostly of type VARCHAR(100), there is no need to store it as Hive's STRING. You can store both VARCHAR and NVARCHAR in Hive's VARCHAR.

Use --map-column-hive <column-name>=<hive-type>,... in your sqoop import command.

Example:

Say col1 is VARCHAR(100) and col2 is NVARCHAR(100)

--map-column-hive col1='varchar(100)',col2='varchar(100)',....
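Slotted into a full import command, that might look roughly like this (the connection string, credentials, and table names are placeholders, and col1/col2 stand in for your real column names):

# NOTE: connection details, credentials, and table/column names are placeholders
sqoop import \
  --connect "jdbc:sqlserver://sqlserver-host:1433;databaseName=SourceDb" \
  --username sqoop_user \
  --password-file /user/sqoop/.password \
  --table PROVIDERS \
  --hive-import \
  --hive-table staging.providers \
  --map-column-hive 'col1=varchar(100),col2=varchar(100)'

Quoting the whole --map-column-hive argument keeps the shell from interpreting the parentheses.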


Now you can export it back to a SQL Server table whose columns are VARCHAR/NVARCHAR.
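For the export back to SQL Server, something along these lines should work, assuming a pipe-delimited Hive table stored as text with \N for nulls (the JDBC URL, credentials, table name, and warehouse path are placeholders):

# NOTE: connection details, table name, and warehouse path are placeholders
sqoop export \
  --connect "jdbc:sqlserver://sqlserver-host:1433;databaseName=TargetDb" \
  --username sqoop_user \
  --password-file /user/sqoop/.password \
  --table PROVIDERS_TARGET \
  --export-dir /user/hive/warehouse/staging.db/providers \
  --input-fields-terminated-by '|' \
  --input-null-string '\\N' \
  --input-null-non-string '\\N'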