5
votes

NULL values are displayed as '\N' when a Hive external table is queried.

Below is the sqoop import script:

sqoop import -libjars /usr/lib/sqoop/lib/tdgssconfig.jar,/usr/lib/sqoop/lib/terajdbc4.jar -Dmapred.job.queue.name=xxxxxx \
  --connect jdbc:teradata://xxx.xx.xxx.xx/DATABASE=$db,LOGMECH=LDAP --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username $user --password $pwd --query "
    select col1,col2,col3 from $db.xxx
    where \$CONDITIONS" \
  --null-string '\N' --null-non-string '\N' \
  --fields-terminated-by '\t' --num-mappers 6 \
  --split-by job_number \
  --delete-target-dir \
  --target-dir $hdfs_loc

Please advise what change should be made to the script so that NULLs are displayed as NULL when the external Hive table is queried.


4 Answers

7
votes

Sathiyan, below are my findings after many trials:

  1. If the null-string property is not included during the sqoop import, NULLs are stored as [blank for integer columns] and [blank for string columns] in HDFS.
  2. If the Hive table on top of that HDFS data is then queried, we see [NULL for integer columns] and [blank for string columns].
  3. If the (--null-string '\N') property is included during the sqoop import, NULLs are stored as ['\N' for both integer and string columns].
  4. If the Hive table on top of that HDFS data is then queried, we see [NULL for both integer and string columns, not '\N'].
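
These findings can be illustrated without a cluster. The sketch below is not Hive itself: it only imitates, assuming a tab-delimited text table that uses Hive's default null marker \N, how a row with one INT column and one STRING column would be displayed. The helper name read_like_hive and the two sample rows are made up for illustration.

```shell
# read_like_hive: hypothetical helper that imitates how a Hive text table
# with the default null marker (\N) would display tab-delimited rows whose
# first column is an INT and whose second column is a STRING.
read_like_hive() {
  awk -F'\t' '
    {
      # INT column: the marker \N, or anything that is not an integer, shows as NULL
      if ($1 == "\\N" || $1 !~ /^-?[0-9]+$/) int_col = "NULL"; else int_col = $1
      # STRING column: only the marker \N shows as NULL; an empty field stays empty
      if ($2 == "\\N") str_col = "NULL"; else str_col = "[" $2 "]"
      print int_col "\t" str_col
    }'
}

# Row 1: NULLs written as empty fields (import without the null-string options)
# Row 2: NULLs written as \N
printf '\t\n\\N\t\\N\n' | read_like_hive
```

The first row comes out as NULL for the integer column and an empty (bracketed) string for the string column; the second row comes out as NULL for both, which matches the behaviour described above.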
5
votes

In your sqoop script you specified --null-string '\N' --null-non-string '\N', which means:

--null-string '\N'  = The string to be written for a null value for string columns 

--null-non-string '\N' = The string to be written for a null value for non-string columns
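
One thing that may be worth trying: the Sqoop user guide's section on Hive imports notes that, because these values end up in generated code, \N has to be escaped as \\N. The sketch below only shows what the shell hands to Sqoop under each quoting style and prints the resulting flags rather than running an import; the variable names are made up, and whether this changes the result with the Teradata connection manager is an assumption that has not been verified here.

```shell
# Single quotes keep backslashes untouched, so '\N' is two characters (\ and N)
# while '\\N' is three characters (\, \ and N).
as_in_question='\N'
escaped='\\N'

printf 'question: %s\n' "$as_in_question"
printf 'escaped:  %s\n' "$escaped"

# The import flags would then look like this (printed, not executed;
# the rest of the command stays as in the question).
printf '%s\n' "--null-string '$escaped' --null-non-string '$escaped'"
```

With the escaped form, the generated code is expected to write the two-character marker \N into the HDFS files, which a Hive text table reads back as NULL by default.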
4
votes

If a value is NULL in the source table and we sqoop that table, Sqoop imports the NULL value as the string null in HDFS. That makes it a problem to use NULL conditions in Hive queries.

For example, let's insert a NULL value into the MySQL table “cities”:

mysql> insert into cities values(6,7,NULL);

By default, Sqoop will import the NULL value as the string null in HDFS.

Let's sqoop the table and see what happens:

sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username sqoop -P --table cities --hive-import --hive-overwrite --hive-table vikas.cities -m 1

http://deltafrog.com/how-to-handle-null-value-during-sqoop-import-export/

0
votes

In the sqoop import command, remove the --null-string '\N' and --null-non-string '\N' options. By default, the system will assign null for both string and non-string values.

I have tried --null-string '\N', --null-string '' and other options, but I got blanks and various other issues.