0
votes

I am loading tables from SQL server to HBase table using SQOOP incremental update. But the null values in SQL tables are not imported into HBase. I understand that Hbase does not support null values and the field which contains null wont be present in Hbase. But my concern is when a particular column has null values for most records is being skipped even in case when the field has a value present in it for some records. Following is the SQL table structure

   CREATE TABLE [dbo].[user_test](
[user_id] [nvarchar](20) NOT NULL,
[user_name] [nvarchar](100) NULL,
[password] [varchar](128) NULL,
[created_date] [datetime2](7) NULL,
[modified_date] [datetime2](7) NULL,
[last_login_date] [datetime2](7) NULL,
[email_id] [nvarchar](100) NULL,
[security_question_id] [int] NULL,
[answered_count] [int] NULL,
[skip_count] [int] NULL,
[role_id] [smallint] NULL,
[use_yn] [char](1) NULL,
[first_login] [char](1) NULL,
[score] [int] NULL,
[secret_answer] [nvarchar](100) NULL,
PRIMARY KEY CLUSTERED 
(
[user_id] ASC
 )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
  ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
  ) ON [PRIMARY]

In the above table, the value for email_id is null in most of the records. But even for the records for which email_id value exists, is not being imported into Hbase table. The sqoop command was successful in getting the appended records in SQL. The SQOOP command is as follows:

   sqoop import 
   --connect "jdbc:sqlserver://107.108.32.198:1433;database=ETL_interim_DB;" 

   --username "hadoop" --password "Semco123" 
   --query "SELECT CAST(user_id AS Integer) as 
    user_id,user_name,password,modified_date,last_login_date,email_id,security_question_id,answered_count,skip_count,role_id,use_yn,first_login,score,secret_answer from 

    ETL_interim_DB.dbo.user_test WHERE \$CONDITIONS" 
    --hbase-table test2 
    --column-family cf 
    --hbase-row-key user_id 
    --split-by user_id -m 1 
    --incremental append 
    --check-column user_id 
    --last-value 10

But following error was displayed.

Note: Recompile with -Xlint:deprecation for details.
0    [main] ERROR org.apache.sqoop.tool.ImportTool  - Imported Failed: Can 
not create a Path from a null string

Could anyone please suggest how to import all values present in SQL server into HBase and in case of Null values in SQL what excatly happens while importing them to Hbase tables?

2
Did you tried using import control options --null-string and --null-non-string in the sqoop import?Subash
@SubashKonar : Yes I tried using import control options like --null-string 'nvl' \ --null-non-string '0' \ Still nothing worked. Could you please suggest if I have made wrong usage of null-string and null-non-string.Anon
@SubashKonar : Also are these import control option(null-string and null-non-string) applicabel while importing tables from SQL to HBase? When I went through the document it provided these options only for importing tables into Hive.Anon

2 Answers

1
votes

The COALESCE operation worked for me to import null field in SQL to HBase by giving default values. Following is the sqoop command for the same:

    sqoop import 
    --connect "jdbc:sqlserver://107.108.32.198:1433;database=ETL_interim_DB;" 
    --username "hadoop" --password "Semco123" 
    --query "SELECT CAST(user_id AS Integer) as user_id
    COALESCE(user_name,'xyz') as user_name, \
    COALESCE(password,'123') as password, \
    COALESCE(created_date, '9999-12-31 00:00:00.0000000') as created_date, \
    COALESCE(modified_date,'9999-12-31 00:00:00.0000000') as modified_date, \
    COALESCE(last_login_date,'9999-12-31 00:00:00.0000000') as lastlogin, \
    COALESCE(email_id,'0') as email_id, \
    COALESCE(security_question_id,-1) as security_question_id, \
    COALESCE(answered_count,-1) as answered_count, \
    COALESCE(skip_count,-1) as skip_count, \
    COALESCE(secret_answer, '0') as secret_answer, \
    COALESCE(role_id,0) as role_id, \
    COALESCE(use_yn,'0') as use_yn, \
    COALESCE(first_login,'0') as firstlogin, \
    COALESCE(score,-1) as score from ETL_interim_DB.dbo.ms_user_detail_test WHERE \$CONDITIONS" \
   --hbase-table test2 
   --column-family cf 
   --hbase-row-key user_id 
   --split-by user_id -m 1 
   --incremental append 
   --check-column user_id 
   --last-value 10
0
votes

You can try a work around for this.For Hbase to have the columns having null values,you can update the NULL Values(empty cells) in SQL DB to have some values like '0' or text "NULL".Below is the query.

UPDATE [Table Name] SET [Column Name]='Null' WHERE [Column Name] IS NULL.

or,

ALTER TABLE [Table Name] CHANGE COLUMN [Column Name] VARCHAR(50) NOT NULL DEFAULT '';

Then try the SQL import to Hbase.Hope this helps!