My requirement is to load the data from RDBMS into HDFS (backed by CDH 5.9.X) via sqoop (1.4.6) in avro format and then use an external hive(1.1) table to query the data. Unfortunately, the data in RDBMS has some new line characters.
We all know that hive can't parse new line character in the data and the data mapping fails when selected the whole data via hive. However, hive's select count(*) works fine.
I used below options during sqoop import and checked, but didn't work:
--hive-drop-import-delims
--hive-delims-replacement
The above options work for text format. But storing data in text format is not a viable option for me. The above options are converted properly in the Sqoop generated (codegen) POJO class's toString method (obviously as text format is working as expected), so I feel this method is not at all used during avro import. Probably because avro has not problem dealing with new line character, where as hive has.
I am surprised, don't anyone face such a common scenario, a table which has remark, comment field is prone to this problem.
Can anyone suggest me a solution please?
My command:
sqoop import \
-Dmapred.job.queue.name=XXXX \
--connect jdbc:oracle:thin:@Masked:61901/AgainMasked \
--table masked.masked \
--username masked \
--P \
--target-dir /user/masked/ \
--as-avrodatafile \
--map-column-java CREATED=String,LAST_UPD=String,END_DT=String,INFO_RECORD_DT=String,START_DT=String,DB_LAST_UPD=String,ADDR_LINE_3=String\
--hive-delims-replacement ' '
--null-string '\\N'
--null-non-string '\\N'
--fields-terminated-by '\001'
-m 1
--hive-import --hive-drop-import-delims --map-column-hive- Sandeep Singh