
How to Sqoop import from Oracle to Hive in compressed table

I tried a Sqoop import from Oracle into a text-format partitioned Hive table and it worked. When trying the same with a compressed (SNAPPY, ZLIB, BZIP2) table, compressing the data in Sqoop with the corresponding codec, the files get created. But when I try to select through Hive, I get this error: Error: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: org.apache.orc.FileFormatException: Malformed ORC file maprfs:///envir.../2019/4/20190416/part-m-00000.snappy. Invalid postscript. (state=,code=0)

The tests go like this:
- Create the table
- Modify the Sqoop options file
- Run Sqoop
- ALTER TABLE .. ADD PARTITION
- Try a SELECT, which results in the error
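The last two steps above can be sketched as follows, using the table and partition values from the question (the statements are illustrative, not the exact ones run):

```shell
# Hedged sketch of the final two test steps. The partition location is
# implied by the external table's LOCATION, so only the partition spec
# is added here.
hive -e "
ALTER TABLE xx.tmp_member3
  ADD IF NOT EXISTS PARTITION (year=2019, month=4, day=20190416);

-- This SELECT is the step that fails with 'Malformed ORC file':
SELECT * FROM xx.tmp_member3
WHERE year = 2019 AND month = 4 AND day = 20190416
LIMIT 10;
"
```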

SQOOP:
--delete-target-dir
--target-dir "/environ..3/2019/4/20190416"
--compression-codec "org.apache.hadoop.io.compress.BZip2Codec"
--hive-table "...tmp_member3"
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="BZIP2")'
--hive-partition-key "'year','month','day'"
--hive-partition-value "'2019','4','20190416'"

HIVE:
CREATE EXTERNAL TABLE xx.tmp_member3 (
  a STRING,
  kpi_name STRING,
  b STRING,
  c STRING)
PARTITIONED BY (YEAR INT, MONTH INT, DAY INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS ORC
LOCATION '/envir../2019/4/20190416'
TBLPROPERTIES ('orc.compress'='BZIP2');

I have already tried set hive.exec.orc.split.strategy=BI, with no success.
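One way to check whether the files Sqoop wrote are genuine ORC files is Hive's ORC file dump utility. A hedged sketch, where `<path>` stands in for the truncated directory in the error message:

```shell
# Dump the ORC metadata of the file Hive is rejecting.
# On a genuine ORC file this prints the file's postscript/footer and
# stripe information; on a codec-compressed text file it fails,
# confirming the format mismatch.
hive --orcfiledump <path>/part-m-00000.snappy
```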


1 Answer


I would personally work this out in two steps.

First, get the data from Oracle into a normal text-format Hive staging table; then insert from the staging table into the Hive ORC table, with the compression format set as per your requirement.
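A minimal sketch of the two steps, reusing the xx.tmp_member3 table and column names from the question. The Oracle connection details and the staging table name (xx.tmp_member3_stage) are hypothetical:

```shell
# Step 1 (hedged sketch): Sqoop the Oracle data into a plain text
# staging table -- no --compression-codec and no ORC storage stanza,
# so the files Sqoop writes are ordinary delimited text.
# Connection string, credentials, and table names are illustrative.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table MEMBER \
  --hive-import \
  --hive-table xx.tmp_member3_stage \
  --fields-terminated-by ',' \
  -m 1

# Step 2: let Hive itself write the ORC files, so the table's
# 'orc.compress'='BZIP2' property is applied by the ORC writer
# rather than by a MapReduce output codec.
hive -e "
INSERT OVERWRITE TABLE xx.tmp_member3
  PARTITION (year=2019, month=4, day=20190416)
SELECT a, kpi_name, b, c FROM xx.tmp_member3_stage;
"
```

The key point of the design is that ORC compression is internal to the ORC file format; compressing Sqoop's output with a MapReduce codec produces compressed text files, which Hive then cannot read as ORC.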