How to Sqoop import from Oracle to Hive in compressed table
Tried Sqoop import from Oracle to Hive in text partitioned table and it worked. When trying to do the same in a compressed (SNAPPY, ZLIB, BZIP2) table, while compressing the data with Sqoop with the corresponding codec, it creates the files. But when trying to select thru Hive, I get the error: Error: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: org.apache.orc.FileFormatException: Malformed ORC file maprfs:///envir.../2019/4/20190416/part-m-00000.snappy. Invalid postscript. (state=,code=0)
The tests go like this: - Create table - modify sqoop option file - run Sqoop - Alter table .. add partition - try select , which results in error
SQOOP: --delete-target-dir --target-dir "/environ..3/2019/4/20190416" --compression-codec "org.apache.hadoop.io.compress.BZip2Codec" --hive-table "...tmp_member3" --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="BZIP2")' --hive-partition-key "'year','month','day'" --hive-partition-value "'2019','4','20190416'"
HIVE: CREATE EXTERNAL TABLE xx.tmp_member3 (a STRING,kpi_name STRING, b STRING, c STRING) PARTITIONED BY (YEAR INT,MONTH INT,DAY INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS ORC LOCATION '/envir../2019/4/20190416' TBLPROPERTIES ('orc.compress'='BZIP2');
I have already tried the 'set hive.exec.orc.split.strategy=BI', with no success.