
We have two clusters: one MapR and one of our own. We want to create a new setup on our own hardware using the MapR data.

  1. I copied all the ORC files from the MapR cluster and kept the same folder structure.
  2. Created an ORC-formatted table pointing at the location from step 1.
  3. Then executed "MSCK REPAIR TABLE <>" (a rough sketch of steps 2 and 3 follows this list).
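
For reference, steps 2 and 3 were roughly of the following shape; the table, column, and partition names here are placeholders, not the real schema:

    CREATE EXTERNAL TABLE IF NOT EXISTS my_orc_table (
        id BIGINT,
        payload STRING
    )
    PARTITIONED BY (dt STRING)
    STORED AS ORC
    LOCATION '<path from step 1>';

    MSCK REPAIR TABLE my_orc_table;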

The above steps completed without error, but when I query the partitions, the job fails with the error below:

java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 4958903
    at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:193)
    at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:238)

Can someone tell me whether we can create Hive ORC partitioned tables directly from the ORC files?

My storage is Azure Data Lake.


1 Answer


Based on my understanding of your description, you want to copy all ORC files from one cluster to another and load these ORC files as a Hive table.

To do this, please try the command below to create an external table over the ORC file data.

CREATE EXTERNAL TABLE IF NOT EXISTS <table name> (<column_name column_type>, ...)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS ORC
    LOCATION '<orcfile path>';

If you are not aware of the column list of an ORC file, you can refer to the Hive manual's ORC File Dump Utility to print the ORC file metadata in JSON format via hive --orcfiledump -j -p <location-of-orc-file-or-directory>.
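
Since your source data is partitioned, the CREATE statement above would also need a PARTITIONED BY clause, followed by MSCK REPAIR TABLE so Hive registers the partition folders that already exist under the table location (this assumes the copied folders follow the usual partition_column=value layout). A sketch in the same placeholder style:

    CREATE EXTERNAL TABLE IF NOT EXISTS <table name> (<column_name column_type>, ...)
        PARTITIONED BY (<partition_column partition_type>, ...)
        STORED AS ORC
        LOCATION '<orcfile path>';

    -- Register the partition directories already present under LOCATION
    MSCK REPAIR TABLE <table name>;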