0 votes

I am trying to create an external Hive table on an ORC file.

Query used to create the table:

create external table fact_scanv_dly_stg (
store_nbr int,
geo_region_cd char(2),
scan_id int,
scan_type char(2),
debt_nbr string,
mds_fam_id string,
upc_nbr string,
sales_unit_qty string,
sales_amt string,
cost_amt string,
visit_dt string,
rtn_amt string,
crncy_cd string,
op_cmpny_cd string)
STORED AS ORC
location 'hdfs:///my/location/scanv_data/'; 

Schema details of the ORC file (taken from a Spark SQL DataFrame):

 |-- _col0: integer (nullable = true)
 |-- _col1: string (nullable = true)
 |-- _col2: integer (nullable = true)
 |-- _col3: byte (nullable = true)
 |-- _col4: short (nullable = true)
 |-- _col5: integer (nullable = true)
 |-- _col6: decimal(18,0) (nullable = true)
 |-- _col7: decimal(9,2) (nullable = true)
 |-- _col8: decimal(9,2) (nullable = true)
 |-- _col9: decimal(9,2) (nullable = true)
 |-- _col10: date (nullable = true)
 |-- _col11: decimal(9,2) (nullable = true)
 |-- _col12: string (nullable = true)
 |-- _col13: string (nullable = true)

But when I try to run a select on the created table, I get the error below:

select * from fact_scanv_dly_stg limit 5;


OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable

Any suggestions, please?

Shouldn't the char(2) be varchar(2)? Also, something doesn't add up: your created table does not have any decimal fields, but the schema shows some as decimal. Are these the same table? – samkart
What do the data in 'hdfs:///my/location/scanv_data/' look like, and what format are they in? Can you run hdfs dfs -ls hdfs:///my/location/scanv_data/ to see what kind of files are there? – Chema

1 Answer

1 vote

This error occurs because when Spark writes data to ORC files, the column data types are preserved in the files. When you create a Hive table over those files, each column therefore has to be mapped to the same data type as in the DataFrame schema.

Here, the error occurred because _col3 in the Spark schema is a byte, whereas the corresponding column (scan_type) is declared as char(2) in Hive. The solution is either to cast all the columns to the required data types in Spark before writing the ORC files, or to use the exact same data types in the Hive schema as in Spark, as in the sketch below.
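
A minimal sketch of the second option, assuming your DDL columns correspond positionally to _col0 through _col13 in the printSchema output (the pairing of column names to types is an assumption based on that ordering):

create external table fact_scanv_dly_stg (
store_nbr int,                 -- _col0: integer
geo_region_cd string,          -- _col1: string (stored as string, not char(2))
scan_id int,                   -- _col2: integer
scan_type tinyint,             -- _col3: byte
debt_nbr smallint,             -- _col4: short
mds_fam_id int,                -- _col5: integer
upc_nbr decimal(18,0),         -- _col6: decimal(18,0)
sales_unit_qty decimal(9,2),   -- _col7: decimal(9,2)
sales_amt decimal(9,2),        -- _col8: decimal(9,2)
cost_amt decimal(9,2),         -- _col9: decimal(9,2)
visit_dt date,                 -- _col10: date
rtn_amt decimal(9,2),          -- _col11: decimal(9,2)
crncy_cd string,               -- _col12: string
op_cmpny_cd string)            -- _col13: string
STORED AS ORC
location 'hdfs:///my/location/scanv_data/';

If you would rather keep char(2) or other types in Hive, cast the corresponding columns in Spark before writing the ORC files instead.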