I need to write a timestamp into parquet, then read it with Hive and Impala.
To write it, I tried e.g.:
my.select(
    ...,
    unix_timestamp() as "myts"
  )
  .write
  .parquet(dir)
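To double-check what Spark actually wrote, I read the files back and print the schema (dir being the same output path as above):

spark.read.parquet(dir).printSchema()
// myts shows up as long here, since unix_timestamp() returns seconds since
// the epoch as a bigint rather than a timestamp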
Then, to read it back, I created an external table in Hive over that directory:
CREATE EXTERNAL TABLE IF NOT EXISTS mytable (
  ...,
  myts TIMESTAMP
)
STORED AS PARQUET
LOCATION '...'
When I query the table, I get the error:
HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
I also tried replacing unix_timestamp() with
to_utc_timestamp(lit("2018-05-06 20:30:00"), "UTC")
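i.e. roughly the same write as above, with only the column expression changed:

my.select(
    ...,
    to_utc_timestamp(lit("2018-05-06 20:30:00"), "UTC") as "myts"
  )
  .write
  .parquet(dir)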
and got the same problem. In Impala, querying the table fails with:
Column type: TIMESTAMP, Parquet schema: optional int64
whereas timestamps are supposed to be written as int96. What is the correct way to write a timestamp into Parquet so that Hive and Impala can read it?