
I am using prestodb with a Hive metastore for schema storage and an Alluxio cache as external storage for the data. The storage format in Alluxio and in the Hive schema is PARQUET. While retrieving a timestamp field from Presto through the Hive catalog, I get the following error:

The column utdate is declared as type timestamp, but the Parquet file declares the column as type INT64

The schema for the dataset is:

create table test (
  utcdate timestamp
) WITH (
  format = 'PARQUET',
  external_location = 'alluxio://path-toserver/directory-path-to-parquet'
)

This issue appeared after upgrading prestodb from v0.221 to v0.226.

I suggest you switch to PrestoSQL (prestosql.io), which doesn't have this problem. – Dain Sundstrom

2 Answers


I use the following when saving Parquet files to solve the timestamp issue:

pyarrow.parquet.write_table(self.arrow_table, file_name, use_deprecated_int96_timestamps=True)

Presto and Spark use INT96 timestamp values to store timestamps with nanosecond precision. PyArrow writes timestamp columns as INT64 by default, and Presto treats INT64 as the bigint data type. Writing with the deprecated INT96 timestamp option produces files whose timestamps Presto can read.


It's not clear what you're asking, so I assume the question is: how do I read the data?

There is no switch or configuration change you can apply to read your data while on version 0.226. Thus you currently have two options: