1
votes

I was able to run query in presto to read the non-float columns from Hive ORC(snappy) table. However, when I select all float datatype columns through the presto cli, gives me the below error message. Any suggestions what is the alternative other than changing the filed type to double in the targetHive table

presto:sample> select * from emp_detail;

Query 20200107_112537_00009_2zpay failed: Error opening Hive split hdfs://ip_address/warehouse/tablespace/managed/hive/sample.db/emp_detail/part-00079-5b0c6005-0943-4181-951f-43bcfcfe741f-c000.snappy.orc (offset=0, length=1999857): Malformed ORC file. Can not read SQL type real from ORC stream .salary of type DOUBLE [hdfs://ip_address/warehouse/tablespace/managed/hive/sample.db/emp_detail/part-00079-5b0c6005-0943-4181-951f-43bcfcfe741f-c000.snappy.orc]

1
Did you found any solution? - OrMoush
no we ended up changing the datatype to double. if you've found anything pls share. thanks - Nizam
I've found this:stackoverflow.com/a/48931672/2277126 creating a new table from the orc table in hive and then the querying the new table form Presto returns the desired results. It works for our solution hope it will help you. if you found something else I'll be happy if you could share it. thanks. - OrMoush

1 Answers

1
votes

Please try to add this property

hive.orc.use-column-names=true

to presto-server/conf/catalog/hive.properties, and restart your presto server.

To test it without restarting the server run this from presto-cli

SET SESSION hive.orc_use_column_names=true;

Release notes from Presto regarding these attribute.