
I have 1 parquet data file with schema;

  • id integer
  • model binary

This file was created with PySpark and contains a model identifier along with the model serialized to bytes using Python's pickle library.
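For context, the bytes stored in the binary column are produced along these lines (a minimal sketch using a plain dict as a stand-in for a real model object; the variable names are assumptions, not the asker's code):

```python
import pickle

# Stand-in for a trained model; any picklable Python object works the same way.
model = {"weights": [0.1, 0.2, 0.3], "bias": 0.5}

# Serialize the model to bytes -- this is what lands in the Parquet binary column.
model_bytes = pickle.dumps(model)

# A row as PySpark would write it: (id, model) with the model as raw bytes.
row = (1, model_bytes)

# Reading it back: deserialize the bytes into the original object.
restored = pickle.loads(row[1])
print(restored == model)  # True
```

Hive only needs to pass those bytes through unchanged; deserialization happens back in Python.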

Is it possible to create a Hive external table over this Parquet file and read it back with a SELECT? Assume the Hive external table has exactly the same schema:

CREATE EXTERNAL TABLE default.t_model
(
  id INT
  , model BINARY
)
STORED AS PARQUET
LOCATION 'hdfs_path';

I've done each step above but always get an empty result set. Should I use a Hive UDF to load the binary column? Or should I try another data type for the Parquet binary column, such as an array type?

I'd appreciate any answers, thanks.


1 Answer


It turns out I shouldn't use a partitioned table without running the MSCK REPAIR TABLE command first. With the Hive BINARY data type everything works fine.
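For a partitioned layout, the sequence would look roughly like this (a sketch only; the `dt` partition column is hypothetical and not part of the original schema):

```sql
-- Partitioned variant of the table; the `dt` partition column is an assumption.
CREATE EXTERNAL TABLE default.t_model
(
  id INT
  , model BINARY
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION 'hdfs_path';

-- Register partition directories that already exist on HDFS,
-- otherwise SELECT returns an empty result set.
MSCK REPAIR TABLE default.t_model;

SELECT id, model FROM default.t_model LIMIT 10;
```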