I am creating an external table that refers to ORC files in an HDFS location. The ORC files are laid out so that the external table is partitioned by date, with each partition mapping to a date-wise folder on HDFS. However, I am wondering whether I can enforce bucketing on such a table, given that the underlying data files are not managed by Hive. They are written by an external process, so can bucketing be used with Hive external tables at all?
Hive lets me use the CLUSTERED BY clause when creating an external table. But I do not understand how Hive would redistribute data that is already written to HDFS as ORC files into buckets.
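For concreteness, here is a minimal sketch of the kind of DDL described above. The table name `events`, its columns, the bucketing column `user_id`, the bucket count of 16 and the path `/data/events` are all hypothetical placeholders, not my actual schema:

```sql
-- Hypothetical external table over externally written ORC files.
-- Names, columns, bucket count and HDFS path are illustrative only.
CREATE EXTERNAL TABLE events (
  event_id  BIGINT,
  user_id   STRING,
  payload   STRING
)
PARTITIONED BY (event_date STRING)      -- one HDFS folder per date
CLUSTERED BY (user_id) INTO 16 BUCKETS  -- Hive accepts this clause here
STORED AS ORC
LOCATION '/data/events';

-- Each date folder written by the external process is registered as a partition.
ALTER TABLE events ADD PARTITION (event_date = '2016-01-01')
  LOCATION '/data/events/event_date=2016-01-01';
```

Hive accepts this statement, which is exactly what prompts the question: the files under each partition folder were not written by Hive.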
I have seen similar questions about partitioning and bucketing in external tables here:
Hive: Does hive support partitioning and bucketing while using external tables
and
Can I cluster by/bucket a table created via "CREATE TABLE AS SELECT....." in Hive?
but the answers only cover partitioning support in external tables or bucketing support in MANAGED tables. I am aware of both options and am already using them, but I need a specific answer about bucketing support in Hive EXTERNAL tables.
So, in summary: do Hive external tables support bucketing? If yes, how does Hive redistribute the data in the external folder into buckets?