Hive, Bucketing for the partitioned table

Question

This is my script:

--table without partition

drop table if exists ufodata;
create table ufodata ( sighted string, reported string, city string, shape string, duration string, description string )
row format delimited
fields terminated by '\t'
Location '/mapreduce/hive/ufo';

--load my data in ufodata

load data local inpath '/home/training/downloads/ufo_awesome.tsv' into table ufodata;

--create partition table
drop table if exists partufo;
create table partufo ( sighted string, reported string, city string, shape string, duration string, description string )
partitioned by ( year string )
clustered by (year) into 6 buckets
row format delimited
fields terminated by '\t';

--by default dynamic partition is not set
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
--by default bucketing is false
set hive.enforce.bucketing=true;

--loading mydata
insert overwrite table partufo
partition (year)
select sighted, reported, city, shape, min, description, SUBSTR(TRIM(sighted), 1,4) from ufodata;

Error message:

FAILED: Error in semantic analysis: Invalid column reference

I tried bucketing my partitioned table. If I remove "clustered by (year) into 6 buckets", the script works fine. How do I bucket the partitioned table?

madhu madhu · Accepted Answer · 2015-10-15T10:30:00

There is one important thing to keep in mind when bucketing in Hive.

The same column cannot be used for both partitioning and bucketing. The reason is as follows:

Clustering and sorting happen within a partition. Inside each partition there is only one value of the partition column (in your case, year), so every row in that partition would hash to the same bucket and clustering on it would have no effect. In addition, a partition column is not one of the table's regular data columns, so "clustered by (year)" refers to a column that the table definition does not contain. That is the reason for your error. Bucket on one of the regular columns instead.
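
As a minimal sketch of that fix, the table below is still partitioned by year but bucketed on a regular column. Bucketing on city is only an illustrative assumption; pick whichever column you join or sample on. The select also uses duration, the column name declared in ufodata, in place of min. I have not run this against your data.

```sql
-- Sketch: partition by year, bucket by a non-partition column.
-- "city" is an arbitrary choice for illustration, not something from the question.
drop table if exists partufo;
create table partufo ( sighted string, reported string, city string, shape string, duration string, description string )
partitioned by ( year string )
clustered by (city) into 6 buckets
row format delimited
fields terminated by '\t';

-- dynamic partitioning, as in the original script
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- needed on older Hive releases; to my knowledge Hive 2.x always enforces bucketing
set hive.enforce.bucketing=true;

-- the dynamic partition column must come last in the select list
insert overwrite table partufo
partition (year)
select sighted, reported, city, shape, duration, description,
       substr(trim(sighted), 1, 4) as year
from ufodata;
```

Each year partition directory should then contain up to 6 bucket files, with rows assigned by the hash of city.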
