Dynamic partitioning in Hive - downside of using one fixed column for partitioning

Question

We are planning to use the dynamic partitioning feature in Hive for one of our projects. I understand that this parameter needs to be set for it to work:

hive.exec.dynamic.partition.mode=nonstrict

In our cluster this is set to strict. We are working on having this changed, but in the meantime we were planning to do this as a workaround:

 - Create a fixed column that will always have the same hard-coded value and use this as the first static column for partitioning
 - Use the columns for dynamic partitioning after this static column

This removes the need to change the above parameter: in strict mode Hive only requires at least one static partition column and is happy to partition dynamically on the remaining columns.
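For illustration, the workaround could look roughly like the HiveQL below. This is only a sketch: the table names (target_table, source_table) and data columns (col_a, col_b) are made up, and only the partition column names are taken from the paths in this question.

```sql
-- Hypothetical table and data columns; only the partition layout mirrors the question.
CREATE TABLE target_table (
  col_a STRING,
  col_b STRING
)
PARTITIONED BY (staticColumn STRING, dynamicColumn STRING);

-- Dynamic partitioning itself still has to be enabled
-- (assumed here; it is on by default in recent Hive versions).
SET hive.exec.dynamic.partition=true;

-- The static partition column comes first with a literal value.
-- The dynamic column follows without a value and is filled from
-- the last column of the SELECT list.
INSERT INTO TABLE target_table
PARTITION (staticColumn = 'staticValue', dynamicColumn)
SELECT col_a, col_b, dynamicColumn
FROM source_table;
```

Because the static column is listed before the dynamic one, this statement should be accepted even with hive.exec.dynamic.partition.mode left at strict.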

I noticed that, as expected, Hive creates an HDFS folder for the static partition and then creates the folders for the dynamic partitions under it. Something like this:

/baseDir/staticColumn=staticValue/dynamicColumn=dynamicValue1
/baseDir/staticColumn=staticValue/dynamicColumn=dynamicValue2

So the solution pushes the actual data one level down in HDFS, which does not seem to be an issue.

My question is: is there any downside to this solution from a performance or reliability point of view?

Forget about clumsy workarounds. You can set that parameter dynamically inside your HQL script, e.g. set hive.exec.dynamic.partition.mode=nonstrict ; insert into table X partition (PTKEY) select A, B, C, PTKEY from Z ; (unless your admin explicitly defined the parameter as "final" in the config file, but I can't see why he/she would do that). See cwiki.apache.org/confluence/display/Hive/Tutorial - Samson Scharfrichter

Shay Shay · Accepted Answer · 2016-12-27T18:31:50

Answering my own question in case anyone is interested. I was actually using Spark to load data into Hive, and it's as easy as adding this line of code to allow data to be inserted using dynamic partitioning:

// Set hive conf to allow dynamic partitions to be created
sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
