
I am working with Hive. I need to create a table with n normal columns and 100 or more partition columns, and I was able to create that table successfully. Now, when I load that table with data from another table that has the same schema but in which all columns are non-partition columns, I get this error:

Failed with exception MetaException(message:Attempt to store value "c1=v1/c2=v2/c3=v3/....c100=v100" in column "PART_NAME" that has maximum length of 767. Please correct your data!)

Taking the last part of the error into account, I tried shortening the column names and their values so that the resulting partition path would be shorter, and it worked! But it should not have to be like that: in a real-world scenario the column names and their values could be of any size, and so could the partition path.
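The arithmetic behind the error can be checked with a short Python sketch. It assumes the partition name is just the col=value pairs joined with "/", as shown in the error message, and that the limit is the 767 reported for PART_NAME. Under those assumptions even the short names c1..c100 with values v1..v100 already overflow it. The helper names partition_name and fits_metastore are hypothetical, and the sketch ignores the escaping Hive applies to special characters in partition values.

```python
# Maximum length reported for the metastore column PART_NAME in the error.
PART_NAME_MAX = 767


def partition_name(spec):
    """Join (column, value) pairs into the 'c1=v1/c2=v2/...' form.

    Simplification: Hive also escapes special characters in partition
    values, which this sketch does not do.
    """
    return "/".join(f"{col}={val}" for col, val in spec)


def fits_metastore(spec, limit=PART_NAME_MAX):
    """Return (length of the partition name, whether it fits the limit)."""
    name = partition_name(spec)
    return len(name), len(name) <= limit


# 100 partition columns c1..c100 with values v1..v100, as in the error.
short_spec = [(f"c{i}", f"v{i}") for i in range(1, 101)]
print(fits_metastore(short_spec))        # (783, False)

# Only the first 50 of those columns.
print(fits_metastore(short_spec[:50]))   # (381, True)
```

Every extra partition column adds its name, its value and two separator characters ("=" and "/"), so the total keeps growing with the number of columns no matter how short each individual piece is.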

For example, here is my CREATE TABLE query:

CREATE TABLE xyz( c0 int) PARTITIONED BY ( c1 String,c2 String,c3 String,c4 String.......c100 String) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE

And here is my insert into query:

INSERT INTO TABLE xyz PARTITION (c1,c2,c3....,c100) SELECT c0,c1,c2,c3,c4....,c100 FROM table123;
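Because the column lists in these statements are abbreviated, a hypothetical Python helper like the one below could generate both statements for any number of partition columns. It assumes the naming used in the question (one regular column c0, partition columns c1..cN of type String), and the function names create_table_sql and insert_sql are illustrative only. It also shows the ordering a dynamic-partition INSERT expects: the partition columns are named in the PARTITION clause and returned last by the SELECT, in the same order.

```python
def create_table_sql(table, n_partitions):
    """Build a CREATE TABLE statement with c0 as the only regular column
    and c1..c{n} as String partition columns (naming from the question)."""
    parts = ",".join(f"c{i} String" for i in range(1, n_partitions + 1))
    return (
        f"CREATE TABLE {table}( c0 int) PARTITIONED BY ( {parts}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE"
    )


def insert_sql(target, source, n_partitions):
    """Build a dynamic-partition INSERT: the partition columns are listed
    by name and the SELECT returns them last, in the same order."""
    part_cols = ",".join(f"c{i}" for i in range(1, n_partitions + 1))
    return (
        f"INSERT INTO TABLE {target} PARTITION ({part_cols}) "
        f"SELECT c0,{part_cols} FROM {source};"
    )


print(insert_sql("xyz", "table123", 3))
# INSERT INTO TABLE xyz PARTITION (c1,c2,c3) SELECT c0,c1,c2,c3 FROM table123;
```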

Am I doing something wrong, or do I have to set some properties to use this many partition columns (100 or more)?
Please give me any clue; I am stuck on this.
Thanks

Hi Vaijnath, I am facing this issue too. Any helpful information would be appreciated. - Hemant
This is an abuse of data modeling. You are practically going to store each record in a different partition/folder - David דודו Markovitz
@Dudu-markovitz I gave this query only as an example. I may have to work on tables with thousands of columns, and in that case 100 partition columns is not unusual. - Vaijnath Polsane
Yes it is. This is a complete abuse. - David דודו Markovitz
(1) 10K+ partitions(!), not columns. (2) Generosity is when you prevent someone from making terrible mistakes out of a lack of basic understanding, not when you allow him to do whatever he wants and leave him to deal with the consequences by himself. (3) Given a set of columns where each column has only 2 possible values, do you understand what the potential number of combinations is for 10 / 20 / 30 columns? - David דודו Markovitz
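Point (3) is easy to quantify: if each partition column can take k distinct values, n such columns can produce up to k^n distinct partitions, and each of them becomes its own folder. A minimal Python sketch, assuming independent columns with 2 possible values each as in the comment (the function name max_partitions is hypothetical):

```python
from math import prod


def max_partitions(cardinalities):
    """Upper bound on distinct partitions: the product of the number of
    distinct values each partition column can take."""
    return prod(cardinalities)


# Columns with only 2 possible values each.
for n_cols in (10, 20, 30):
    print(n_cols, max_partitions([2] * n_cols))
# 10 1024
# 20 1048576
# 30 1073741824
```

This is only an upper bound; the real number of partitions depends on which value combinations actually occur in the data.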

1 Answer


I agree with the experts that we should not use so many partitions in a table.

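Also, I would like to point out that most nodes are Unix/Linux based, and there a single file or folder name cannot be longer than 255 bytes. Since a partition is just a folder (one directory level per partition column), long partition names can run into that limit too. The error you are getting, however, is raised by the metastore: the full partition name "c1=v1/c2=v2/c3=v3/....c100=v100" has to fit into its PART_NAME column, which has a maximum length of 767.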

Linux has a maximum filename length of 255 bytes for most filesystems (including ext4), and a maximum path length of 4096 bytes. eCryptfs is a layered filesystem.
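A minimal Python sketch of these filesystem limits, assuming a Linux filesystem with 255 bytes per directory name and 4096 bytes per full path. Each partition column adds one col=value directory level, so the sketch checks every level as well as the full path. The table location /warehouse/xyz and the function name check_partition_path are hypothetical, and a Hive warehouse usually lives on HDFS, which may enforce its own limits.

```python
NAME_MAX = 255   # max bytes in one file/directory name (e.g. ext4)
PATH_MAX = 4096  # max bytes in a full path


def check_partition_path(table_dir, spec):
    """Check a Hive-style partition path against the Linux limits.

    Returns (columns whose 'col=value' directory name exceeds NAME_MAX
    bytes, whether the full path exceeds PATH_MAX bytes).
    """
    components = [f"{col}={val}" for col, val in spec]
    too_long = [
        col
        for (col, _), comp in zip(spec, components)
        if len(comp.encode("utf-8")) > NAME_MAX
    ]
    full_path = "/".join([table_dir.rstrip("/")] + components)
    return too_long, len(full_path.encode("utf-8")) > PATH_MAX


# Hypothetical location; c2 has a 300-byte value, so its directory name is too long.
print(check_partition_path("/warehouse/xyz", [("c1", "v1"), ("c2", "x" * 300)]))
# (['c2'], False)

# 100 columns with 50-byte values: every name fits, but the full path does not.
print(check_partition_path("/warehouse/xyz", [(f"c{i}", "y" * 50) for i in range(1, 101)]))
# ([], True)
```

Lengths are measured in UTF-8 bytes rather than characters, because non-ASCII partition values take more than one byte per character.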