MySQL - Uneven Distribution of Data into Partitions When Using Key Partitioning

Question

I'm using the InnoDB engine on MySQL 5.7.
I have a table where one of the columns is a (non-unique) three-letter country code (e.g. "SGP" for Singapore, "JPN" for Japan, etc).
For most of my queries, this country code column is the first WHERE clause I filter by (e.g. WHERE COUNTRY_CODE = 'SGP')
Hence, I want to (sub-)partition the table by this column. Since most of my queries will be on a single country code, they will only hit one partition in this way.
However, due to the large number of different country codes, I do not want to use LIST partitioning where I have to explicitly cater for every single country code.
So I used KEY partitioning, with 8 partitions. I thought that key partitioning, whereby the value is hashed, would give me a more-or-less even distribution over the 8 partitions (don't have to be perfect).
However, what I experienced is that out of the 8 partitions, 4 of them are completely untouched.

This is an abstract of my CREATE TABLE statement:

CREATE TABLE TBL_EATING_PLACES (
    ID INT UNSIGNED AUTO_INCREMENT NOT NULL,
    TYPE_OF_FOOD SMALLINT UNSIGNED NOT NULL,
    SUBTYPE_OF_FOOD SMALLINT UNSIGNED NOT NULL,
    COUNTRY_CODE CHAR(3) NOT NULL,
    ADDRESS VARCHAR(255),
    ...
    OTHER_NON_RELEVANT_COLUMNS ...,
    ...,
    CONSTRAINT PKEY PRIMARY KEY (ID, TYPE_OF_FOOD, SUBTYPE_OF_FOOD, COUNTRY_CODE)
)
ENGINE = InnoDB
PARTITION BY LIST COLUMNS(TYPE_OF_FOOD, SUBTYPE_OF_FOOD) SUBPARTITION BY KEY(COUNTRY_CODE) SUBPARTITIONS 8 (
    PARTITION P_1_1 VALUES IN ((1, 1)),
    PARTITION P_1_2 VALUES IN ((1, 2)),
    PARTITION P_2_1 VALUES IN ((2, 1)),
    PARTITION P_1_2 VALUES IN ((2, 2)),
    PARTITION P_1_3 VALUES IN ((2, 2)),
);

Is there anything wrong with how I'm doing the KEY partitioning such that it ended up only hitting half of the partitions?

But does it actually work that way? If I have understood your partitions correctly, you will actually need to read multiple partitions where you query by the country code (because tables are first partitioned by TYPE_OF_FOOD, SUBTYPE_OF_FOOD) — e4c5
yes, I'm sorry, I simplified and zoomed into the problem. All my queries will filter first by TYPE_OF_FOOD and SUBTYPE_OF_FOOD (this is implicit, and the frontend user won't even get to choose). And then, there is a choice to filter by COUNTRY_CODE, which for most of my use, I usually do. I didn't elaborate on the (primary) partitioning because I observed that to be working well (i.e. data goes into the correct partition). — Edwin Lee
just a thought, I'm not sure how MySQL does the hashing, and not sure if it's relevant... but does "lexical closeness" affect whether two values are hashed into the same partition? (e.g. Switzerland (CHE) and China (CHN), and Australia (AUS) and Austria (AUT)), and anyway, is there any way to see which rows reside in which partitions? — Edwin Lee

Rick James Rick James · Accepted Answer · 2016-07-19T20:22:52

What's wrong with key partitioning? It provides zero benefit. Don't use it. Instead, provide suitable composite indexes that match your queries.

(Added to address questions in comments...)

Usually a composite index can do the equivalent of what partitioning does. The "partition key" does "partition pruning" to pick the one (or few) partitions to look into. By having the the "partition key" as the first column in an index, you get the same effect. (Yes, there are exceptions.)

Partitions have some overhead. Each partition is a file; opening files is costly. In some cases, all partitions are opened before doing the pruning. It used to be that there was no pruning on INSERT. (Yuck!) (Some of these issues have been addressed in newer version, but there is still some overhead.)

I have looked at many examples of subpartitioning and non-RANGE partitioning. I have seen only 4 cases where indexing won't do "as good" as partitioning. I assume you found my blog that lists the 4. Here's one copy: Partition Maintenance.

A 2-dimensional search needs to "reduce the search space". This is one of the 4 cases. RANGE partitioning handles one dimension, the PRIMARY KEY handles the other. That works efficiently (but with messy code) to Find the 10 nearest pizza parlors.

BY RANGE is the only partitioning that can handle a 'range' of values (such as a date range). HASH will simply search all the partitions.

BY LIST may be just as good as BY RANGE, but only for exact values. And then I go back to saying "why not put the partition key on the front of whatever index you would otherwise use"!

I will happily augment my blog if someone can find a 5th use case for which I can't provide equivalent performance without partitioning.

MySQL - Uneven Distribution of Data into Partitions When Using Key Partitioning

1 Answers