1 vote

I'm creating a table with composite partitioning (RANGE, then HASH).

My actual table is enormous, so here's a stripped-down dummy table that shows the important parts:

CREATE TABLE table1(
    id INT NOT NULL AUTO_INCREMENT,

    dtvalue DATETIME NOT NULL, -- Used for RANGE partitions

    v1 INT NOT NULL, -- Used for HASH partitions
    v2 INT NOT NULL,

    CONSTRAINT pk_table1 PRIMARY KEY(id),

    CONSTRAINT uk_table1 UNIQUE(v1,v2) -- Unique key for v1 and v2
)
ROW_FORMAT=COMPRESSED
PARTITION BY RANGE(dtvalue)
SUBPARTITION BY HASH(v1,v2) SUBPARTITIONS 32 -- Does this HASH subpartition work?
(PARTITION p20191101 VALUES LESS THAN('2019-11-01'),
 PARTITION p20191102 VALUES LESS THAN('2019-11-02'),
 PARTITION pMax VALUES LESS THAN MAXVALUE);

This table will frequently be joined by v1 and v2 combined as a unique identifier, so using both columns for the HASH key is critical.

My question is, can you specify multiple columns as part of the HASH key?

Or can you specify the unique key itself, for the HASH key?
e.g. SUBPARTITION BY HASH(uk_table1) SUBPARTITIONS 32

Don't bother. Hash partitioning provides no benefit. - Rick James
@RickJames I completely understand your sentiment: most general queries would scan all hash partitions, rendering the partitions pointless. However, well-structured queries should always include the hash partition key in the WHERE and/or ORDER BY clauses, reducing the number of partitions that are scanned (see the sketch after these comments). Hence the reasoning behind requiring the hash partition key to be part of the primary key as well. I intend to utilize this properly. But yes, in most fast-paced, willy-nilly (badly designed) database scenarios, hash partitioning would certainly do more harm than good - Giffyguy
Well-structured indexes are as fast as, or faster than, partitioning. Think of it this way: picking the partition costs about as much as shrinking the depth of a BTree index. - Rick James
@RickJames Interesting. Is your suggestion affected by the size of the tables involved? I'm looking at the potential of billions of rows being added regularly (with any luck, daily) and I don't want the size of my tables to get out of control. I'm using partitions to keep frequently-accessed rows grouped together in a smaller partition, and older, less-frequently-accessed rows in the background and out of the way. I'd like to take the challenge in your comment below, but would we need billions of rows of sample data to make it accurate? That's a lot of disk space for a challenge ... - Giffyguy
Part of the challenge involves my knowing what column(s) indicate "commonly used". I don't need any data to present my 'solution'. - Rick James
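
To illustrate the point about putting the partition keys in the WHERE clause, here is a minimal sketch against a hypothetical working version of table1 (the literal values are made up); the partitions column of EXPLAIN output shows which partitions would actually be scanned:

-- Hypothetical query: constraining both the RANGE key (dtvalue) and the
-- columns feeding the HASH key gives the optimizer a chance to prune partitions.
EXPLAIN
SELECT id, dtvalue, v1, v2
FROM table1
WHERE dtvalue >= '2019-11-01' AND dtvalue < '2019-11-02'
  AND v1 = 123
  AND v2 = 456;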

2 Answers

1 vote

You may want to rethink your approach concerning partitioning. MySQL is quite finicky -- so primary keys and unique constraints need to include the partitioning keys.

And hash partitioning requires an integer value, but MySQL allows only a very limited repertoire of functions in the partitioning expression. Happily, you can get around that with a stored generated column.

So, the following accomplishes your stated goal:

CREATE TABLE table1 (
    id INT NOT NULL AUTO_INCREMENT,

    dtvalue datetime NOT NULL, -- Used for RANGE partitions

    v1 INT NOT NULL, -- Used for HASH partitions
    v2 INT NOT NULL,

    CONSTRAINT pk_table1 PRIMARY KEY(id, dtvalue, v1_v2),

    v1_v2 INT GENERATED ALWAYS AS (CAST(CONV(LEFT(MD5(CONCAT_WS(':', v1, v2)), 8), 16, 10) AS UNSIGNED)) STORED -- Deterministic integer derived from (v1, v2); used as the HASH subpartition key
)
ROW_FORMAT=COMPRESSED
PARTITION BY RANGE COLUMNS (dtvalue)
SUBPARTITION BY HASH (v1_v2) SUBPARTITIONS 32 -- HASH on the generated integer column
(PARTITION p20191101 VALUES LESS THAN ('2019-11-01'),
 PARTITION p20191102 VALUES LESS THAN ('2019-11-02'),
 PARTITION pMax VALUES LESS THAN MAXVALUE);
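
For the subpartitioning above to pay off, queries also need to constrain v1_v2, using the same expression as the generated column. A minimal sketch with made-up values (whether pruning actually happens should be verified via the partitions column of EXPLAIN output):

-- Sketch only: supply the generated column's value explicitly so MySQL can
-- prune to a single HASH subpartition; the expression must match the
-- generated column definition exactly (values 10 and 20 are hypothetical).
SELECT *
FROM table1
WHERE dtvalue >= '2019-11-01' AND dtvalue < '2019-11-02'
  AND v1 = 10 AND v2 = 20
  AND v1_v2 = CAST(CONV(LEFT(MD5(CONCAT_WS(':', 10, 20)), 8), 16, 10) AS UNSIGNED);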

1 vote

I'm not sure about MySQL specifically, but in SQL Server I use CONCAT:

select hashbytes('md5', lower(concat(field1, field2, field3))) from tableX
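
For reference, a rough MySQL counterpart of that SQL Server expression (same made-up field names; note that MySQL's MD5() returns a hex string rather than varbinary):

-- Approximate MySQL equivalent of the SQL Server example above
SELECT MD5(LOWER(CONCAT(field1, field2, field3))) FROM tableX;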