6
votes

We have a huge InnoDB table with hundreds of millions of rows and only 3 columns: GUID, enum, smallint. All lookups are done by GUID.

We're considering making GUID the PK and partitioning it BY KEY.

We've heard that using GUID as PK is bad because of its random distribution and the fact that PK creates clustered index. So storing rows in random order of GUIDs increases fragmentation and page splits.

The alternative to using GUID as PK is to create a surrogate auto-increment key, and use that as PK. However, if we want to partition the table by GUID, that GUID has to be part of PK as well. Also, since all queries are done by GUID, we need an additional GUID index. That index essentially maps GUID->PK, while if we use GUID as PK - the table itself maps GUID->enum+small int?

So my question is whether we gain anything by adding auto-inc PK and having additional GUID index?

Thanks, Philopator.

1
Whether or not the fact that GUIDs will be randomly distributed is a problem depends on your access patterns. If you're accessing all records randomly, then random distribution may give you better locality :)Michael Mior
The rows are accessed nearly randomly. So, if i understand you correctly, due to mostly random access even sequential auto-inc id would not help much, correct? I guess this is because of caching of hot pages?Philopator
Exactly. This also depends on the frequency of writes. If writes are very frequent, it will still be more efficient to have GUIDs which created in a similar timeframe be in the same partition to reduce seek time. Although if you're using a BBWC and buffering writes in memory, this is less of an issue.Michael Mior

1 Answers

2
votes

The problem with using GUIDs as a PK in InnoDB isn't just the fact that GUID distribution is random. It's that records in InnoDB are stored in primary key order. That means in the table design you're talking about, InnoDB is going to be constantly moving data about in an effort to sort your GUIDs. You should use a translation table that maps the GUIDs to int or bigint and use that as the PK.