4
votes

Here is what I have :

table content : cat_id product_id data1 data2 etc.
the categories are not unique obviously. the product ids are unique.

2 queries : 1 -- SELECT * WHERE cat_id = :cat - must be as quick as possible 2 -- SELECT * WHERE product_id = :prodId In second select, I can add : AND cat_id = :cat

What is the more efficient ?

  • 1 - index (not unique) on cat_id (good for select 1)
  • 2 - primary key on product_id (unique -> excellent for select 2)
  • 3 - index (not unique) on cat_id + PK on product_id (good for 1 & 2 separately)
  • 4 - unique constraint with composite [cat_id+product_id] (good for 1 & 2 together)
  • 5 - same as 4, but defining the composite as PK
  • 6 - composite (4 or 5) + single index/PK

For information, I'll have around 20 products in each category and a lot of categories (say 3000) - And (as it is unique in table) ONE product belongs to only ONE category - In fact, that is not really cats and products, that is for the simplicity of explaination;)

thanks!

2
Just a heads up in option 4: If I recall correctly, MySQL can use the first parts of a composite key. I.e.: if you define your keys as in option 4 and then you build a query with "WHERE cat_id=...", MySQL is able to use the index (not so with "WHERE product_id=..."). dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html - Sebastianb
@Sebastianb - Thanks, that is the kind of things I don't know : how are created -and later used- composite keys by the engine. So, say in phpMyAdmin, the ORDER in wich we set the rows when defining a composite index is important? [cat_id+product_id] != [product_id+cat_id]? I added 5th option. adding the composite as PK is ever better, no? - Rémi Levassor
As long as you have NOT NULL cats and products, you can just PK the composite key and I guess it would be fine (check here for differences between unique indexes and PK's: stackoverflow.com/questions/487314/primary-key-or-unique-index). However, if you want to find a product_id you won't be using the composite index, so I guess you would be better of just declaring product_id as PK and cat_id as index. - Sebastianb

2 Answers

5
votes

A database without Primary key is only half dressed and according to you product_id is an ideal candidate for a primary key, so let choose that. The primary key will be used in

SELECT * WHERE product_id = :prodId

It does not matter if and cat_id = :cat_id becomes a part of the query or not unless you have thousands of cat_ids associated with each product_id.

Then choose an index on cat_id. This will be used on

SELECT * WHERE cat_id = :cat

This will be very quick if the cardinality of the data is good. That means there is a wide distribution of cat_ids in the table. The index of cat_id will not be used in the first query. Thus you have two different indexes and both queries can be expected to be really quick.

Since [cat_id+product_id] != [product_id+cat_id] when it comes to indexing, if you have only one composite index one or the other will be slow.

For example, suppose we had a composite index on (cat_id, product_id) now the following query cannot make use of this index.

 SELECT * FROM tablename WHERE product_id = :prodId

But both these queries can use the (cat_id, product_id) index

SELECT * FROM tablename WHERE cat_id = :cat_id and product_id = :prodId
SELECT * FROM tablename WHERE cat_id = :catId

So in summary. Choose 1 and 2. But if the number of cat_ids are small or there are lots of cat_ids associated with each product_id choose 4 but make sure that the primary key is also in place.

1
votes

If these are your only two queries:

SELECT * FROM tablename WHERE cat_id = :cat_id and product_id = :prodId
SELECT * FROM tablename WHERE cat_id = :cat_id

and you have some other way to assure that product_id is UNIQUE, then you need only:

PRIMARY KEY(cat_id, product_id)

It is optimal for both SELECTs.

It is better than INDEX(cat_id) because (1) secondary keys have to finish their work with a PK lookup, and (2) all the cat rows are adjacent, thereby more efficient.

If product_id is actually an AUTO_INCREMENT, then add

INDEX(product_id)

No, you don't need to say UNIQUE (unless you are prone to deliberately trying to insert duplicate product_ids). The only thing that AI requires is that the id be first in some index so that it can do the equivalent of SELECT max(id) whenever mysqld is restarted.

My recommendations apply regardless of the size of the table.

The order of clauses in WHERE does not matter.

JOINs do not require anything in particular. It is slightly more efficient to JOIN on a PRIMARY KEY than on a secondary key, which is a lot more efficient (but still possible) than on a non-indexed column(s).