How to quickly SELECT 3 random records from a 30k MySQL table with a where filter by a single query?

Question

Well, this is a very old question never gotten real solution. We want 3 random rows from a table with about 30k records. The table is not so big in point of view MySQL, but if it represents products of a store, it's representative. The random selection is useful when one presents 3 random products in a webpage for example. We would like a single SQL string solution that meets these conditions:

In PHP, the recordset by PDO or MySQLi must have exactly 3 rows.
They have to be obtained by a single MySQL query without Stored Procedure used.
The solution must be quick as for example a busy apache2 server, MySQL query is in many situations the bottleneck. So it has to avoid temporary table creation, etc.
The 3 records must be not contiguous, ie, they must not to be at the vicinity one to another.

The table has the following fields:

CREATE TABLE Products (
  ID INT(8) NOT NULL AUTO_INCREMENT,
  Name VARCHAR(255) default NULL,
  HasImages INT default 0,
  ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The WHERE constraint is Products.HasImages=1 permitting to fetch only records that have images available to show on the webpage. About one-third of records meet the condition of HasImages=1.

Searching for a Perfection, we first let aside the existent Solutions that have drawbacks:

I. This basic solution using ORDER BY RAND(),

is too slow but guarantees 3 really random records at each query:

SELECT ID, Name FROM Products WHERE HasImages=1 ORDER BY RAND() LIMIT 3;

*CPU about 0.10s, scanning 9690 rows because of WHERE clause, Using where; Using temporary; Using filesort, on Debian Squeeze Double-Core Linux box, not so bad but

not so scalable to a bigger table as temporary table and filesort are used, and takes me 8.52s for the first query on the test Windows7::MySQL system. With such a poor performance, to avoid for a webpage isn't-it ?

II. The bright solution of riedsio using JOIN ... RAND(),

from MySQL select 10 random rows from 600K rows fast, adapted here is only valid for a single random record, as the following query results in an almost always contiguous records. In effect it gets only a random set of 3 continuous records in IDs:

SELECT Products.ID, Products.Name
FROM Products
INNER JOIN (SELECT (RAND() * (SELECT MAX(ID) FROM Products)) AS ID)
  AS t ON Products.ID >= t.ID
WHERE (Products.HasImages=1)
ORDER BY Products.ID ASC
LIMIT 3;

*CPU about 0.01 - 0.19s, scanning 3200, 9690, 12000 rows or so randomly, but mostly 9690 records, Using where.

III. The best solution seems the following with WHERE ... RAND(),

seen on MySQL select 10 random rows from 600K rows fast proposed by bernardo-siu:

SELECT Products.ID, Products.Name FROM Products
WHERE ((Products.Hasimages=1) AND RAND() < 16 * 3/30000) LIMIT 3;

*CPU about 0.01 - 0.03s, scanning 9690 rows, Using where.

Here 3 is the number of wished rows, 30000 is the RecordCount of the table Products, 16 is the experimental coefficient to enlarge the selection in order to warrant the 3 records selection. I don't know on what basis the factor 16 is an acceptable approximation.

We so get at the majority of cases 3 random records and it's very quick, but it's not warranted: sometimes the query returns only 2 rows, sometimes even no record at all.

The three above methods scan all records of the table meeting WHERE clause, here 9690 rows.

A better SQL String?

Not really the solution you are looking for, but to increase performance, you could have a cronjob choose 3 random products every 5 minutes and cache those in a file, then include that file in your pages. — Sumurai8
Oh I see. If it takes 8 seconds, that means there probably no index. A full table scan should not take that long for that row count. — Anthony
maybe faster if you get 3 random ID's in php and then fetch rows with this ID's ? — joschua011
I tend to agree with @Sumurai8. Rather than whacking your products list table on every new request, then complain about performance, why don't you pre-load your random choices into a file or a secondary table. You could pre-fetch 20 or 100 combinations in advance (updated on a hourly/daily basis perhaps), and then randomly select from these options instead. — cartbeforehorse
Test results for Solution III: I am not retrieving random results. i'm having the first 10000 rows set to HasImages=0 (id 1 to 10000), and the proceding 20000 rows to HasImages=1 (id 10001 to 30000). I'm never receiving results for id's above 15000. — Tobias Strebitzer

Leo Nyx Leo Nyx · Accepted Answer · 2013-09-22T15:52:32

Ugly, but quick and random. Can become very ugly very fast, especially with tuning described below, so make sure you really want it this way.

(SELECT Products.ID, Products.Name
FROM Products
    INNER JOIN (SELECT RAND()*(SELECT MAX(ID) FROM Products) AS ID) AS t ON Products.ID >= t.ID
WHERE Products.HasImages=1
ORDER BY Products.ID
LIMIT 1)

UNION ALL

(SELECT Products.ID, Products.Name
FROM Products
    INNER JOIN (SELECT RAND()*(SELECT MAX(ID) FROM Products) AS ID) AS t ON Products.ID >= t.ID
WHERE Products.HasImages=1
ORDER BY Products.ID
LIMIT 1)

UNION ALL

(SELECT Products.ID, Products.Name
FROM Products
    INNER JOIN (SELECT RAND()*(SELECT MAX(ID) FROM Products) AS ID) AS t ON Products.ID >= t.ID
WHERE Products.HasImages=1
ORDER BY Products.ID
LIMIT 1)

First row appears more often than it should

If you have big gaps between IDs in your table, rows right after such gaps will have bigger chance to be fetched by this query. In some cases, they will appear significatnly more often than they should. This can not be solved in general, but there's a fix for a common particular case: when there's a gap between 0 and the first existing ID in a table.

Instead of subquery (SELECT RAND()*<max_id> AS ID) use something like (SELECT <min_id> + RAND()*(<max_id> - <min_id>) AS ID)

Remove duplicates

The query, if used as is, may return duplicate rows. It is possible to avoid that by using UNION instead of UNION ALL. This way duplicates will be merged, but the query no longer guarantees to return exactly 3 rows. You can work around that too, by fetching more rows than you need and limiting the outer result like this:

(SELECT ... LIMIT 1)
UNION (SELECT ... LIMIT 1)
UNION (SELECT ... LIMIT 1)
...
UNION (SELECT ... LIMIT 1)
LIMIT 3

There's still no guarantee that 3 rows will be fetched, though. It just makes it more likely.