24
votes

When running SELECT * FROM table WHERE Id IN ( .. ) queries with more than 10000 keys using PDO with prepare()/execute(), performance degrades to roughly 10X slower than running the same query with mysqli prepared statements or with PDO without prepared statements.

More strange details:

  • More typical SELECT statements that don't have the WHERE Id IN ( .. ) clause perform fine even with 100K+ rows. SELECT * FROM table with a simple WHERE Id condition, for example, is fast.

  • The performance degradation occurs after prepare()/execute() is complete - it's entirely in PDOStatement::fetch() or PDOStatement::fetchAll(). The MySQL query execution time is tiny in all cases - this isn't a MySQL query-optimization issue.

  • Splitting the 10K query into 10 queries with 1K keys is performant (a sketch of that chunked approach follows the sample code below).

  • Using mysql, mysqli with prepared statements, or PDO without prepared statements is performant.

  • PDO w/prepared takes ~6 seconds on the example below, while the others take ~0.5s.

  • It gets worse in a non-linear fashion the more keys you have. Try 100K keys.

Sample code:

// $imageIds is an array with 10K keys
$keyCount = count($imageIds);
$keys = implode(', ', array_fill(0, $keyCount, '?'));
$query = "SELECT * FROM images WHERE ImageID IN ({$keys})";
$stmt = $dbh->prepare($query);
$stmt->execute($imageIds);
// until now, it's been fast.  fetch() is the slow part
while ($row = $stmt->fetch()) {
    $rows[] = $row;
}
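
For reference, here's a minimal sketch of the chunked workaround mentioned in the bullets above (the chunk size of 1000 is an arbitrary choice; anything well below 10K avoids the slowdown here):

$rows = [];
foreach (array_chunk($imageIds, 1000) as $chunk) {
    $placeholders = implode(', ', array_fill(0, count($chunk), '?'));
    $stmt = $dbh->prepare("SELECT * FROM images WHERE ImageID IN ({$placeholders})");
    $stmt->execute($chunk);
    // each fetch loop now only has ~1K bound parameters to walk
    while ($row = $stmt->fetch()) {
        $rows[] = $row;
    }
}
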
If this is reproducible, then you would probably need to profile PHP to see why the slowdown occurs. – Matthew
From your description it sounds like it's a post-processing bug then. I'd suspect the slowdown is due to handling bound parameters. Try ->debugDumpParams() and look for is_param= values. If it's 1 then PDO will iterate over the list to look for bound variables to update. Maybe manually pre-seeding with ->bindValue() instead of ->execute(ARRAY) helps. But I suspect PDO will always loop over the bound params list. Not sure if is_param= is decisive for that anyway. (And too lazy to comprehend pdo_stmt.c) – mario
Note that binding the parameters as strings in mysqli with $stmt->bind_param(str_repeat('s', count($imageIds)), ...$imageIds); is not slower than binding them as integers. And both mysqli methods need about 50% more time than an unprepared statement. But the PDO prepared statement is about 50 times slower (with 10K parameters). So it can't be just that PDO is always binding parameters as strings. Even $dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES, true); doesn't change anything. Something really strange is going on with PDO. – Paul Spiegel
Did you ever figure out what was causing this? I'm running into the exact same issue. – shreddish
Related bug report: bugs.php.net/bug.php?id=53458 - @mario hits the issue quite well. The fetch time is proportional to both the number of returned rows and the number of bound parameters. For this kind of query, a problem that should be linear turns into O(n²). That means: 100 times more parameters => 10000 times slower. – Paul Spiegel
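
For anyone who wants to try mario's debugDumpParams() suggestion, a minimal sketch (using the same $query and $imageIds as the sample code above):

$stmt = $dbh->prepare($query);
$stmt->execute($imageIds);
// dumps the parsed SQL plus one entry per bound parameter; check the is_param= values
$stmt->debugDumpParams();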

3 Answers

3
votes

Make sure you're telling PDO that the value is an integer, not a string; if PDO sends it as a string, MySQL will have to typecast the values for comparison. Depending on how it goes about this, it could cause major slowdowns by making MySQL avoid using an index.
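
For example, a minimal sketch of forcing the integer type on a single placeholder (table and column names are just taken from the question):

$stmt = $dbh->prepare("SELECT * FROM images WHERE ImageID = ?");
$stmt->bindValue(1, $imageId, PDO::PARAM_INT); // explicit integer, not the default PDO::PARAM_STR
$stmt->execute();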

I'm not completely sure about the behaviour here, but I did have this problem with Postgres a few years back...

2
votes

There are some major mistakes in the sample code, so let's be more precise.

// $imageIds is an array with 10K keys
$keyCount = count($imageIds);
$keys = implode(', ', array_fill(0, $keyCount, '?'));
$query = "SELECT * FROM images WHERE ImageID IN ({$keys})";

So far, the above code will produce something like this...

SELECT * FROM images WHERE ImageID IN (?, ?, ?, ?, ?, ?,...?, ?, ?, ?)

There is no loop for binding. There should be a small loop in which you bind all of the parameters being passed to MySQL; instead, you go straight from prepare() to execute(), when correct binding is primarily what you want (see the sketch after the code below).

$stmt = $dbh->prepare($query);
$stmt->execute($imageIds);
// until now, it's been fast.  fetch() is the slow part
while ($row = $stmt->fetch()) {
    $rows[] = $row;
}
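
Here is a sketch of the same query with an explicit binding loop, binding every key as an integer (array_values() is used so the positional indexes line up even if $imageIds has non-sequential keys):

$stmt = $dbh->prepare($query);
foreach (array_values($imageIds) as $i => $id) {
    // positional placeholders are 1-indexed
    $stmt->bindValue($i + 1, (int) $id, PDO::PARAM_INT);
}
$stmt->execute();
$rows = $stmt->fetchAll();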

Now I have a simple logic question about this part of the question...

When using SELECT * FROM table WHERE Id IN ( .. ) queries with more than 10000 keys using PDO with prepare()/execute(), the performance degrades ~10X more than doing the same query using mysqli with prepared statements or PDO without using prepared statements.

Would it not be better if the same query was re-written so that you would not need to pass 10000 keys as parameters?

PDO and MySQLi do not have major differences in timings; badly written queries do. Very complex stored procedures can also turn out slow if they are not well optimized.

Check if another query could fetch the desired result. For example

Create a small table named test

create table `test` (
  `id` int(10) not null,
  `desc` varchar(255)
  ); 
insert into `test` (`id`,`desc`) values (1,'a'),(10,'a1'),(11,'a2'),(12,'a3'),(13,'a4'),(14,'a5'),(15,'a6'),(2,'ab'),(20,'ab1'),(21,'ab2'),(22,'ab3'),(23,'ab4'),(24,'ab5'),(25,'ab6');

Run those simple queries

select * from `test` where `id` rlike '^1$';
select * from `test` where `id` rlike '^1+';
select * from `test` where `id`=1;
select * from `test` where `id` rlike '^1.$';
select * from `test` where `id` rlike '.2$';
select * from `test` where `id` rlike '^2$';
select * from `test` where `id` rlike '.(2|3)'; -- Slower
select * from `test` where `id` IN (12,13,22,23); -- Faster
select * from `test` where `id` IN ('12,13,22,23'); -- Wrong result
select * from `test` where `id` IN ('12','13','22','23'); -- Slower

Apart from the one labelled as the wrong result, the last four queries return the same rows in this example. I think that most of the time, if you check them on SQLFiddle, you will get query times that correspond to the labels they have been given.

0
votes

I don't have any experience with PDO so I can't help with that, but this method is pretty performant, although it's a bit ugly in places ;)

PHP

<?php

$nums = array(); $max = 10000;

// build a pool of ids ten times larger than needed; $max of them are picked at random below
for ($i = 0; $i < $max * 10; $i++) $nums[] = $i;

$conn = new mysqli("127.0.0.1", "vldb_dbo", "pass", "vldb_db", 3306);

// pass the chosen ids to the stored procedure as a single comma-separated string
$sql = sprintf("call list_products_by_id('%s',0)", implode(",", array_rand($nums, $max)));

$startTime = microtime(true);

$result = $conn->query($sql);

echo sprintf("Fetched %d rows in %s secs<br/>", 
    $conn->affected_rows, number_format(microtime(true) - $startTime, 6, ".", ""));

$result->close();
$conn->close();

?>

Results

select count(*) from product;
count(*)
========
1000000

Fetched 1000 rows in 0.014767 secs
Fetched 1000 rows in 0.014629 secs

Fetched 2000 rows in 0.027938 secs
Fetched 2000 rows in 0.027929 secs

Fetched 5000 rows in 0.068841 secs
Fetched 5000 rows in 0.067844 secs

Fetched 7000 rows in 0.095199 secs
Fetched 7000 rows in 0.095184 secs

Fetched 10000 rows in 0.138205 secs
Fetched 10000 rows in 0.134356 secs

MySQL

drop procedure if exists list_products_by_id;

delimiter #

create procedure list_products_by_id
(
in p_prod_id_csv text,
in p_show_explain tinyint unsigned
)
proc_main:begin

declare v_id varchar(10);
declare v_done tinyint unsigned default 0;
declare v_idx int unsigned default 1;

    create temporary table tmp(prod_id int unsigned not null)engine=memory; 

    -- split the string into tokens and put into a temp table...

    if p_prod_id_csv is not null then
        while not v_done do
            set v_id = trim(substring(p_prod_id_csv, v_idx,
                if(locate(',', p_prod_id_csv, v_idx) > 0,
                    locate(',', p_prod_id_csv, v_idx) - v_idx, length(p_prod_id_csv))));

            if length(v_id) > 0 then
                set v_idx = v_idx + length(v_id) + 1;
                insert ignore into tmp values(v_id);
            else
                set v_done = 1;
            end if;
        end while;
    end if;

    if p_show_explain then

        select count(*) as count_of_tmp from tmp;

        explain
        select p.* from product p
        inner join tmp on tmp.prod_id = p.prod_id order by p.prod_id;

    end if;

    select p.* from product p
        inner join tmp on tmp.prod_id = p.prod_id order by p.prod_id;

    drop temporary table if exists tmp;

end proc_main #

delimiter ;