0
votes

We run a product website where some products carry a guarantee. The guarantee applies only to products from particular dealers, identified by dealerId. The two tables are:

Product table with columns id, name, cityId, dealerId, price. This table holds all the products.

GuaranteeDealers table with a single column, dealerId. It lists every dealer whose products are guaranteed.

We want to get all products, along with a flag saying whether each one is guaranteed. We are considering two approaches:

APPROACH 1: Compute isGuaranteed with a SQL function and return it to the server (C#) side:

select id, name, cityId, dealerId, price, isGuaranteed = isGuaranteed( dealerId) from customers

isGuaranteed is a SQL function that checks whether dealerId is in the guaranteeDealers table. It returns 1 if so, otherwise 0.

I have 50,000 products and 500 such dealers, and this query takes too long to execute.

OR

APPROACH 2: Get the list of dealers and set the isGuaranteed flag on the C# (server) side.

select id, name, cityId, dealerId, price. Map the rows to a C# list of products.

select dealerId from the guaranteeDealers table into a C# list of dealers.

Iterate over the product records in C# and set the isGuaranteed flag with a C# function that checks whether the product's dealerId is in the C# list of guaranteeDealers.

This takes far less time than approach 1.

Both approaches look similar to me, so can someone explain why executing a function in a select statement takes so long in MySQL? Also, which is the correct way to do this, approach 1 or 2?

3
This disparity in performance is often a sign of an issue with your indexes. I recommend ensuring you have an index on the important keys in your database, and reindexing the existing ones. - CDove
It is indexed and is the primary key. - Sahil Sharma
Why does it take so much time? It's the cost (time) of repeated execution of a correlated subquery, plus the overhead (repeated context switches) of hiding the subquery in a MySQL stored program. That effectively disables MySQL's ability to produce the set efficiently. The design constrains MySQL to perform specific operations row by excruciating row, rather than allowing MySQL to perform the operation on a set. (See my answer.) - spencer7593

3 Answers

2
votes

Q: "why it takes so long time to execute function in select statement in mysql?"

In terms of performance, executing a correlated subquery 50,000 times will eat our lunch, and if we're not careful, it will eat our lunchbox too.

That subquery will be executed for each and every row returned by the outer query. That's like executing 50,000 separate, individual SELECT statements. And that's going to take time.

Hiding a correlated subquery inside a MySQL stored program (function) doesn't help. That just adds overhead on each execution of the subquery, and makes things slower. If we strip out the function and bring that subquery inline, we are probably looking at something like this:

 SELECT p.id
      , p.name
      , p.cityId
      , p.dealerId
      , p.price
      , IFNULL( ( SELECT 1
                    FROM guaranteeDealers d
                   WHERE d.dealerId = p.dealerID
                   LIMIT 1
                )
        ,0) AS isGuarantee
   FROM products p
  ORDER BY ...

For each and every row returned from products (that isn't filtered out by a predicate, e.g. a condition in the WHERE clause), this essentially tells MySQL to execute a separate SELECT statement: run a query to see whether the dealerId is found in the guaranteeDealers table. And that happens for each row.

If the outer query is only returning a couple of rows, then that's only a couple of extra SELECT statements to execute, and we aren't really going to notice the extra time. But when we return tens (or hundreds) of thousands of rows, that starts to add up. And it gets expensive, in terms of the total amount of time all those query executions take.

And if we "hide" that subquery in a MySQL stored program (function), that adds more overhead, introducing a bunch of context switches. The query executing in the database context calls a function, which switches over to the stored program engine to execute the function. The function then needs to run a database query, which switches back to the database context to execute the query and return a resultset. That switches back to the stored program environment to process the resultset and return a value, and then back again to the original database context to pick up the returned value. If we have to do that a couple of times, no big whoop. Repeat that tens of thousands of times, and that overhead is going to add up.

(Note that native MySQL built-in functions don't have this same context switching overhead. The native functions are compiled code that execute within the database context. Which is a big reason we favor native functions over MySQL stored programs.)
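
To make the once-per-row behavior concrete, here is a minimal sketch in Python, using the built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs without a server). The table contents, the `guaranteed_dealers` set and the `calls` counter are hypothetical. The sketch only counts invocations; it does not reproduce MySQL's context-switch cost, and the function body does a set lookup where the real stored function would run its own SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, dealerId INTEGER)")
# 1,000 illustrative products spread across 20 dealers (ids 0..19).
conn.executemany(
    "INSERT INTO products (id, dealerId) VALUES (?, ?)",
    [(i, i % 20) for i in range(1000)],
)

guaranteed_dealers = {0, 1, 2, 3, 4}  # hypothetical guaranteed dealer ids
calls = {"count": 0}                  # how many times the function was invoked


def is_guaranteed(dealer_id):
    """Stand-in for the stored function; the real one would run a SELECT here."""
    calls["count"] += 1
    return 1 if dealer_id in guaranteed_dealers else 0


# Register the Python function so SQL can call it by name.
conn.create_function("isGuaranteed", 1, is_guaranteed)

rows = conn.execute(
    "SELECT id, dealerId, isGuaranteed(dealerId) AS isGuaranteed FROM products"
).fetchall()

# One function invocation per row returned by the outer query.
print(len(rows), calls["count"])
```

With the 50,000 products from the question, the counter would reach 50,000, and in MySQL each of those invocations would also carry the subquery and the context switches described above.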


If we want improved performance, we need to ditch the processing RBAR (row by agonizing row), which gets excruciatingly slow for large sets. We need to approach the problem set-wise rather than row-wise.

We can tell MySQL what set to return, and let it figure out the most efficient way to return that, rather than round-tripping back and forth to the database, executing individual SQL statements to get little bits of the set piecemeal, using instructions that dictate how MySQL should prepare the set.


In answer to the question

Q: "which approach is correct"

both approaches are "correct" inasmuch as they return the set we're after.

The second approach is "better" in that it significantly reduces the number of SELECT statements that need to be executed (2 statements rather than 50,001 statements).
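
As a rough illustration of why the second approach is cheap, it boils down to two set-returning queries plus a hash-set lookup per product. The sketch below is in Python rather than the C# from the question, with the built-in sqlite3 module standing in for MySQL; the function name `products_with_guarantee_flag` and the sample rows are made up for illustration.

```python
import sqlite3


def products_with_guarantee_flag(conn):
    """Run two queries, then flag each product with a set-membership test."""
    # Query 1: load the guaranteed dealer ids into a set (O(1) lookups).
    guaranteed = {
        row[0] for row in conn.execute("SELECT dealerId FROM guaranteeDealers")
    }
    # Query 2: load the products and set the flag on the client side.
    query = "SELECT id, name, cityId, dealerId, price FROM products ORDER BY id"
    return [
        {
            "id": pid,
            "name": name,
            "cityId": city_id,
            "dealerId": dealer_id,
            "price": price,
            "isGuaranteed": dealer_id in guaranteed,
        }
        for pid, name, city_id, dealer_id, price in conn.execute(query)
    ]


# Hypothetical sample data, only so the sketch runs end to end.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT,
                           cityId INTEGER, dealerId INTEGER, price REAL);
    CREATE TABLE guaranteeDealers (dealerId INTEGER PRIMARY KEY);
    INSERT INTO products VALUES (1, 'lamp', 10, 100, 25.0),
                                (2, 'desk', 10, 200, 140.0),
                                (3, 'chair', 11, 100, 60.0);
    INSERT INTO guaranteeDealers VALUES (100);
""")
flagged = products_with_guarantee_flag(conn)
```

Using a set rather than a list for the dealer ids matters here: a list would turn every lookup into a scan of up to 500 entries.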

In terms of the best approach, we are usually better off letting MySQL do the "matching" of rows, rather than doing the matching in client code. (Why unnecessarily clutter up our code with an operation that can usually be accomplished much more efficiently in the database?) Yes, sometimes we need to do the matching in our code. And occasionally it turns out to be faster.

But sometimes, we can write just one SELECT statement that specifies the set we want returned, and let MySQL have a go at it. And if it's slow, we can do some tuning, looking at the execution plan, making sure suitable indexes are available, and tweaking the query.

Given the information in the question about the set to be returned, and assuming that dealerId is unique in the guaranteeDealers table: if our "test" is whether a matching row exists in the guaranteeDealers table, we can use an OUTER JOIN operation, and an expression in the SELECT list that returns a 0 or 1, depending on whether a matching row was found.

 SELECT p.id
      , p.name
      , p.cityId
      , p.dealerId
      , p.price
      , IF(d.dealerId IS NULL,0,1) AS isGuarantee
   FROM products p
   LEFT
   JOIN guaranteeDealers d
     ON d.dealerId = p.dealerId
  ORDER BY ...
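
A runnable sketch of the same outer-join pattern, again using Python's built-in sqlite3 module as a stand-in for MySQL. Two things here are assumptions rather than part of the query above: the sample rows are invented, and the flag is written as a CASE expression instead of MySQL's IF() so that it also runs on SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; dealerId is unique in guaranteeDealers (primary key).
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT,
                           cityId INTEGER, dealerId INTEGER, price REAL);
    CREATE TABLE guaranteeDealers (dealerId INTEGER PRIMARY KEY);
    INSERT INTO products VALUES (1, 'lamp', 10, 100, 25.0),
                                (2, 'desk', 10, 200, 140.0),
                                (3, 'chair', 11, 100, 60.0);
    INSERT INTO guaranteeDealers VALUES (100);
""")

# One statement describes the whole set; the database does the matching.
rows = conn.execute("""
    SELECT p.id, p.name, p.cityId, p.dealerId, p.price,
           CASE WHEN d.dealerId IS NULL THEN 0 ELSE 1 END AS isGuarantee
      FROM products p
      LEFT JOIN guaranteeDealers d
        ON d.dealerId = p.dealerId
     ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```

Because dealerId is unique in guaranteeDealers, each product matches at most one dealer row, so the join returns exactly one row per product. Without that uniqueness, a product could appear more than once in the result.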

For optimal performance, we are going to want to have suitable indexes defined. At a minimum (if there isn't already such an index defined)

 ON guaranteeDealers (dealerId)

If there are also other tables that are involved in producing the result we are after, then we want to involve those tables in the query we execute as well. That will give the MySQL optimizer a chance to come up with the most efficient plan to return the entire set, and not constrain MySQL to performing individual operations that return bits piecemeal.

0
votes
select id, name, cityId, customers.dealerId, price, 
isGuaranteed = guaranteeDealers.dealerId is not null 
from customers left join guaranteeDealers 
on guaranteeDealers.dealerId = customers.dealerId

No need to call a function.

Note I have used customers because that is the table you used in your question - although I suspect you might have meant products.

0
votes

Approach 1 is the better one because it reduces the size of the resultset being transferred from the database server to the application server. Its performance problem is caused by the isGuaranteed function, which is being executed once per row (of the customers table, which looks like it might be a typo). An approach like this would be much more performant:

select p.id, p.name, p.cityId, p.dealerId, p.price, gd.dealerId is not null as isGuaranteed
from Product p
left join GuaranteeDealers gd on p.dealerId = gd.dealerId