Q: "why it takes so long time to execute function in select statement in mysql?"
In terms of performance, executing a correlated subquery 50,000 times will eat our lunch, and if we're not careful, it will eat our lunchbox too.
That subquery will be executed for each and every row returned by the outer query. That's like executing 50,000 separate, individual SELECT statements. And that's going to take time.
Hiding a correlated subquery inside a MySQL stored program (function) doesn't help. That just adds overhead on each execution of the subquery, and makes things slower. If we strip out the function and bring that subquery inline, we are probably looking at something like this:
    SELECT p.id
         , p.name
         , p.cityId
         , p.dealerId
         , p.price
         , IFNULL( ( SELECT 1
                       FROM guaranteeDealers d
                      WHERE d.dealerId = p.dealerId
                      LIMIT 1
                   )
           ,0) AS isGuarantee
      FROM products p
     ORDER BY ...
For each and every row returned from products (that isn't filtered out by a predicate, e.g. a condition in the WHERE clause), this is essentially telling MySQL to execute a separate SELECT statement: run a query to see whether the dealerId is found in the guaranteeDealers table. And that happens for each row.
If the outer query is only returning a couple of rows, then that's only a couple of extra SELECT statements to execute, and we aren't really going to notice the extra time. But when we return tens (or hundreds) of thousands of rows, that starts to add up. And it gets expensive, in terms of the total amount of time all those query executions take.
And if we "hide" that subquery in a MySQL stored program (function), that adds more overhead, introducing a bunch of context switches. The query executing in the database context calls a function, which switches over to the stored program engine to execute the function. The function then needs to run a database query, which switches back to the database context to execute the query and return a resultset. Then it's back to the stored program environment to process the resultset and return a value, and finally back to the original database context to get the returned value. If we have to do that a couple of times, no big whoop. Repeat that tens of thousands of times, and that overhead is going to add up.
(Note that native MySQL built-in functions don't have this same context switching overhead. The native functions are compiled code that executes within the database context. Which is a big reason we favor native functions over MySQL stored programs.)
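To make the pattern concrete, here is a rough sketch of the kind of stored function being described. The question's actual function isn't shown here, so the function name isGuaranteeDealer, the parameter name, and the INT type are all assumptions for illustration only.

```sql
-- Hypothetical function (name, parameter and INT type are assumptions).
-- DELIMITER is a mysql command-line client directive, not SQL.
DELIMITER $$

CREATE FUNCTION isGuaranteeDealer(p_dealerId INT)
RETURNS TINYINT
READS SQL DATA
BEGIN
  -- One lookup against guaranteeDealers per call
  RETURN EXISTS ( SELECT 1
                    FROM guaranteeDealers d
                   WHERE d.dealerId = p_dealerId
                );
END$$

DELIMITER ;

-- Called once for every row of products: the same correlated lookup
-- as the inline subquery, plus the function call overhead.
SELECT p.id
     , isGuaranteeDealer(p.dealerId) AS isGuarantee
  FROM products p;
```

Written this way, the query looks tidy, but it still performs one lookup per row of products, with the function call layered on top.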
If we want improved performance, we need to ditch the processing RBAR (row by agonizing row), which gets excruciatingly slow for large sets. We need to approach the problem set-wise rather than row-wise.
We can tell MySQL what set to return, and let it figure out the most efficient way to return that, rather than round tripping back and forth to the database, executing individual SQL statements to get little bits of the set piecemeal, using instructions that dictate how MySQL should prepare the set.
In answer to the question
Q: "which approach is correct"
both approaches are "correct" inasmuch as they return the set we're after.
The second approach is "better" in that it significantly reduces the number of SELECT statements that need to be executed (2 statements rather than 50,001 statements).
In terms of the best approach, we are usually better off letting MySQL do the "matching" of rows, rather than doing the matching in client code. (Why unnecessarily clutter up our code doing an operation that can usually be much more efficiently accomplished in the database?) Yes, sometimes we need to do the matching in our code. And occasionally it turns out to be faster.
But sometimes, we can write just one SELECT statement that specifies the set we want returned, and let MySQL have a go at it. And if it's slow, we can do some tuning, looking at the execution plan, making sure suitable indexes are available, and tweaking the query.
Given the information in the question about the set to be returned, and assuming that dealerId is unique in the guaranteeDealers table: if our "test" is whether a matching row exists in the guaranteeDealers table, we can use an OUTER JOIN operation, and an expression in the SELECT list that returns a 0 or 1, depending on whether a matching row was found.
    SELECT p.id
         , p.name
         , p.cityId
         , p.dealerId
         , p.price
         , IF(d.dealerId IS NULL,0,1) AS isGuarantee
      FROM products p
      LEFT
      JOIN guaranteeDealers d
        ON d.dealerId = p.dealerId
     ORDER BY ...
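The join above leans on dealerId being unique in guaranteeDealers. If a dealer could appear there more than once, each extra match would duplicate the product row. One possible way to guard against that, sketched here with the same table and column names and with the ORDER BY left out, is to join to a derived table of distinct dealer ids:

```sql
-- Sketch for the case where dealerId is NOT unique in guaranteeDealers.
-- The derived table collapses duplicates, so each product row matches
-- at most one row and is returned exactly once.
SELECT p.id
     , p.name
     , p.cityId
     , p.dealerId
     , p.price
     , IF(d.dealerId IS NULL,0,1) AS isGuarantee
  FROM products p
  LEFT
  JOIN ( SELECT DISTINCT g.dealerId
           FROM guaranteeDealers g
       ) d
    ON d.dealerId = p.dealerId;
```

If dealerId really is unique (for example, it is the primary key), the plain LEFT JOIN is simpler and the derived table isn't needed.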
For optimal performance, we are going to want to have suitable indexes defined. At a minimum (if there isn't already such an index defined)
ON guaranteeDealers (dealerId)
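As a sketch of what that might look like in full, and of how to check that MySQL actually uses the index, something like the following could be run. The index name guaranteeDealers_ix1 is made up for illustration, and the ORDER BY is left out because the original doesn't spell it out.

```sql
-- Hypothetical index name; skip this if dealerId is already the
-- primary key or the leading column of an existing index.
CREATE INDEX guaranteeDealers_ix1 ON guaranteeDealers (dealerId);

-- Ask MySQL for the execution plan of the join query. In the row for
-- table d, the "key" column shows which index (if any) was chosen.
EXPLAIN
SELECT p.id
     , p.name
     , p.cityId
     , p.dealerId
     , p.price
     , IF(d.dealerId IS NULL,0,1) AS isGuarantee
  FROM products p
  LEFT
  JOIN guaranteeDealers d
    ON d.dealerId = p.dealerId;
```

If the plan shows no usable key for d, that is usually the first thing to fix before tweaking the query itself.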
If there are also other tables that are involved in producing the result we are after, then we want to also involve those tables in the query we execute. That will give the MySQL optimizer a chance to come up with the most efficient plan to return the entire set, rather than constraining MySQL to performing individual operations that return bits piecemeal.