What is the difference between the EXISTS
and IN
clause in SQL?
When should we use EXISTS
, and when should we use IN
?
The exists
keyword can be used in that way, but really it's intended as a way to avoid counting:
--this statement needs to check the entire table
select count(*) from [table] where ...
--this statement is true as soon as one match is found
exists ( select * from [table] where ... )
This is most useful where you have if
conditional statements, as exists
can be a lot quicker than count
.
The in
is best used where you have a static list to pass:
select * from [table]
where [field] in (1, 2, 3)
When you have a table in an in
statement it makes more sense to use a join
, but mostly it shouldn't matter. The query optimiser should return the same plan either way. In some implementations (mostly older, such as Microsoft SQL Server 2000) in
queries will always get a nested join plan, while join
queries will use nested, merge or hash as appropriate. More modern implementations are smarter and can adjust the plan even when in
is used.
EXISTS
will tell you whether a query returned any results. e.g.:
SELECT *
FROM Orders o
WHERE EXISTS (
SELECT *
FROM Products p
WHERE p.ProductNumber = o.ProductNumber)
IN
is used to compare one value to several, and can use literal values, like this:
SELECT *
FROM Orders
WHERE ProductNumber IN (1, 10, 100)
You can also use query results with the IN
clause, like this:
SELECT *
FROM Orders
WHERE ProductNumber IN (
SELECT ProductNumber
FROM Products
WHERE ProductInventoryQuantity > 0)
I'm assuming you know what they do, and thus are used differently, so I'm going to understand your question as: When would it be a good idea to rewrite the SQL to use IN instead of EXISTS, or vice versa.
Is that a fair assumption?
Edit: The reason I'm asking is that in many cases you can rewrite an SQL based on IN to use an EXISTS instead, and vice versa, and for some database engines, the query optimizer will treat the two differently.
For instance:
SELECT *
FROM Customers
WHERE EXISTS (
SELECT *
FROM Orders
WHERE Orders.CustomerID = Customers.ID
)
can be rewritten to:
SELECT *
FROM Customers
WHERE ID IN (
SELECT CustomerID
FROM Orders
)
or with a join:
SELECT Customers.*
FROM Customers
INNER JOIN Orders ON Customers.ID = Orders.CustomerID
So my question still stands, is the original poster wondering about what IN and EXISTS does, and thus how to use it, or does he ask wether rewriting an SQL using IN to use EXISTS instead, or vice versa, will be a good idea?
EXISTS
is much faster than IN
when the subquery results is very large.IN
is faster than EXISTS
when the subquery results is very small.
CREATE TABLE t1 (id INT, title VARCHAR(20), someIntCol INT)
GO
CREATE TABLE t2 (id INT, t1Id INT, someData VARCHAR(20))
GO
INSERT INTO t1
SELECT 1, 'title 1', 5 UNION ALL
SELECT 2, 'title 2', 5 UNION ALL
SELECT 3, 'title 3', 5 UNION ALL
SELECT 4, 'title 4', 5 UNION ALL
SELECT null, 'title 5', 5 UNION ALL
SELECT null, 'title 6', 5
INSERT INTO t2
SELECT 1, 1, 'data 1' UNION ALL
SELECT 2, 1, 'data 2' UNION ALL
SELECT 3, 2, 'data 3' UNION ALL
SELECT 4, 3, 'data 4' UNION ALL
SELECT 5, 3, 'data 5' UNION ALL
SELECT 6, 3, 'data 6' UNION ALL
SELECT 7, 4, 'data 7' UNION ALL
SELECT 8, null, 'data 8' UNION ALL
SELECT 9, 6, 'data 9' UNION ALL
SELECT 10, 6, 'data 10' UNION ALL
SELECT 11, 8, 'data 11'
Query 1
SELECT
FROM t1
WHERE not EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id)
Query 2
SELECT t1.*
FROM t1
WHERE t1.id not in (SELECT t2.t1id FROM t2 )
If in t1
your id has null value then Query 1 will find them, but Query 2 cant find null parameters.
I mean IN
can't compare anything with null, so it has no result for null, but EXISTS
can compare everything with null.
IN supports only equality relations (or inequality when preceded by NOT).
It is a synonym to =any / =some, e.g
select *
from t1
where x in (select x from t2)
;
EXISTS supports variant types of relations, that cannot be expressed using IN, e.g. -
select *
from t1
where exists (select null
from t2
where t2.x=t1.x
and t2.y>t1.y
and t2.z like '℅' || t1.z || '℅'
)
;
The allegedly performance and technical differences between EXISTS and IN may result from specific vendor's implementations/limitations/bugs, but many times they are nothing but myths created due to lack of understanding of the databases internals.
The tables' definition, statistics' accuracy, database configuration and optimizer's version have all impact on the execution plan and therefore on the performance metrics.
I think,
EXISTS
is when you need to match the results of query with another subquery.
Query#1 results need to be retrieved where SubQuery results match. Kind of a Join..
E.g. select customers table#1 who have placed orders table#2 too
IN is to retrieve if the value of a specific column lies IN
a list (1,2,3,4,5)
E.g. Select customers who lie in the following zipcodes i.e. zip_code values lies in (....) list.
When to use one over the other... when you feel it reads appropriately (Communicates intent better).
Which one is faster depends on the number of queries fetched by the inner query:
EXIST evaluate on true or false but IN compare multiple value. When you don't know the record is exist or not, your should choose EXIST
The reason is that the EXISTS operator works based on the “at least found” principle. It returns true and stops scanning table once at least one matching row found.
On the other hands, when the IN operator is combined with a subquery, MySQL must process the subquery first, and then uses the result of the subquery to process the whole query.
The general rule of thumb is that if the subquery contains a large volume of data, the EXISTS operator provides a better performance.
However, the query that uses the IN operator will perform faster if the result set returned from the subquery is very small.
My understand is both should be the same as long as we are not dealing with NULL values.
The same reason why the query does not return the value for = NULL vs is NULL. http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/
As for as boolean vs comparator argument goes, to generate a boolean both values needs to be compared and that is how any if condition works.So i fail to understand how IN and EXISTS behave differently .
In certain circumstances, it is better to use IN rather than EXISTS. In general, if the selective predicate is in the subquery, then use IN. If the selective predicate is in the parent query, then use EXISTS.
https://docs.oracle.com/cd/B19306_01/server.102/b14211/sql_1016.htm#i28403
If a subquery returns more than one value, you might need to execute the outer query- if the values within the column specified in the condition match any value in the result set of the subquery. To perform this task, you need to use the in
keyword.
You can use a subquery to check if a set of records exists. For this, you need to use the exists
clause with a subquery. The exists
keyword always return true or false value.
I believe this has a straightforward answer. Why don't you check it from the people who developed that function in their systems?
If you are a MS SQL developer, here is the answer directly from Microsoft.
IN
:
Determines whether a specified value matches any value in a subquery or a list.
Specifies a subquery to test for the existence of rows.
I found that using EXISTS keyword is often really slow (that is very true in Microsoft Access). I instead use the join operator in this manner : should-i-use-the-keyword-exists-in-sql