2
votes

I was improving some sql query script's performance. For example:

SELECT * 
FROM Book b, Library l, [Order] o 
WHERE o.bookid = b.bookID 
AND o.mode = 'A' 
AND o.library_ID = l.library_ID 
AND l.library_ID > 19 
AND b.publisher_id  > 1000 
AND b.print_id > 800 
AND NOT EXISTS (
  SELECT * 
  FROM ExtBOOK 
  WHERE b.bookid = extbookid 
  AND library_ID = l.library_ID
) 
AND o.activated = 'Y' 
AND b.eisbn13 LIKE '978%' 
AND len(o.ext_user_id) > 3 
AND b.bookid > 200000 
AND b.bookid in (
  SELECT bookid 
  FROM category 
  WHERE categoryid  > 2
) 
ORDER BY o.orderid DESC

When I search this sql script in SQL Management Studio with "Include Actual Execution Plan", the result ask me to add below index

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Order] ([MODE],[ACTIVATED],[LIBRARY_ID],[BOOKID])
INCLUDE ([OrderID],[EXT_USER_ID],[APPROVAL_DATE])

Howevery if I delete some where conditions, like below:

SELECT * FROM Book b, Library l, [Order] o 
WHERE o.bookid = b.bookID 
AND o.mode = 'A' 
AND o.library_ID = l.library_ID 
AND l.library_ID > 19 
ORDER BY o.orderid DESC

I got another different recommendation, like below:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[ORDER] ([MODE])
INCLUDE ([LIBRARY_ID],[BOOKID])

Because the conditions are variable, which index should I create? I understand what indexes do, but not the benefit between a field being indexed and a field being included. Why in the first recommended index, BOOKID and LIBRARY_ID are in the index fields, but in the second recommended index, BOOKID and LIBRARY_ID are in the included fields? What are the differences, and which I should I use to cover all possible conditions?

In addition, from my test, I added each of them to test the performance, but cannot see any difference. I appreciate any help.

1
Index columns are part of the actual index navigation structure, e.g. you can use them in a WHERE clause. Included columns are only present in the leaf level of the index - you cannot use them to select rows - and they're there so your query can be satisfied (all columns requested by the query) just from the index structure so that you don't have to do an expensive key lookup into the actual table data to get that one or two extra columns that you need.marc_s
@danmiao: which table is the alias "m" referring to (m.libary_ID = l.library_ID)? Is that a typo or is there another table in the query?8kb
Hi 8kb, sorry it is my typing error, it is 'o', I have updated, thanksdanmiao
Hi marc_s, thanks very much, I understood your means, just a similar question, I can use included columns to order by, or I have to use index columns to order by, or order by does not need index.danmiao
If you want to use a column in a WHERE or an ORDER BY clause, it must be part of the actual index and stored in the index navigation structure. If you have a column as "included" column, the index cannot be used for ORDER BYmarc_s

1 Answers

1
votes

Your indexing strategy will to some extent depend on the expected volatility of your data - if your data is stable and updated rarely, then you can add more indexes to aid query performance. However, if your data is volatile, and changes frequently, then more indexes will lead to slower performance as the indexes are regenerated when data changes.

It also depends on how predictable your queries are - are they predictable, in which case encapsulate them in stored procedures or parameterised queries, or are they completely ad-hoc?

I'm assuming you already have indexes on Order.BookID and Order.Library_ID?

Additionally, I'd rephrase the queries to use the inner join syntax - ie:

SELECT * 
FROM [Order] o 
     INNER JOIN Library l
          ON o.library_ID = l.library_ID  
     INNER JOIN Book b  
          ON o.bookid = b.bookID 
WHERE 
     o.mode = 'A'  
AND 
     l.library_ID > 19