Optimize a rank query

Question

I'm using this query to fetch user's rank from Sql Server DB:

select user_rank
from (select t.user_id, rank() over (order by score desc) as user_rank
  from user_stats t
 ) t
where t.user_id='some_user_id';

The table consists approximately 22,000 rows and the query takes 3.5 seconds, which is too slow.

There are several indexes on this table, these are the relevant:

user_id - Unique, NONCLUSTERED INDEX

score - Non-unique, NONCLUSTERED INDEX

If I change the query and use id (which is my primary key) instead of user_id, then the query executes fast:

select user_rank
from (select t.id, rank() over (order by score desc) as user_rank
  from user_stats t
 ) t
where t.id='some_id';

After checking the Execution plan I see that Clustered index scan cost is 92%, though I don't really understand why it's needed in this case.

What can be done to optimize this query?

The statistics of the query:

SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.

(1 row(s) affected) Table 'users_stats'. Scan count 1, logical reads 22529, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times: CPU time = 78 ms, elapsed time = 3576 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

You are going to have a scan here no matter what because your subquery has to look at every single row of the table. Not sure why it would 3.5 seconds but without the table and index definitions it is only guess work. — Sean Lange
If UserID is unique then what is the purpose of ranking one user? — Giorgi Nakeuri
But why is there such a difference between using user_id and id as they both indexed? — Udi Idan
@GiorgiNakeuri - I'm using this query in a stored procedure which returns the ranking of a specific user. — Udi Idan
@UdiI, is UserID unique in user_stats table? It will return you 1 row won't it? — Giorgi Nakeuri

Nathan Nathan · Accepted Answer · 2015-02-26T23:11:00

I suspect that the main reason for your performance degradation is because your index is not covering. You would likely see a substantial improvement in performance by making the index covering. Here is a good article that discusses covering indexes.

In a nutshell, the index offers simply a pointer to the row. In order to get the data for the "score" column, which is needed to rank the result set, the engine must scan the clustered index in order to look-up the data. If you include the value in the index, the engine will be able to perform the operation without needing the clustered index scan.

The index should be rewritten as follows:

CREATE UNIQUE NONCLUSTERED INDEX UQ_USER_STATS_USER_ID ON user_stats (user_id) INCLUDE (score);

Optimize a rank query

2 Answers