I came across this post on Stack Overflow. The first answer says something like: "A clustered index has all the data for the table while a non clustered index only has the column + the location of the clustered index or the row if it is on a heap (a table without a clustered index)." How can a non-clustered index have the location of the clustered index? It only contains the column values, sorted as nodes in a B-tree, with each node pointing to the row where the column has that value, right?
Assuming you're talking about SQL Server, and that you have a clustered index on your table (as you should), the nonclustered index contains the columns that you define in your CREATE INDEX statement, plus the column(s) that make up your clustered index (if present).
That clustering key value is the "pointer" to where the actual data is located.
If the query executor seeks through your nonclustered index for a value and finds a match, then either:
that value is all you care about, so you just get back that value;
or the nonclustered index also has some included columns (in the leaf-level page), and with those the query can be satisfied (all requested columns are present), so you get back the values you asked for;
or the values you want are not all in the nonclustered index's leaf-level page (that's especially likely if you do SELECT * all the time). Then the query executor has to take the clustering key value from the nonclustered index, go back to the clustered index, and do what's called a key lookup: it seeks through the clustered index to find the data page where the complete row is stored, and now it can return the values you've asked for.
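To make the three cases concrete, here is a toy Python model. It is a sketch, not SQL Server's actual storage format. It assumes a hypothetical customer table clustered on id and a nonclustered index on name that includes age; the class names ClusteredIndex and NonclusteredIndex are made up for illustration, and binary search over sorted lists stands in for a B-tree seek. The point it shows is that each nonclustered leaf entry stores the clustering key (id), and that stored value is what the key lookup uses.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Row:
    id: int
    name: str
    age: int
    address: str


class ClusteredIndex:
    """Leaf level holds the complete rows, sorted by the clustering key (id)."""

    def __init__(self, rows):
        self.rows = sorted(rows, key=lambda r: r.id)
        self.keys = [r.id for r in self.rows]

    def seek(self, key):
        # Binary search stands in for walking the B-tree down to a leaf page.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.rows[i]
        return None


class NonclusteredIndex:
    """Leaf entries hold (index key, included columns, clustering key)."""

    def __init__(self, clustered, key_col, include_cols):
        self.clustered = clustered
        self.key_col = key_col
        self.entries = sorted(
            ((getattr(r, key_col), {c: getattr(r, c) for c in include_cols}, r.id)
             for r in clustered.rows),
            key=lambda e: (e[0], e[2]),  # order by index key, then clustering key
        )
        self.keys = [e[0] for e in self.entries]

    def lookup(self, value, wanted_cols):
        """Return (matching rows as dicts, number of key lookups performed)."""
        results, key_lookups = [], 0
        i = bisect.bisect_left(self.keys, value)
        while i < len(self.keys) and self.keys[i] == value:
            key, included, clustering_key = self.entries[i]
            available = {self.key_col: key, "id": clustering_key, **included}
            if all(c in available for c in wanted_cols):
                # Cases 1 and 2: the leaf entry (key + included columns) covers the query.
                results.append({c: available[c] for c in wanted_cols})
            else:
                # Case 3: use the stored clustering key to seek the clustered index.
                key_lookups += 1
                row = self.clustered.seek(clustering_key)
                results.append({c: getattr(row, c) for c in wanted_cols})
            i += 1
        return results, key_lookups


rows = [Row(1, "Ann", 34, "Oslo"), Row(2, "Bob", 17, "Rome"), Row(3, "Ann", 52, "Lima")]
ci = ClusteredIndex(rows)
nci = NonclusteredIndex(ci, "name", ["age"])
print(nci.lookup("Ann", ["id", "age"]))      # → ([{'id': 1, 'age': 34}, {'id': 3, 'age': 52}], 0)
print(nci.lookup("Ann", ["id", "address"]))  # → ([{'id': 1, 'address': 'Oslo'}, {'id': 3, 'address': 'Lima'}], 2)
```

Asking only for id and age is answered from the index alone, while asking for address forces one key lookup per matching row.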
For a pretty good explanation, see this blog post. It says:
In NonClustered Index:
....
2.b. If the table has a Clustered index, or the index is on an Indexed view, the row locator is the Clustered index Key for the row. SQL Server retrieves the data row by searching the Clustered index using the Clustered index Key stored in the Leaf row of the NonClustered index.
Or see this blog post, part of a whole series on SQL Server indexes, which also explains the "bookmarks" stored in a nonclustered index's leaf-level pages.
It's pretty easy to imagine it like this:
You have a table of customers, for example customer(id, name, age, address). On this table you have a clustered index on age. This means your data is sorted by age on the hard drive. This is very beneficial when you want to do range queries like:
SELECT * FROM customer WHERE age > 18;
Then the data can be fetched from your hard drive with only a few sequential reads. If the index were non-clustered, you could, in the worst case, need one disk access (including the seek) for every matching customer tuple.
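A rough way to see the difference is to count page reads. The sketch below is a hypothetical cost model, not a real storage engine: it assumes 50 rows per page and no buffer cache, and it counts one disk access each time the next matching row lives on a different page than the previous one. The random ages are made-up data.

```python
import random

ROWS_PER_PAGE = 50  # assumption: fixed number of rows per data page


def page_accesses(page_ids):
    """Count disk accesses when visiting pages in this order with no cache:
    one access whenever the next row is on a different page than the last."""
    accesses, last = 0, None
    for p in page_ids:
        if p != last:
            accesses += 1
            last = p
    return accesses


random.seed(0)
ages = [random.randint(0, 90) for _ in range(10_000)]  # hypothetical customers

# Clustered on age: the file itself is in age order, so matches are contiguous.
clustered_file = sorted(ages)
clustered_pages = [pos // ROWS_PER_PAGE
                   for pos, age in enumerate(clustered_file) if age > 18]

# Non-clustered: the file stays in insertion order; an index on age yields the
# matching row positions in age order, which jump around between pages.
matches = sorted((age, pos) for pos, age in enumerate(ages) if age > 18)
unclustered_pages = [pos // ROWS_PER_PAGE for _, pos in matches]

print("matching rows:", len(matches))
print("clustered page reads:", page_accesses(clustered_pages))
print("non-clustered page reads:", page_accesses(unclustered_pages))
```

With the clustered layout the count is roughly the number of matching rows divided by rows per page; with the non-clustered layout it comes close to one read per matching row, which is the worst case described above.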
Maybe your application also needs to access customers by id. Without an additional index on id, you would have to scan the entire file to find a particular id, because the file is sorted by age and there is no index on id. To avoid that, you create a second index on id. Now you can search for an id in this index, and the leaf entry for the customer you are looking for points to the place on disk in your (clustered-by-age) data where you find the tuple. This way you don't have to read the whole table and need far fewer disk accesses (typically 2 for the index lookup + 1 for fetching the tuple).
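The same idea as a toy sketch: the data file is sorted by age and split into pages, and a separate index on id maps each id to a (page number, slot) position. The customer rows, ROWS_PER_PAGE and find_by_id are hypothetical, and a real engine would walk a B-tree rather than binary-search a Python list.

```python
import bisect

ROWS_PER_PAGE = 2  # tiny pages so the example stays readable

customers = [  # (id, name, age, address), hypothetical data
    (7, "Cid", 40, "Kyiv"),
    (3, "Ann", 17, "Oslo"),
    (5, "Bob", 29, "Rome"),
    (1, "Dee", 63, "Lima"),
]

# Data file clustered on age: rows stored in age order, split into pages.
data_file = sorted(customers, key=lambda c: c[2])
pages = [data_file[i:i + ROWS_PER_PAGE] for i in range(0, len(data_file), ROWS_PER_PAGE)]

# Secondary index on id: sorted (id, (page_no, slot)) entries, i.e. each
# leaf entry stores a physical row location inside the age-sorted file.
id_index = sorted(
    (row[0], (p, s)) for p, page in enumerate(pages) for s, row in enumerate(page)
)
id_keys = [k for k, _ in id_index]


def find_by_id(customer_id):
    """Search the id index, then fetch exactly one data page."""
    i = bisect.bisect_left(id_keys, customer_id)
    if i == len(id_keys) or id_keys[i] != customer_id:
        return None
    page_no, slot = id_index[i][1]
    return pages[page_no][slot]  # the single data-page fetch


print(find_by_id(5))  # → (5, 'Bob', 29, 'Rome')
```

This models the "physical pointer" variant described in this answer. When a table has a clustered index, SQL Server instead stores the clustering key in the nonclustered leaf (see the other answer); one commonly cited reason is that nonclustered entries then don't need updating when rows move between pages, at the cost of an extra seek through the clustered index.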
EDIT: I didn't see that you were talking about the same column. One thing I could imagine is that you create a clustered index on one column for the reason described above and, for example, another composite index on this column plus another one. This can be useful for an index-only lookup: all the required attributes are in the index, so you don't need to fetch a data page at all. Another reason would be to have a clustered B+-tree index for range queries and a hash index for equality queries. But I think the benefit here would be negligible.
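The hash versus B+-tree trade-off can be sketched with Python stand-ins: a dict of lists plays the hash index, and a sorted list searched with bisect plays the B+-tree leaf level. The (age, id) data and the function names are hypothetical.

```python
import bisect
from collections import defaultdict

# (age, id) pairs for a handful of hypothetical customers.
rows = [(17, 3), (29, 5), (40, 7), (63, 1), (29, 9)]

# Hash index stand-in: equality lookups go straight to a bucket, but the
# buckets are in no useful order, so a range query cannot use it.
hash_index = defaultdict(list)
for age, cid in rows:
    hash_index[age].append(cid)

# Sorted-order stand-in for a B+-tree leaf level: supports equality AND ranges.
btree_leaf = sorted(rows)
btree_keys = [age for age, _ in btree_leaf]


def equal_hash(age):
    return hash_index.get(age, [])


def equal_btree(age):
    lo = bisect.bisect_left(btree_keys, age)
    hi = bisect.bisect_right(btree_keys, age)
    return [cid for _, cid in btree_leaf[lo:hi]]


def greater_than_btree(age):
    # Find the first key > age, then read the rest of the leaf level in order.
    start = bisect.bisect_right(btree_keys, age)
    return [cid for _, cid in btree_leaf[start:]]


print(equal_hash(29), equal_btree(29), greater_than_btree(18))  # → [5, 9] [5, 9] [5, 9, 7, 1]
```

Both structures answer the equality query, but only the sorted one handles age > 18 without visiting every bucket, which fits the answer's guess that an extra hash index next to a clustered B+-tree buys little.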
Hope this helped!
... a, and a clustered index on b, then there will be two separate structures. The clustered index will contain the rows in sorted order according to values in b, then the non-clustered index will arrange the values in a in a B-tree with pointers to the data (sorted according to order of values in b). How can the non-clustered index accommodate another column, b? – SexyBeast
... b (the clustering key) in its leaf-level pages. If a value for a is found in the nonclustered index, but more data is needed for your query (e.g. SELECT *......), then using the b value, a key lookup into the clustered index happens to find the actual data page and return the values for your query – marc_s