I have been reading lately about how clustered index
and non-clustered index
works. My understanding in simple terms (correct me if wrong):
The data structure that backs clustered
and non-clustered index
is B-Tree
Clustered Index
: physically sorts the data based on the index column (or key). you can have only one clustered Index
per table
. If no index
is specified during table creation, SQL
server will automatically create a clustered Index
on the primary key column
.
Q1: Since data is physically sorted based on index, there is no extra space needed here. is this correct? so what happens when I drop the index I created?
Non-clustered Index
: In non-clustered indexes
, the leaf-node
of the tree contains the columns values and a pointer (row locator) to the actual row in the database. Here there is extra space needed for storing this non-clustered index table
physically on disk. However, one is not limited by the number of non-clustered Indexes.
Q2: Does it mean a query on non-clustered index column will not result in the sorted data?
Q3: There is an extra look-up associated here to locate the actual row data using the pointer at the leaf node. How much performance difference would this be when compared to a clustered index?
Excercise:
consider an Employee table:
CREATE TABLE Employee
(
PersonID int PRIMARY KEY,
Name varchar(255),
age int,
salary int
);
Now I created an employee table ( a default clustered index on employee is created).
Two frequent queries on this table happen only on age and salary columns. For sake of simplicity, lets assume that the table is NOT frequently updated
for example:
select * from employee where age > XXX;
select * from employee where salary > XXXX and salary < YYYY;
Q4 : what is the best way to construct indexes, so that queries on both these column have similar performance. If I have clustered index on age queries on age column will be faster but than on salary column will be slower.
Q5: On a related note, I have repeatedly seen that indexes (both clustered and non-clustered) should be created on column with unique constraints. why is that? what will happen on failure to do this?
Thank you very much Posts I read are here:
http://msdn.microsoft.com/en-us/library/ms190457.aspx
What do Clustered and Non clustered index actually mean?
What are the differences between a clustered and a non-clustered index?