SQL Guid Primary Key Join Performance

Question

I'm currently using GUIDs as a NONCLUSTERED PRIMARY KEY alongside an INT IDENTITY column.

The GUIDs are required to allow offline creation of data and synchronisation - which is how the entire database is populated.

I'm aware of the implications of using a GUID as a clustered primary key, hence the integer clustered index. But does using a GUID as the primary key, and therefore as the foreign key on other tables, have significant performance implications?

Would a better option be to use an integer primary/foreign key, and use the GUID as a client ID with a UNIQUE INDEX on each table? My concern there is that, without significant alteration to the existing code, Entity Framework would need to load the navigation properties in order to get the GUID of the related entity.

The database/hardware in question is SQL Azure.

Solomon Rutzky · Accepted Answer · 2013-11-29T15:19:19

Generally speaking, it is preferable to use INT for Primary Key / Foreign Key fields, whether or not these fields are the leading field in Clustered indexes. The issue is JOIN performance: a UNIQUEIDENTIFIER is 16 bytes where an INT is 4, so every index and foreign key that carries it is wider. Even if you make the UNIQUEIDENTIFIER NonClustered, or use NEWSEQUENTIALID() to reduce fragmentation, JOINs between INT fields will scale better as the tables get larger. (Please note that I am not saying that PK / FK fields should always be INT, as sometimes there are perfectly valid natural keys to use.)
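
The size difference behind this can be checked directly. The query below is only a quick illustration (the column aliases are made up), not a benchmark:

```sql
-- Compare the storage size of the two candidate key types.
SELECT DATALENGTH(CAST(1 AS INT)) AS IntBytes,   -- 4
       DATALENGTH(NEWID())        AS GuidBytes;  -- 16
```

A key that is four times wider means fewer keys fit on each index page, so the same JOIN has to read more pages. How much that matters in practice still depends on the workload.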

In your case, given the concern about Entity Framework and the GUIDs being generated in the app rather than in the DB, go with your alternate suggestion of using INT for the PK / FK fields. But rather than have the UNIQUEIDENTIFIER in all tables, only put it in the main user / customer info table. I would think that you should be able to do a one-time lookup of the customer INT identifier based on the GUID, cache that value, and then use the INT value for all remaining operations. And yes, be sure there is a UNIQUE, NONCLUSTERED index on the GUID field.
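
A minimal sketch of that layout, assuming a hypothetical dbo.Customer parent table and dbo.CustomerOrder child table. All table, column, and constraint names here are invented for illustration and are not taken from the question:

```sql
-- Parent table: INT clustered PK, with the GUID kept only here.
CREATE TABLE dbo.Customer
(
    CustomerID INT IDENTITY(1, 1) NOT NULL,
    ClientGuid UNIQUEIDENTIFIER   NOT NULL,  -- generated offline by the client app
    Name       NVARCHAR(100)      NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerID),
    CONSTRAINT UQ_Customer_ClientGuid UNIQUE NONCLUSTERED (ClientGuid)
);

-- Child table: references the parent by the 4-byte INT, not the GUID.
CREATE TABLE dbo.CustomerOrder
(
    CustomerOrderID INT IDENTITY(1, 1) NOT NULL,
    CustomerID      INT                NOT NULL,
    OrderedOn       DATETIME2(0)       NOT NULL,
    CONSTRAINT PK_CustomerOrder PRIMARY KEY CLUSTERED (CustomerOrderID),
    CONSTRAINT FK_CustomerOrder_Customer
        FOREIGN KEY (CustomerID) REFERENCES dbo.Customer (CustomerID)
);

-- Index the FK column so lookups and JOINs by customer can seek.
CREATE NONCLUSTERED INDEX IX_CustomerOrder_CustomerID
    ON dbo.CustomerOrder (CustomerID);

-- One-time lookup: resolve the client GUID to the INT key (placeholder GUID),
-- then cache the result in the application.
DECLARE @ClientGuid UNIQUEIDENTIFIER = '6F9619FF-8B86-D011-B42D-00C04FC964FF';

SELECT c.CustomerID
FROM   dbo.Customer AS c
WHERE  c.ClientGuid = @ClientGuid;

-- Every later query filters and joins on the cached INT value.
DECLARE @CustomerID INT = 1;  -- stand-in for the cached lookup result

SELECT o.CustomerOrderID, o.OrderedOn
FROM   dbo.CustomerOrder AS o
WHERE  o.CustomerID = @CustomerID;
```

The unique nonclustered index on ClientGuid keeps the initial lookup a single seek, while the clustered index and all foreign keys stay on the narrow INT.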

That all being said, if your tables will never (and I mean NEVER, as opposed to just not in the first 2 years) grow beyond maybe 100,000 rows each, then using UNIQUEIDENTIFIER is less of a concern, as small volumes of rows generally perform OK (given moderately decent hardware that is not overburdened with other processes or low on memory). Obviously, the point at which JOIN performance degrades due to using UNIQUEIDENTIFIER will depend greatly on the specifics of the system: the hardware, the types of queries, how the queries are written, and how much load is on the system.
