8
votes

I'm currently using GUIDs as a NONCLUSTERED PRIMARY KEY alongside an INT IDENTITY column.

The GUIDs are required to allow offline creation of data and synchronisation - which is how the entire database is populated.

I'm aware of the implications of using a GUID as a clustered primary key, hence the integer clustered index but does using a GUID as a primary key and therefore foreign keys on other tables have significant performance implications?

Would a better option to use an integer primary/foreign key, and use the GUID as a client ID which has a UNIQUE INDEX on each table? - My concern there is that entity framework would require loading the navigation properties in order to get the GUID of the related entity without significant alteration to the existing code.

The database/hardware in question is SQL Azure.

3

3 Answers

1
votes

Generally speaking, it is preferable to use INT for Primary Key / Foreign Key fields, whether or not these fields are the leading field in Clustered indexes. The issue has to do with JOIN performance and even if you use UNIQUEINDENTIFIER as NonClustered or even if you used NEWSEQUENTIALID() to reduce fragmentation, as the tables get larger it will be more scalable to JOIN between INT fields. (Please note that I am not saying that PK / FK fields should always be INT as sometimes there are perfectly valid natural keys to use).

In your case, given the concern about Entity Framework and generating the GUIDs in the app and not in the DB, go with your alternate suggestion of using INT as the PK / FK fields, but rather than have the UNIQUEIDENTIFIER in all tables, only put it in the main user / customer info table. I would think that you should be able to do a one-time lookup of the customer INT identifier based on the GUID, cache that value, and then use the INT value for all remaining operations. And yes, be sure there is a UNIQUE, NONCLUSTERED index on the GUID field.

That all being said, if your tables will never (and I mean NEVER as opposed to just not in the first 2 years) grow beyond maybe 100,000 rows each, then using UNIQUEIDENTIFIER is less of a concern as small volumes of rows generally perform ok (given moderately decent hardware that is not overburdened with other processes or low on memory). Obviously, the point at which JOIN performance degrades due to using UNIQUEIDENTIFIER will greatly depend on the specifics of the system: hardware as well as what types of queries, how the queries are written, and how much load on the system.

4
votes

You can also create foreign keys against unique key constraints, which then gives you the option to foreign key to the ID identity as an alternative to the Guid.

i.e.

Create Table SomeTable
(
    UUID UNIQUEIDENTIFIER NOT NULL,
    ID INT IDENTITY(1,1) NOT NULL,

    CONSTRAINT PK PRIMARY KEY NONCLUSTERED (UUID),
    CONSTRAINT UQ UNIQUE (ID)
)
GO

Create Table AnotherTable
(
    SomeTableID INT,

    FOREIGN KEY (SomeTableID) REFERENCES SomeTable(ID)
)
GO

Edit

Assuming that your centralized database is a Mart, and that only batch ETL is done from the source databases, if you do your ETL directly to the central database (i.e. not via Entity Framework), given that all your tables have UUID FK's after re-population from the distributed databases, you'll need to either map the INT UKCs during ETL or fix them up after the import (which would require a temporary NOCHECK constraint step on the INT FK's).

Once ETL is loaded and INT keys are mapped, I would suggest you ignore / remove the UUID's from your ORM model - you would need to regenerate your EF navigation on the INT keys.

A different solution would be required if you update the central database directly or do continual ETL and do use EF for the ETL itself. In this case, it might be less total I/O just to leave the PK GUID as FKs for RI, drop the INT FK's altogether, and choose other suitable columns for clustering (minimizing page reads).

2
votes

GUID have important implications, yes. Your index is nonclustered, but the index itself will be quickly fragmented, and indexes on the foreign keys will be too. The size is also a concern : 16 Bytes instead of a 4 Bytes integer.

You can use the NEWSEQUENTIALID() function as the default value for your column to make it less random and diminish fragmentation.

But yes, I'd say that using an integer as your primary key and for references will be the best solution.