Primary key question

votes

Is there a benefit to having a single column primary key vs a composite primary key?

I have a table that consists of two id columns which together make up the primary key.

Are there any disadvantages to this? Is there a compelling reason for me to throw in a third column that would be unique on it's own?

databasedatabase-design

See stackoverflow.com/questions/404040/… or stackoverflow.com/questions/1383062/… or quite a few others – John M Gant

The single vs. composite & identity vs. meaningful value issues have been discussed at length on SO. From what I've seen there's no single consensus, and little evidence to conclusively support either set of approaches for all cases. Do what makes sense in your particular case. – John M Gant

possible duplicate of Why single primary key is better than composite keys? – Gert Grenander

12 Answers

votes

Database Normalization nuts will tell you one thing.

I'm just going to offer my own opinion of what i've learned over the years..I stick an AutoIncrementing ID field to Every ($&(@#$)# one of my tables. It makes life a million times easier in the long run to be able to single out with impunity a single row.

This is from a "down in the trenches" developer.

votes

Single column keys are simple to write, simple to maintain, and simple to understand.

If you're going to have a huge number of rows - billions? - maybe saving a byte here and there will help.

But if you're not looking at extreme cases, optimizing for "simple" is often the best way to go.

votes

If you are a coder and the database is nothing to you but a glorified object-store, then sure, by all means inject surrogate keys willy nilly. In fact go one better and just delegate all DB schema design and DB interaction to your favourite ORM and be done with it. Indeed, when I want a small or medium scale object-store, that's exactly what I do.

If you are approaching an information systems or information management problem, then it is a completely different story. When you start dealing with 10's (or more likely 100's) of millions of dirty records integrated from multiple sources, several or all of which are not under your control; at that point the seductive lure of an easy answer to the problems of 'identity' is a trap.

Yes you sometimes still introduce a surrogate key internally to allow for concise FK relationships and improved cache efficiency on covering indices; but, you gain those benefits at the cost of substantial pain at managing the natural-key/surrogate-key relationship.

In this case it will be important to make sure you don't allow the surrogate key to leak. Your public API's at the business-logic layer should use the natural-key, nothing above an document/record-cache should be aware of the existence of a surrogate key. Be aware that the cost of matching updates against the existing surrogate keys can be prohibitive, and a far larger scalability hit than the incremental cost of moving a few extra bytes per request over the internal network.

So in conclusion:

If the DB is just being used as an object-store: let the ORM worry about object identity, and there should almost certainly be a surrogate key.
If the DB is being used as a database: the introduction of a surrogate key is an engineering design decision with serious tradeoffs in both directions. The decision will need to be made on a case by case basis, with full recognition of the resulting costs to be accepted in exchange for the benefits gained either way.

Update

The 'convenience' of a surrogate key is really just the ability to punt on the question of identity. This is often necessary in a database, and reasonable in the caching layer as I allow, but beyond that it leads to brittle data designs. The problem is that identity is no something that has one correct answer. For non-trivial data-intensive systems you will routinely find yourself needing to work in terms of equivalence classes, rather than the reference identity, object-oriented programming lulls us into thinking is normal.

What it really comes down to is a realization that the whole concept of a 'primary key' is a fiction invented to help the relational model work efficiently; but, adopting a surrogate key, cements that fiction and makes the whole system brittle and inflexible. Business logic needs to be able to provide their own definitions of equality — sometimes four copies of the same file need to be considered four files, sometimes they should be considered indistinguishable from the original file; when you edit one of them, is that then a new file? the same file? The answer to both questions is of course yes, when... Working with natural keys provides this critical ability to work in terms of conceptual equivalence classes. If you let surrogate keys infect your business logic, you quickly lose this.

votes

I have had to use multi-column primary keys in the past, and it became quite a nightmare very quickly.

If you have one table that references your first table, how does it contain that primary key? Now add another table that references only the second table but needs to find data in the first. Now another... on down the rabbit hole.

If you know that you will only have the one table, there's probably not an issue either way- use whichever represents your data better. But if you'll be using it in joins, you can lose performance pretty quickly.

votes

Is there a benefit to having a single column primary key vs a composit[sic] primary key?

Yes. If the primary key also happens to be the clustered index, it is common that the clustered index is duplicated fully for each secondary index in the table. Therefore, having a fatter clustered index, which is what one would get with a composite, implies an increase in storage cost. Also, foreign references to this table would need to specify both fields to refer to a unique entry, which implies a further storage cost. There is also an arguably greater cost in development time because there is a slight increase in the complexity of the join.

On the other hand, depending on the distribution of the values of your two key fields, it may be the case that concurrent access to your table is greatly improved because chronologically-successive inserts could occur on different physical pages; this could be the case, for example, if your fields are time-independent (and non-monotonic like an auto-incrementer) like clientID, or something like that. This could be significant for performance in a high concurrency environment.

I have a table that consists of two id columns which together make up the primary key.

Are there any disadvantages to this? Is there a compelling reason for me to throw in a third column that would be unique on it's own?

If the most common way in which your table is queried is to specify those three fields as restrictions, then having all three in a composite key would likely be the fastest lookup.

And there is another important point that I almost forgot. Since having a composite key means that foreign references to this table from other tables must specify all fields in the key, it also means that some queries performed on the other table that required a restriction on one or more of the parts of the composite index of this table, can be performed without requiring a join. This could be considered similar to the concept of denormalization for the sake of performance (and arguably sacrificing a little ease of maintainability).

votes

In general I prefer to have a surrogate key becasue there are very few truly good natural keys (key problem is not uniqueness but that they change over time) and the longer the natural key, the more it affects performance when used as a PK. If you have a natural key, you should create a unique index on it and then use the surrogate key as the PK used for joining to other tables. That enforces the uniqueness of the natural key data but fixes the problems of join performance and the extra time to update all child records when the natural key changes.

There is one case where I ignore this and that is a joining table. If it is a table that is used to enforce a many to many relationship and consists only of two surrogate keys from other tables, then you really gain nothing from adding a surrogate key. Typically the individual keys are used for joins not the PK and surrogate keys almost never change. In a joining table, I just add the two colmns I need and nothing else.

votes

In most databases I know (MySQL, PostgreSQL) the composite key will generate an index. So if you specify your key as composite the DB should provide you an efficient way to lookup tuples from the DB using that key. I think it is the case for all DBs. I think you do not have to bother about performance there.

votes

Don't use multi-column keys. They get very difficult to maintain, especially if the components of the key are not human-understandable.

Use an internally generated key instead.

votes

Imagine you have a composite primary key (field1 and field2 for example) instead of just one autoincremental identifier. Clients' requirements are very changeable and after some development the client says that field2 is not compulsory and it can be nullable, it won't be possible to continue as the primary key of the table. Imagine this table is one of the most importants in your model. Then all the foreign keys should be changed if field 2 cannot be in the composite primary key. It's a nightmare changing the primary key all over the model.

As well if there is a lot of foreign keys I think is not a very good Idea to add several keys to each table just to make the link.

votes

I'm not sure there's enough information for us to make your call for you. Here are a few observations that might be helpful though.

is the primary key a clustered index? Is the table referenced by other tables through a foreign key? If yes, then you may benefit from a single-column key, because that key will appear in those other tables. This is how you would save space.

If the table is not referenced by other tables, then you would be using extra space in your table without much additional benefit. And, if this table only contains the two columns now, then you would increase the table size by 50%.

If you use an extra column for the primary key, do not forget your natural key (the two-column key). Create a unique constraint on the composite key. You still want to maintain the integrity of the real data.

votes

The decision should always be based on requirements and the intended meaning of the data. A table with only a single attribute key clearly enforces a different kind of constraint and implies that your table has a very different meaning to the same table with a multi attribute key. On the other hand adding an additional unique column would also be a waste of resources and add meaningless complexity if you don't actually need to use it anywhere.

votes

One caveat to the auto-incrementing column is that it can give a false impression of uniqueness. Sure, your identity column is always unique, but that's just a meaningless value you've attached to the table. Unless you also have a unique constraint attached to the set of columns that represent the actual semantic primary key of the table, you have no guarantee of meaningful uniqueness.