Relational database design - surrogate keys vs natural keys in context of query speed, ORM and app development

Question

Suppose I have a set of 'model' entities, and a set of difficulty levels. Each model has a certain percentage success rate for a given day on a given difficulty level.

A model entity has a name, which is both unique and immutable under any circumstances, so it makes a natural primary key. A difficulty level is described by its name only (easy, normal, etc). Difficulty levels are very, very unlikely to change, though it's possible a new one could be added. A success rate record is uniquely identified by the model it pertains to, the difficulty level, and the date.

Here's the most trivial db design for this scenario: [diagram: db design with natural keys]

In this design, 'name' is the primary key for the table 'models' and is represented by a VARCHAR(20) field. Likewise, a VARCHAR(20) field 'name' is the primary key for the table 'difficulty_levels' (a lookup table). In the 'success_rates' table, 'model_name' is a foreign key referencing the 'name' field in the 'models' table, and 'difficulty_level' is a foreign key referencing the 'name' field in the 'difficulty_levels' table. The fields 'model_name', 'difficulty_level' and 'date' make up a composite primary key for the 'success_rates' table.

The most used queries would be:

getting all success rates for a certain model, difficulty level, and date period
getting the most/least successful models for a certain period and difficulty level.
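
For concreteness, the natural-key design and the two queries above could be sketched as follows. This is only an illustration using Python's built-in sqlite3 module, not the actual target database; the 'rate' column name, the sample rows, and the helper functions rates_for and ranking are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE models (name VARCHAR(20) PRIMARY KEY);
CREATE TABLE difficulty_levels (name VARCHAR(20) PRIMARY KEY);
CREATE TABLE success_rates (
    model_name       VARCHAR(20) NOT NULL REFERENCES models(name),
    difficulty_level VARCHAR(20) NOT NULL REFERENCES difficulty_levels(name),
    date             TEXT NOT NULL,   -- ISO 8601 date string
    rate             REAL NOT NULL,   -- assumed column: success percentage
    PRIMARY KEY (model_name, difficulty_level, date)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO models VALUES (?)", [("alpha",), ("beta",)])
conn.executemany("INSERT INTO difficulty_levels VALUES (?)", [("easy",), ("normal",)])
conn.executemany(
    "INSERT INTO success_rates VALUES (?, ?, ?, ?)",
    [
        ("alpha", "easy", "2014-01-01", 80.0),
        ("alpha", "easy", "2014-01-02", 90.0),
        ("beta", "easy", "2014-01-01", 60.0),
        ("beta", "easy", "2014-01-02", 70.0),
        ("alpha", "normal", "2014-01-01", 50.0),
    ],
)


def rates_for(conn, model, level, start, end):
    """All success rates for one model and difficulty level in a date period."""
    return conn.execute(
        "SELECT date, rate FROM success_rates "
        "WHERE model_name = ? AND difficulty_level = ? AND date BETWEEN ? AND ? "
        "ORDER BY date",
        (model, level, start, end),
    ).fetchall()


def ranking(conn, level, start, end):
    """Models ordered from most to least successful (by average rate)."""
    return conn.execute(
        "SELECT model_name, AVG(rate) AS avg_rate FROM success_rates "
        "WHERE difficulty_level = ? AND date BETWEEN ? AND ? "
        "GROUP BY model_name ORDER BY avg_rate DESC",
        (level, start, end),
    ).fetchall()
```

Both queries filter on columns of the composite primary key, so with natural keys they need no join to 'models' or 'difficulty_levels' at all.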

Now, my question is - do I need to add surrogate primary keys to the 'models' and 'difficulty_levels' tables? I guess storing int values as opposed to varchar values in the foreign key fields of 'success_rates' takes up less space, and maybe the queries will be faster (just my wild guess, not sure about that)?

The problem I see with surrogate keys is that they have zero relevance to the business logic itself. I'm planning on using a mini-ORM (most likely Dapper), and without the surrogate keys I can operate on classes that very cleanly represent the entities I'm working with. Now, if I add surrogate keys, I'll have to add 'Id' properties to my classes, and I'm really against adding a database storage implementation detail like that to a class that can be used anywhere in the app, even where no database storage is involved. I could add proxy storage classes with an Id property, but that adds another level of complexity. Plus, because the 'Id' property can't be readonly (the ORM has to set the id after saving the entity to the database), it could accidentally be set to a random/invalid value.
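
To make the trade-off concrete, here is a rough sketch of the storage-layer variant: the entity class carries only its name, while the surrogate id lives solely in the table and never leaves the repository. It is written in Python with sqlite3 purely for illustration and says nothing about how Dapper itself works; the Model class and ModelRepository are hypothetical names.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    """Domain entity: identified by its immutable name, no database id."""
    name: str


class ModelRepository:
    """Hypothetical storage layer; the surrogate id stays inside the table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS models ("
            " id INTEGER PRIMARY KEY,"            # surrogate key, assigned by the db
            " name VARCHAR(20) NOT NULL UNIQUE)"  # natural key kept unique
        )

    def add(self, model: Model) -> None:
        self._conn.execute("INSERT INTO models (name) VALUES (?)", (model.name,))

    def get(self, name: str) -> Optional[Model]:
        row = self._conn.execute(
            "SELECT name FROM models WHERE name = ?", (name,)
        ).fetchone()
        return Model(name=row[0]) if row else None


repo = ModelRepository(sqlite3.connect(":memory:"))
repo.add(Model("alpha"))
found = repo.get("alpha")            # Model(name='alpha')
missing = repo.get("no-such-model")  # None
```

The cost of this arrangement is that every lookup goes through the unique name, and the repository has to translate names to ids whenever it writes child rows.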

I'm not very familiar with ORMs and I have zero knowledge of Dapper, so correct me if I'm wrong on any of these points.

What would be the best approach here?

Natural keys are great... IF they're really unique, and stable (not changing all the time), and hopefully not made up of 10 columns. Personally, I tend to use surrogate keys almost all the time, since those guarantee all the good properties: small, unique, never changing. The users don't ever need to see those technical keys at all, but they offer the best performance and integrity, in my opinion. - marc_s
@marc_s What about their representation in the application classes? I've described the problems I'm seeing with the surrogate keys in the last paragraph. The users of the application certainly won't have to deal with these keys and their implementation, but the users of the classes (developers) will have to. - sdds
To me, the benefits on the database side of having a small, static, unique surrogate key far outweigh these concerns. So you have an Id property - yes, and what's the problem with that!?!? - marc_s
@marc_s As I said, I'd rather keep this implementation detail out of the class representing the Model entity. Besides the fact that the class is not always used in the context of db operations, it means a storage implementation detail resides in business logic code. What if I decide to drastically change the methods of storage? Or implement several repositories, a relational db and some other? An Id property in a Model class makes no sense then. Then again, an Id property could be set to an invalid state. Natural keys (name) could be made readonly and set only once, when the object is created. - sdds
I think it's pretty generally accepted that a surrogate key should be used. Most DBs are optimised for INT keys, and they are certainly more efficient than VARCHAR(20). Also you'd better hope that the names never change; if either name changes, then a) a large part of the index on success_rates is going to be deleted and recreated (leading to locks and a potentially large delay) and b) in a multi-user system other users will have to reload from the DB, as any update with a now non-existent name will fail. The cost of having an undisplayed Id property is actually very small. - simon at rcl

David Aldridge · Accepted Answer · 2014-01-31T14:04:42

"The problem I see with surrogate keys is that they have zero relevance to the business logic itself."

That is actually the benefit, not the problem. The key's value is the immutable identifier of the entity, regardless of whatever else changes. I really doubt that there is any widely used ORM that cannot easily make use of it, as the alternative (cascading changes to a key value to every child record) is so much more difficult to implement.
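
As a rough illustration of that point, here is the question's schema reworked with surrogate keys, sketched in Python's sqlite3 module rather than any particular production database. The 'id', 'model_id', 'difficulty_level_id' and 'rate' column names and the sample rows are assumptions for the example. Renaming a model touches exactly one row in 'models', and the child rows in 'success_rates' are never rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE models (
    id   INTEGER PRIMARY KEY,           -- surrogate key
    name VARCHAR(20) NOT NULL UNIQUE    -- natural key, still enforced
);
CREATE TABLE difficulty_levels (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(20) NOT NULL UNIQUE
);
CREATE TABLE success_rates (
    model_id            INTEGER NOT NULL REFERENCES models(id),
    difficulty_level_id INTEGER NOT NULL REFERENCES difficulty_levels(id),
    date                TEXT NOT NULL,
    rate                REAL NOT NULL,
    PRIMARY KEY (model_id, difficulty_level_id, date)
);
INSERT INTO models (id, name) VALUES (1, 'alpha');
INSERT INTO difficulty_levels (id, name) VALUES (1, 'easy');
INSERT INTO success_rates VALUES (1, 1, '2014-01-01', 80.0);
INSERT INTO success_rates VALUES (1, 1, '2014-01-02', 90.0);
""")

# Rename the model: one row in 'models' changes, nothing cascades.
cur = conn.execute("UPDATE models SET name = 'alpha-v2' WHERE id = 1")
renamed_rows = cur.rowcount

# Child rows still resolve to the same entity through the unchanged id.
rows = conn.execute(
    "SELECT m.name, s.date, s.rate "
    "FROM success_rates s JOIN models m ON m.id = s.model_id "
    "ORDER BY s.date"
).fetchall()
```

The price is a join whenever a query needs the name, which the natural-key design avoids.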

I'd also suggest that you add a value to your difficulty levels that allows the hierarchy of increasing difficulty to be represented in the data model, otherwise "harder than" or "easier than" cannot be robustly represented.
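
A minimal sketch of what that could look like, again in Python with sqlite3: the 'sort_order' column name, the 'hard' level, and the harder_than helper are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE difficulty_levels (
    name       VARCHAR(20) PRIMARY KEY,
    sort_order INTEGER NOT NULL UNIQUE   -- assumed column: higher means harder
);
""")
conn.executemany(
    "INSERT INTO difficulty_levels (name, sort_order) VALUES (?, ?)",
    [("easy", 1), ("normal", 2), ("hard", 3)],
)


def harder_than(conn, level_name):
    """Names of all levels harder than the given one, easiest first."""
    rows = conn.execute(
        "SELECT name FROM difficulty_levels "
        "WHERE sort_order > (SELECT sort_order FROM difficulty_levels WHERE name = ?) "
        "ORDER BY sort_order",
        (level_name,),
    ).fetchall()
    return [r[0] for r in rows]
```

With such a column, "harder than" becomes a plain comparison in SQL instead of an ordering hard-coded in the application.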
