How would you structure your entity model for storing arbitrary key/value data with different data types?

Question

I keep coming across scenarios where it will be useful to store a set of arbitrary data in a table using a per-row key/value model, rather than a rigid column/field model. The problem is, I want to store the values with their correct data type rather than converting everything to a string. This means I have to choose either a single table with multiple nullable columns, one for each data type, or a set of value tables, one for each data type. I'm also unsure as to whether I should use full third normal form and separate the keys into a separate table, referencing them via a foreign key from the value table(s), or if it would be better to keep things simple and store the string keys in the value table(s) and accept the duplication of strings.

Old/bad:

This solution makes adding additional values a pain in a fluid environment because the table needs to be modified regularly.

MyTable
============================
ID    Key1    Key2    Key3
int   int     string  date
----------------------------
1     Value1  Value2  Value3
2     Value4  Value5  Value6

Single Table Solution

This solution allows simplicity via a single table. The querying code still needs to check for nulls to determine which data type the field is storing. A check constraint is probably also required to ensure only one of the value fields contains non-null data.

DataValues
=============================================================
ID    RecordID    Key    IntValue    StringValue    DateValue
int   int         string int         string         date
-------------------------------------------------------------
1     1           Key1   Value1      NULL           NULL
2     1           Key2   NULL        Value2         NULL
3     1           Key3   NULL        NULL           Value3
4     2           Key1   Value4      NULL           NULL
5     2           Key2   NULL        Value5         NULL
6     2           Key3   NULL        NULL           Value6

Multiple-Table Solution

This solution allows for more concise purposing of each table, though the code needs to know the data type in advance as it needs to query a different table for each data type. Indexing is probably simpler and more efficient because there are less columns that need indexing.

IntegerValues
===============================
ID    RecordID    Key    Value
int   int         string int
-------------------------------
1     1           Key1   Value1
2     2           Key1   Value4

StringValues
===============================
ID    RecordID    Key    Value
int   int         string string
-------------------------------
1     1           Key2   Value2
2     2           Key2   Value5

DateValues
===============================
ID    RecordID    Key    Value
int   int         string date
-------------------------------
1     1           Key3   Value3
2     2           Key3   Value6

How do you approach this problem? Which solution is better?

Also, should the key column be separated into a separate table and referenced via a foreign key or be should it be kept in the value table and bulk updated if for some reason the key name changes?

Thomas Thomas · Accepted Answer · 2010-06-10T15:49:00

First, relational databases were not designed to store arbitrary data. The fundamentals of the relational model revolve around getting specifications for the nature of the data that will be stored.

Second, what you are suggesting is a variant of an Entity-Attribute-Value (EAV). The problem with EAVs comes in data integrity, reporting, performance and maintenance. They have their place but they are analogous to drugs: used in limited quantity and narrow circumstances they can beneficial; too much will kill you.

Writing queries against an EAV is a bear. Thus, if you are going to use an EAV, the only circumstance under which I've seen them be successful is to restrict their use such that no one is permitted to write a query that filters for a specific attribute. I.e., no one is ever permitted to write a query akin to Where AttributeName = 'Foo'. That means you can never filter, sort, calculate on nor place a specific attribute in a specific place on a report. The EAV data is just a bag of categorized data which can spewed out en masse on a report but that's it. I've even seen people implement EAVs as Xml blobs.

Now, if you use the EAV in this light and since it is just a blob of data, I would use the single table approach. A significant benefit to the single table approach is that you can add a check constraint that ensures that you have one and only one value in the IntValue, StringValue or DateValue columns. The nulls will not cost you much and if this is just wad of data, it will not make any difference in performance. Furthermore, it will make your queries simpler in that you can use a simple case statement to return the String, Integer or DateValue.

I can see many problems with the multi-table approach not the least of which is that there is nothing to prevent the same attribute from having multiple types of values (e.g. a row in IntegerValues and a row in StringValues). In addition, to get the data, you will always have to use three left joins which will make your queries more cumbersome to write.

The cost of EAVs is discipline and vigilance. It requires discipline in your development team to never, ever, under any circumstances write a report or query against a specific attribute. Developers will get a lot of pressure from management to "just this one time" write something that filters for a specific attribute. Once you go down the dark path forever does it dominate your development and maintenance. The EAVs must remain a wad of data and nothing more. If you cannot maintain that kind of discipline in your development team, then I would not implement an EAV. I would require specification for any new column in the interest of avoiding a maintenance nightmare later. Once users do want to filter, sort, calculate on or put an attribute in a special place on a report, the attribute must become a first class column. If you are able to maintain discipline on their use, EAVs can work well for letting users store whatever information they want and deferring the time when you need to get specifications on the data elements until users want to use the attribute in a way mentioned earlier.

How would you structure your entity model for storing arbitrary key/value data with different data types?

6 Answers