
The DynamoDB Wikipedia article says that DynamoDB is a "key-value" database. However, calling it a "key-value" database completely misses an extremely fundamental feature of DynamoDB, that of the sort key: Keys have two parts (partition key and sort key) and items with the same partition key can be efficiently retrieved together sorted by the sort key.
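To make the difference from a plain key-value store concrete, here is a minimal in-memory sketch of the two-part key model. It assumes nothing about how DynamoDB actually stores data: each partition key simply maps to sort keys kept in sorted order, with their values in a parallel list. The class and method names (`TwoPartKeyStore`, `put`, `get`, `query`) are hypothetical and only loosely echo DynamoDB's GetItem and Query operations.

```python
import bisect


class TwoPartKeyStore:
    """Hypothetical toy store: partition key -> items sorted by sort key."""

    def __init__(self):
        # partition key -> (sorted list of sort keys, parallel list of values)
        self._partitions = {}

    def put(self, partition_key, sort_key, value):
        keys, values = self._partitions.setdefault(partition_key, ([], []))
        i = bisect.bisect_left(keys, sort_key)
        if i < len(keys) and keys[i] == sort_key:
            values[i] = value  # same full key: overwrite the existing item
        else:
            keys.insert(i, sort_key)  # keep the partition sorted by sort key
            values.insert(i, value)

    def get(self, partition_key, sort_key):
        """Key-value style lookup of one item by its full (two-part) key."""
        keys, values = self._partitions.get(partition_key, ([], []))
        i = bisect.bisect_left(keys, sort_key)
        if i < len(keys) and keys[i] == sort_key:
            return values[i]
        return None

    def query(self, partition_key, low=None, high=None):
        """Items of one partition in sort-key order, optionally low <= sort key < high."""
        keys, values = self._partitions.get(partition_key, ([], []))
        start = 0 if low is None else bisect.bisect_left(keys, low)
        end = len(keys) if high is None else bisect.bisect_left(keys, high)
        return list(zip(keys[start:end], values[start:end]))


store = TwoPartKeyStore()
store.put("user#1", "2023-05-02", "order B")
store.put("user#1", "2023-01-15", "order A")
store.put("user#2", "2023-03-01", "order C")
print(store.get("user#1", "2023-01-15"))
print(store.query("user#1"))
print(store.query("user#1", low="2023-02-01"))
```

A pure key-value store could serve `get`, but answering `query` would require scanning every key; keeping each partition ordered is what lets `query` find a sorted range with a binary search instead.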

Cassandra also has exactly the same sorting-items-inside-a-partition feature (which it calls "clustering key"), and the Cassandra Wikipedia article uses the term wide column store to describe it. However, while this term "wide column" is better than "key-value", it is still somewhat inappropriate because it describes the more general situation where an item can have a very large number of unrelated columns - not necessarily a sorted list of separate items.

So my question is whether there is a more appropriate term that can describe the data model of databases like DynamoDB and Cassandra: databases which, like a key-value store, can efficiently retrieve items by individual key, but can also efficiently retrieve items sorted by the key or by just a part of it (DynamoDB's sort key or Cassandra's clustering key).
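The "or by just a part of it" case can be sketched with a compound clustering key represented as a Python tuple, since tuples compare lexicographically. Fixing the leading component(s) of the key then selects a contiguous run of the sorted keys, roughly analogous to restricting leading clustering columns in CQL or using `begins_with` on a DynamoDB string sort key. The example data (readings keyed by `(day, hour)`) and the function name `prefix_slice` are hypothetical, chosen only to illustrate the idea.

```python
import bisect

# Hypothetical partition whose clustering key is (day, hour), kept sorted.
keys = [("2024-01-01", 9), ("2024-01-01", 17), ("2024-01-02", 8), ("2024-01-03", 12)]
values = [20.5, 22.1, 19.8, 21.0]


def prefix_slice(keys, values, prefix):
    """Return (key, value) pairs whose key starts with `prefix`, in key order.

    Seek to the first candidate with a binary search (a shorter tuple sorts
    before every tuple that extends it), then scan forward only while the
    key still matches the prefix.
    """
    n = len(prefix)
    result = []
    i = bisect.bisect_left(keys, prefix)
    while i < len(keys) and keys[i][:n] == prefix:
        result.append((keys[i], values[i]))
        i += 1
    return result


print(prefix_slice(keys, values, ("2024-01-01",)))    # all readings for one day
print(prefix_slice(keys, values, ("2024-01-02", 8)))  # one exact reading
print(len(prefix_slice(keys, values, ())))            # empty prefix: whole partition
```

The cost is one binary search plus the size of the result, which is why a partial key is just as useful as a full one when the items are stored sorted.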

I find "Key-Key-Value" to capture the Partition / Ordering / Value nature of this model. - Tzach Livyatan
I'm voting to close this question as off-topic because it doesn't seem related to coding. - Charles
Why does it need to be related to "coding"? It doesn't have the tag C++ or Java or any other programming language, so it's not about coding in any programming language. It has the Cassandra et al. tags, indicating that it's about that software. Is there a Stack Exchange site you think better suits questions about software? And note that this question already has a +2 score. It's not a really bad question, apparently. - Nadav Har'El
My suggestion: "Key-Sortable-Value". - TomerSan

1 Answer


Before CQL was introduced, Cassandra adhered more strictly to the wide column store data model, where you only had rows identified by a row key, each containing sorted key/value columns. With the introduction of CQL, rows became known as partitions, and columns could optionally be grouped into logical rows via clustering keys.

Until Cassandra 3.0, CQL was simply an abstraction on top of the original Thrift data model, and the storage engine had no concept of CQL rows. A partition was just a sorted set of columns whose names were compound keys made of the concatenated clustering key values followed by the CQL column name. More details are given in this article. Since 3.0, the storage engine supports CQL rows natively, which allows CQL data models to be stored more efficiently.
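A rough sketch of that pre-3.0 idea, under heavy simplifying assumptions: each logical CQL row in a partition is flattened into cells whose names are tuples of the clustering key values followed by the column name, and the partition is nothing more than those cells sorted by name. Grouping cells that share the clustering prefix recovers the logical rows. The data, the single clustering column, and the function names `flatten` and `regroup` are hypothetical; the real on-disk format (encodings, timestamps, tombstones and other metadata) is considerably more involved.

```python
from itertools import groupby

# Hypothetical logical CQL rows in one partition, keyed by one clustering column.
rows = {
    ("2024-01-02",): {"temp": 19.8, "unit": "C"},
    ("2024-01-01",): {"temp": 20.5, "unit": "C"},
}


def flatten(rows):
    """Turn logical rows into one sorted list of (cell_name, value) cells.

    Each cell name is the clustering key values followed by the column name,
    so sorting by cell name keeps a row's cells contiguous and rows ordered.
    """
    cells = [
        (clustering + (column,), value)
        for clustering, columns in rows.items()
        for column, value in columns.items()
    ]
    return sorted(cells, key=lambda cell: cell[0])


def regroup(cells, clustering_size):
    """Rebuild logical rows by grouping cells that share the clustering prefix."""
    return [
        (prefix, {name[-1]: value for name, value in group})
        for prefix, group in groupby(cells, key=lambda cell: cell[0][:clustering_size])
    ]


cells = flatten(rows)
for name, value in cells:
    print(name, value)
print(regroup(cells, clustering_size=1))
```

Nothing in the flat, sorted list of cells knows about rows; the grouping exists only in how the cell names are interpreted, which is the sense in which CQL was an abstraction over the Thrift model.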

However, if you think of a CQL row as a logical grouping of columns within the same partition, Cassandra could still be considered a wide column store. In any case, to my knowledge, there isn't another well-established term to describe this kind of database.