1
votes

I want to use a string (roughly 6 to 7 characters) as the unique key in a composite primary key.

I searched for whether using a string in the primary key hurts performance and found that it does not, because Cassandra uses hashing to look up the unique key. (Could someone please confirm this?)

So I want to know what technique Cassandra uses to search on the unique key. If it uses hashing, which hashing algorithm is it? I would also like to know what Cassandra does in case of a collision, since hashing always carries the possibility of one.

Cassandra uses Murmur3 hashing for the partition key. Does it use the same for searching on the unique key? If so, what about collisions?


2 Answers

1
votes

Since you mentioned a composite key, I am assuming you have a PK like PRIMARY KEY(PartitionKey, StringVal), where StringVal is 6 to 7 characters, and that you want to know how C* efficiently gets to the record for this PK. If that is your question, the answer lies in how C* stores data.

In this example, all the data for a given partition key is stored as one physical row, using 'StringVal' as the sort order. So if you have, say, 1 million unique 'StringVal' values for a given PartitionKey value, all of them are stored as one physical row (on disk) on a node (determined by the hash of PartitionKey), sorted in the default ascending order of 'StringVal'.

All the columns in the PK other than the partition key are called 'clustering' columns, because they decide the clustering order. So here the first column of the composite key is the partition key, and the second column is the clustering column that decides the order of all the records within a partition. Only the partition key is hashed; the clustering column is not looked up by hash but located through the sort order.

Now, if you want to get a specific PK record: since C* stores the offsets for the primary keys in index files (the -Index.db files for a column family), getting to a specific record is very efficient, as it involves a seek to that location. This also allows C* to do efficient range queries. For example, you could get a slice of the physical row for a partition key by specifying a range of 'StringVal' such as 'mmm' < StringVal < 'nnn', which in your case is a lexical order comparison. The point is that since the data is in a specific order on disk and C* has offsets to the records corresponding to the values of 'StringVal', it can do very efficient seeks.
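To make the sorted-partition idea concrete, here is a minimal in-memory sketch in Python. It is only a toy model, not how C* is implemented: the Partition class and its method names are made up for illustration, and a sorted list with binary search stands in for the on-disk ordering and index offsets.

```python
import bisect


class Partition:
    """Toy model of one partition: rows kept sorted by clustering value."""

    def __init__(self):
        self._keys = []  # clustering values (StringVal), always sorted
        self._rows = {}  # clustering value -> row payload

    def upsert(self, string_val, row):
        # Insert a new row in sorted position, or overwrite an existing one.
        if string_val not in self._rows:
            bisect.insort(self._keys, string_val)
        self._rows[string_val] = row

    def get(self, string_val):
        # Point lookup: binary search over the sorted keys, no hashing.
        i = bisect.bisect_left(self._keys, string_val)
        if i < len(self._keys) and self._keys[i] == string_val:
            return self._rows[string_val]
        return None

    def slice(self, low, high):
        # Range query: rows with low < StringVal < high, in clustering order.
        start = bisect.bisect_right(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        return [(k, self._rows[k]) for k in self._keys[start:end]]


partition = Partition()
for sv in ["mzz", "abc", "naa", "mmm", "nnn", "mna"]:
    partition.upsert(sv, {"value": sv.upper()})

print(partition.get("abc"))
print([k for k, _ in partition.slice("mmm", "nnn")])
```

Because the keys are kept in order, both the point lookup and the range slice reduce to finding a position and reading forward, which is the property the answer above relies on.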

0
votes

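The partition key value is hashed, and the resulting token is used to target the node that owns that token range. The hash is deterministic: the same value always produces the same token, so a given key always goes to the same place. Even if two different keys were to hash to the same token, they would only end up on the same node; the full key is stored with the data, so they remain separate partitions. If you write with a primary key that already exists, you write to the existing row, which results in an update. Insert and update are the same action, called an upsert. Hope it helps.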
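As a rough illustration of the two points above (token to node, and upsert), here is a small Python sketch. Everything in it is an assumption for demonstration purposes: the Ring and Table classes, the node names and the token values are invented, and a truncated MD5 digest stands in for Murmur3, which is not in the Python standard library.

```python
import bisect
import hashlib


def token(partition_key: str) -> int:
    # Stand-in hash (assumption): real clusters default to Murmur3.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class Ring:
    """Toy token ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens):
        self._sorted = sorted((t, name) for name, t in node_tokens.items())
        self._tokens = [t for t, _ in self._sorted]

    def owner(self, partition_key):
        # First node whose token is >= the key's token, wrapping around.
        i = bisect.bisect_left(self._tokens, token(partition_key))
        return self._sorted[i % len(self._sorted)][1]


class Table:
    """Toy table: rows are identified by the full primary key, not the token."""

    def __init__(self):
        self._rows = {}

    def write(self, partition_key, string_val, value):
        # Insert and update are the same operation (upsert).
        self._rows[(partition_key, string_val)] = value

    def read(self, partition_key, string_val):
        return self._rows.get((partition_key, string_val))

    def __len__(self):
        return len(self._rows)


ring = Ring({"node-a": -2**62, "node-b": 0, "node-c": 2**62, "node-d": 2**63 - 1})
table = Table()

table.write("user42", "abcdef", "first")
table.write("user42", "abcdef", "second")  # same PK: overwrites, no new row

print(ring.owner("user42") == ring.owner("user42"))
print(len(table), table.read("user42", "abcdef"))
```

Since rows are identified by the key itself, the token only decides placement; two keys sharing a token would still be two entries in this model.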