I want to understand exactly what will improve my performance if I decide to go with the following partitioning strategy.

Let's say I have a table for songs and I want to define the artist as the partition key. This table is going to grow gradually. Today I have 25 artists with 5 songs each (125 rows in total), but over time I foresee 500 artists with 5 songs per artist (2,500 rows in total). I want to make the artist ID the partition key because CQL requires the partition key in the WHERE clause, and in my UI it is the unique value I use to look up those 5 songs.

Also, what if I start with 2 Cassandra nodes today and eventually grow to 4 nodes and then later to 10 nodes? Can I continue to use the same partition key as I grow?

Here is my table structure:

ArtistId (partition key)  |  SongId  |  Song
--------------------------------------------
1                         | 1        |  abc
1                         | 2        |  cde
1                         | 3        |  fgh
2                         | 4        |  ijk
2                         | 5        |  lmn
1                         | 6        |  opq
1                         | 7        |  rst

What kinds of queries are you going to have? - jny
My query would be like select * from songs where artistid = 1 - Hitesh

1 Answer

Also, what if I start with 2 Cassandra nodes today and eventually grow to 4 nodes and then later to 10 nodes? Can I continue to use the same partition key as I grow?

Yes, you can keep your partition key.

I want to understand exactly what will improve my performance if I decide to go with the following partitioning strategy.

To clarify: a primary key can be a single column or compound. A compound primary key consists of a partition key and one or more clustering keys.

Since you say the partition key is the artist, that would be your row key, and I am assuming the song would be your clustering key.
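
A minimal sketch of such a table in CQL. The column types (int, int, text) and the choice of songid as the clustering key are assumptions for illustration; neither is stated in the question:

```cql
CREATE TABLE songs (
    artistid int,   -- partition key: determines which node(s) store the rows
    songid   int,   -- clustering key (assumed): orders rows within a partition
    song     text,
    PRIMARY KEY ((artistid), songid)
);
```

With this definition, all songs of one artist live in a single partition, sorted by songid.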

Partition keys determine how rows are distributed across the nodes, and clustering keys determine the order in which rows are stored within a partition.

Per the cql documentation:

all the rows sharing the same partition key (even across table in fact) are stored on the same physical node

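That makes the lookup very efficient: a query restricted to one partition key only has to go to the node (and its replicas) holding that partition, rather than asking every node in the cluster, so the rows are found faster.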
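
As a rough illustration, assuming a songs table with artistid as the partition key and a song text column (the column name is taken from the question's table, the rest is an assumption), these queries show the difference:

```cql
-- Restricted to one partition key: served from the partition for artist 1
SELECT * FROM songs WHERE artistid = 1;

-- No partition key restriction: Cassandra has to scan across partitions,
-- and rejects the query unless ALLOW FILTERING is added (or song is indexed)
SELECT * FROM songs WHERE song = 'abc' ALLOW FILTERING;

-- token() returns the hash of the partition key that decides where a partition is placed
SELECT artistid, token(artistid) FROM songs;
```

The token depends only on the partition key value, not on how many nodes exist, which is why the same key keeps working as the cluster grows from 2 to 4 to 10 nodes.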