3
votes

The question is pretty simple: how does Cassandra read data inside a partition? Does it load the whole partition from disk into memory?

What will be the effect if the partition size is very big?

Is the complexity of reading a row within a partition O(log N), where N is the total number of rows in the partition, given that it uses a sorted map?

SCENARIO :

Let's say there are 100,000 rows per partition, each identified by a unique clustering key. If I provide both the partition key and the clustering key in a fetch query, will Cassandra load the complete partition into memory and traverse all the clustering keys to find the specified row?


3 Answers

1
votes

No, it does not read the whole partition. It has an index structure. Separately, the operating system will cache files that are read and written, if it has the memory. Therefore, if a node has plenty of memory, eventually all the data on that node will be in memory.
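To make the "index structure" idea concrete, here is a toy sketch in Python. It is not Cassandra's actual on-disk format or code; `BLOCK_SIZE`, `build_partition` and `read_row` are hypothetical names made up for illustration. The assumption is simply that rows in a partition are stored sorted by clustering key and split into blocks, with a small index holding the first key of each block, so a point read only has to touch one block.

```python
import bisect

BLOCK_SIZE = 4  # rows per block; hypothetical value, kept small for the demo


def build_partition(rows):
    """Split rows (sorted by clustering key) into blocks and index each block's first key."""
    rows = sorted(rows)
    blocks = [rows[i:i + BLOCK_SIZE] for i in range(0, len(rows), BLOCK_SIZE)]
    index = [block[0][0] for block in blocks]  # first clustering key of every block
    return blocks, index


def read_row(blocks, index, clustering_key):
    """Return (value, blocks_read) for one clustering key, touching at most one block."""
    # Binary search the small index for the last block whose first key <= clustering_key.
    pos = bisect.bisect_right(index, clustering_key) - 1
    if pos < 0:
        return None, 0  # key sorts before the first block, nothing to read
    block = blocks[pos]  # in this model, only this block would be read from disk
    for key, value in block:
        if key == clustering_key:
            return value, 1
    return None, 1


rows = [(k, f"row-{k}") for k in range(100)]
blocks, index = build_partition(rows)
value, blocks_read = read_row(blocks, index, 42)
print(value, blocks_read, len(blocks))
```

In this model the cost of a point read is a binary search over the index plus a scan of one block, regardless of how many rows the partition holds.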

0
votes

As far as I know, Cassandra maintains index and index summary files only for the partition key, not for the clustering key, so in order to traverse the clustering keys of a given partition it will load the data into memory.

There is also another theory, that it performs a binary search on disk.

0
votes

All I was able to find is the row cache, based on a reference in the official wiki.

I am not sure whether this is the same thing that previous posts referred to as a row index, or something different.

After reading the wiki page, I assume that Cassandra does read the whole partition from the SSTable in case of a row cache miss. When the partition sits in a memtable, the algorithm is different: Cassandra performs a binary search in that case.
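For what a binary search over an in-memory partition could look like, here is a minimal Python sketch. It only illustrates the general technique; `SortedPartition` is a hypothetical class, not Cassandra's memtable implementation, and it assumes rows are kept sorted by clustering key.

```python
import bisect


class SortedPartition:
    """Toy in-memory partition: rows kept sorted by clustering key (illustration only)."""

    def __init__(self):
        self._keys = []    # clustering keys, always sorted
        self._values = []  # row values, parallel to _keys

    def put(self, key, value):
        """Insert or overwrite the row for a clustering key, keeping keys sorted."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key):
        """Binary search for a clustering key: O(log N) comparisons."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


partition = SortedPartition()
for k in (30, 10, 20):
    partition.put(k, f"row-{k}")
print(partition.get(20), partition.get(25))
```

The lookup here is logarithmic in the number of rows. Inserting into a Python list is linear, which is why a real sorted map would normally use a tree or skip list instead.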

It would be good if someone from this thread could confirm it.