Cassandra as session store under heavy load

Question

I would like to use Cassandra to store session-related information. I do not have a real HTTP session; it is a different protocol, but the same concept.

Memcached would be fine, but I would also like to persist the data.

Cassandra setup:

non-replicated keyspace
single column family, where the row key is the session ID and each column within a row stores a single key/value pair (Map<String,Map<String,String>>)
column TTL = 10 minutes
write CL = ONE
read CL = ONE
2,000 writes/s
5,000 reads/s

Data example:

session1:{ // CF row key
   {prop1:val1, TTL:10 min},
   {prop2:val2, TTL:10 min},
.....
   {propXXX:val3, TTL:10 min}
},
session2:{ // CF row key
   {prop1:val1, TTL:10 min},
   {prop2:val2, TTL:10 min},
},
......
sessionXXXX:{ // CF row key
   {prop1:val1, TTL:10 min},
   {prop2:val2, TTL:10 min},
}
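
To make the layout above concrete, here is a minimal in-memory sketch of the same shape: session ID, then property, then a value that carries its own expiry time. The class and method names are hypothetical, and it only models the read-side effect of a per-column TTL (expired columns are no longer returned). It says nothing about how Cassandra handles memtables or SSTables internally.

```python
import time
from typing import Dict, Optional, Tuple

TTL_SECONDS = 10 * 60  # column TTL from the setup above


class SessionStore:
    """Toy model: session ID -> {property: (value, expiry timestamp)}."""

    def __init__(self, ttl: float = TTL_SECONDS) -> None:
        self.ttl = ttl
        self.rows: Dict[str, Dict[str, Tuple[str, float]]] = {}

    def put(self, session_id: str, prop: str, value: str,
            now: Optional[float] = None) -> None:
        # Each column gets its own expiry time, like a per-column TTL.
        now = time.time() if now is None else now
        self.rows.setdefault(session_id, {})[prop] = (value, now + self.ttl)

    def get_row(self, session_id: str,
                now: Optional[float] = None) -> Dict[str, str]:
        # Return only the columns whose TTL has not elapsed yet.
        now = time.time() if now is None else now
        row = self.rows.get(session_id, {})
        return {p: v for p, (v, expires_at) in row.items() if expires_at > now}
```

Passing `now` explicitly makes the expiry behaviour easy to check without waiting ten minutes.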

In this case consistency is not a problem, but performance could be, especially disk I/O.

Since the data in my session lives only for a short time, I would like to avoid storing it on the hard drive, except for the commit log.

I have some questions:

If a column expires in the memtable before it is flushed to an SSTable, will Cassandra still write that column to the SSTable (flush it to HDD)?
Replication is disabled for my keyspace, so storing such an expired column in an SSTable would not be necessary, right?
Each CF has at most 10 columns. In that case I would enable the row cache and disable the key cache. But I expect my data to still be available in the memtable, so I could disable caching entirely, right?
Any Cassandra configuration hints for such a session-store use case would be really appreciated :)

Thank you, Maciej

you say you want to persist data, but also want to TTL it after 10 minutes. — sdolgy
This is an important process and I would like to make sure that it does not break — Maciej Miklas

Maciej Miklas · Accepted Answer · 2011-11-04T12:47:17

Here is what I did - and it works fine:

Set replication_factor to 1, which disables replication.
Set gc_grace to 0, so expired columns are deleted on the first compaction. This is fine, since the data is not replicated.
Increase the memtable size and decrease the cache size. We want to read data from the memtable and bypass the cache, which avoids flushing data to HDD and reading it back from HDD into the cache.
Additionally, the commit log can be disabled with durable_writes=false.
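
As a rough illustration, the replication, gc_grace and durable_writes settings above could be expressed in CQL 3 roughly as follows. This is a sketch under assumptions: the keyspace and table names are hypothetical, the two-column table layout is one possible mapping of the "session ID to property to value" model, and the original setup may well have been created with a different tool or syntax. Memtable and cache sizing are left out because those setting names differ between Cassandra versions. The Python below only builds the statements as strings; it does not connect to a cluster.

```python
KEYSPACE = "session_store"  # hypothetical keyspace name
TABLE = "session_props"     # hypothetical table name
TTL_SECONDS = 10 * 60       # 10-minute TTL from the question


def keyspace_ddl(keyspace: str = KEYSPACE) -> str:
    # replication_factor = 1 means a single copy of the data;
    # durable_writes = false skips the commit log for this keyspace.
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1} "
        "AND durable_writes = false;"
    )


def table_ddl(keyspace: str = KEYSPACE, table: str = TABLE,
              ttl: int = TTL_SECONDS) -> str:
    # gc_grace_seconds = 0 lets compaction drop expired data without a grace period;
    # default_time_to_live applies the TTL to every write that does not set its own.
    return (
        f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} ("
        "session_id text, prop text, val text, "
        "PRIMARY KEY (session_id, prop)) "
        f"WITH gc_grace_seconds = 0 AND default_time_to_live = {ttl};"
    )


if __name__ == "__main__":
    print(keyspace_ddl())
    print(table_ddl())
```

Note that gc_grace_seconds = 0 is only reasonable here because there is a single replica; with replication it risks deleted data reappearing.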

In this setup, data is read from the memtable and the cache is not used. The memtable can be given enough heap to hold my data until it expires, or even longer.
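
Whether the memtable can really hold everything until expiry depends on the write rate and the TTL. A back-of-the-envelope estimate, using the 2,000 writes/s and 10-minute TTL from the question, might look like this. The 100 bytes per column is purely an assumed figure for illustration, and the result covers raw payload only; real memtable overhead per column is higher and version-dependent.

```python
WRITES_PER_SECOND = 2_000       # from the question
TTL_SECONDS = 10 * 60           # from the question
ASSUMED_BYTES_PER_COLUMN = 100  # assumption: name + value + timestamp, no overhead


def column_writes_per_ttl_window(writes_per_second: int, ttl_seconds: int) -> int:
    # Upper bound on unexpired columns: every write within one TTL window
    # is still live (overwrites of the same column would lower this).
    return writes_per_second * ttl_seconds


def raw_payload_bytes(columns: int, bytes_per_column: int) -> int:
    return columns * bytes_per_column


if __name__ == "__main__":
    columns = column_writes_per_ttl_window(WRITES_PER_SECOND, TTL_SECONDS)
    size = raw_payload_bytes(columns, ASSUMED_BYTES_PER_COLUMN)
    print(f"{columns} column writes per TTL window, ~{size / 1e6:.0f} MB raw payload")
```

Under these assumptions that is 1,200,000 column writes per window and about 120 MB of raw payload, so the memtable threshold would need to sit comfortably above that figure once overhead is included.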

Once data is flushed to an SSTable, compaction removes expired rows immediately, since gc_grace=0.
