I am designing an application that will accept data/events from customer-facing systems, persist them for audit, and act as a source for replaying messages in case a downstream system needs a correction to any data feed.
I don't plan to do much analytics on this data (that will be done in a downstream system), but I am expected to persist it and allow ad hoc queries against it.
A few characteristics of my system:
(1) 99% writes, 1% reads.
(2) High write throughput (roughly 30000 events a second, each event having roughly 100 attributes).
(3) Dynamic data that can't conform to a fixed schema.
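To put the write volume in concrete terms, here is a rough back-of-envelope sketch using only the figures above. The constant and function names are illustrative, and it assumes the load is sustained at the stated rate all day, which may overstate the real volume.

```python
# Back-of-envelope write volume from the figures stated above.
# Assumption: the rate of 30000 events/s is sustained around the clock.
EVENTS_PER_SECOND = 30_000
ATTRIBUTES_PER_EVENT = 100
SECONDS_PER_DAY = 24 * 60 * 60


def attribute_values_per_second(events_per_second, attributes_per_event):
    """Individual attribute values written per second."""
    return events_per_second * attributes_per_event


def events_per_day(events_per_second):
    """Events written per day at a constant rate."""
    return events_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    print(attribute_values_per_second(EVENTS_PER_SECOND, ATTRIBUTES_PER_EVENT))
    print(events_per_day(EVENTS_PER_SECOND))
```

That works out to about 3 million attribute values a second and roughly 2.6 billion events a day.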
These characteristics make me think of Apache Cassandra as an option, using either wide rows or a map column to store my attributes.
I ran a few tests against a single node, using the Kundera ORM to write events to a map, and got a maximum write throughput of 1500 events a second per thread. I can scale this out with more threads and more Cassandra nodes.
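As a rough sketch of what scaling out would mean, the calculation below estimates how many writer threads are needed to reach the target rate. It assumes throughput scales linearly with the number of threads, which is optimistic and which I have not verified; the function name is illustrative.

```python
import math


def writers_needed(target_events_per_second, events_per_second_per_writer):
    """Writer threads needed to reach the target rate.

    Assumption: throughput scales linearly with the number of writers.
    """
    return math.ceil(target_events_per_second / events_per_second_per_writer)


if __name__ == "__main__":
    # 30000 events/s target, 1500 events/s measured per thread.
    print(writers_needed(30_000, 1_500))
```

With the measured 1500 events a second per thread, that is 20 threads in the ideal case, and more if scaling is less than linear.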
But, in your experience, is this close to what I should be getting? The few benchmarks available on the net look confusing. (I am on Cassandra 2.0, with Kundera ORM 2.13.)