I am testing cql/cassandra 1.2 and the python-cql library on a vm with 2GB ram. I have a table with a compound index (wide row). When running queries against a single node I am getting about 10x worse performance than mysql. Requests are serial with no concurrency but I am interested in the speed of a single request.
- Most importantly, Is there anything I can do to optimize querying wide rows (specifically this query)?
- Are these numbers reflective of cassandra vs mysql performance in a single request situation?
- Could my limited ram/vm be making this big of a difference?
- Would multi-node cassandra / partitioned mysql be closer than 10x?
- Am I doing something horribly wrong?
Test code:
"""
CREATE TABLE foo_bars (
foo_id text,
bar_id bigint,
content text,
PRIMARY KEY (foo_id, bar_id)
)
WITH CLUSTERING ORDER BY (bar_id DESC);
"""
#content is up to 64k text and te number of bar columns in a foo row will be ever growing but will probably never reach over 2million
t1 = time.time()
for i in range(1, 1000):
sql_query = "SELECT * FROM foo_bars WHERE foo_id IN(%s) ORDER BY id DESC LIMIT 40" % random_foo_ids
result = db_cursor.execute(sql_query)
t2 = time.time()
print "Sql time = %s" % str(t2 - t1)
t1 = time.time()
for i in range(1, 1000):
cql_query = "SELECT * FROM foo_bars WHERE foo_id IN(%s) LIMIT 40" % radom_foo_ids
result = cassandra_cursor.execute(cql_query)
t2 = time.time()
print "Cql time = %s" % str(t2 - t1)
Sql time = 4.2
Cql time = 58.7
Thanks In Advance!