I need to process about 4,000 Cassandra queries. I convert each query's ResultSet into a generator to keep the memory footprint low. Within each row, I only care about a few of the roughly 50 fields present.
I know that I can't filter directly on value fields in CQL, but does the DataStax Python Cassandra driver have something built in that does this? Or would it make more sense to do this when I build the generator, i.e.:
def make_gen(response):
    for row in response:
        yield row.value.field1, row.value.field2
I am issuing direct queries at the moment but will move to a model-based approach later, with concurrent queries and prepared statements. The code that issues the requests is very basic:
sess = connect_cas(env)
for user in users:
    q = 'select * from table ' + \
        'where key1 = {} and '.format(key_1) + \
        'key2 = {} and '.format(key_2) + \
        'sample_time > {} and '.format(t1) + \
        'sample_time < {} '.format(t2)
    resp_gen = make_gen(sess.execute(q))  # just a yield json.loads(Row.value)
    for resp in resp_gen:
        if field in resp:
            pass  # process data from this field
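As an aside, the query above could also be built with the driver's `%s` placeholders instead of `str.format`, which I believe `Session.execute` accepts for non-prepared statements. This is only a sketch: `build_query` is a hypothetical helper name, and the column names are the ones from my query.

```python
def build_query(key_1, key_2, t1, t2):
    """Return (cql, params) for one time-window lookup.

    Hypothetical helper: the table and column names are the placeholders
    from the question, not a real schema.
    """
    cql = (
        "select * from table "
        "where key1 = %s and key2 = %s "
        "and sample_time > %s and sample_time < %s"
    )
    return cql, (key_1, key_2, t1, t2)


# With a live session (not run here), this would be used roughly as:
#   cql, params = build_query(key_1, key_2, t1, t2)
#   rows = sess.execute(cql, params)
```

Letting the driver bind the values avoids quoting problems with string keys and should make the later switch to prepared statements (which use `?` placeholders) mostly mechanical.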
I only care about rows where this "field" is present. I've since updated my generator to only yield data when this condition is true. However, if there is something built into the DataStax driver that does this more efficiently, the savings will add up over 4,000 queries.
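For reference, here is a minimal sketch of the filtering generator I mean. It assumes `row.value` is a JSON string (as in the `json.loads(Row.value)` comment above); the `field` and `wanted` parameters and the stand-in `Row` tuple are made up for illustration and are not part of the driver.

```python
import json
from collections import namedtuple


def make_gen(response, field, wanted):
    """Lazily yield only the wanted fields from rows whose JSON has `field`."""
    for row in response:
        data = json.loads(row.value)  # assumes value is a JSON-encoded object
        if field in data:
            # Keep just the handful of fields of interest, drop the rest.
            yield {name: data.get(name) for name in wanted}


# Stand-in rows so the sketch runs without a cluster (hypothetical data).
Row = namedtuple("Row", ["value"])
rows = [
    Row(json.dumps({"field1": 1, "field2": 2, "other": 3})),
    Row(json.dumps({"field2": 5, "other": 6})),  # no field1, so it is skipped
]

filtered = list(make_gen(rows, "field1", ("field1", "field2")))
```

Since the generator accepts any iterable of rows, the result of `sess.execute(q)` can be passed in directly in place of `rows`.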
Model-based approach? Or direct query? - Alex Ott