0
votes

We have created table with below schema:

create table test_table(col_1 text, col_2 text, col_3 text, col_4 text, col_5 text, col_6 text, col_7 text, PRIMARY KEY (col_1, col_2, col_3, col_4, col_5));

this table contains nearly billions of records.(huge record count)

and tried to query like below,

select * from test_table where col_1='value' and col_2='value'; --> getting results

but when we tried like below,

select * from test_table where col_1='value' and col_3='value' allow filtering; --> not getting results

select * from test_table where col_1='value' and col_4='value' allow filtering; --> not getting results

we are getting below error:

"ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}"

After i got above error, i have extended the timeout parameters from 5 seconds to 60 minutes in cassandra.yaml configuration file.

then result came, but taking very long time with 50 minutes of execution time.

Can anyone suggest me to solve the "ReadTimeout:" issue without extending the configuration?

2
What happens when you simply select where col1 = 'XXXXX'? Does it time out or finish? My thought is that what you're doing is similar to what I've just asked you to run and would expect it to time out. My initial hunch is you may have a very large partition? What happens if you do a "nodetool tablehistograms" on that table, or "nodetool tablestats" on that table? - Jim Wartnick
when i run query on col1 i am getting data in seconds. no timeout. - Sujit

2 Answers

0
votes

what are your partition and cluster keys? it timeouts cause data size is too much to be processed in given timeout period. try looking into pagination & secondary indexes. though secondary indexes might reduce wright performance.

0
votes

By using "allow filtering" you are performing a full table scan, which times out and thats the reason for the error you are getting.

You need to change your partition / clustering keys in a way that you can run your queries without the "allow filtering" argument.

When you perform only the given queries above, you could think about duplicating the data into 3 tables:

create table test_table_1(col_1 text, col_2 text, col_3 text, col_4 text, col_5 text, col_6 text, col_7 text, PRIMARY KEY (col_1, col_2));
create table test_table_2(col_1 text, col_2 text, col_3 text, col_4 text, col_5 text, col_6 text, col_7 text, PRIMARY KEY (col_1, col_3));
create table test_table_3(col_1 text, col_2 text, col_3 text, col_4 text, col_5 text, col_6 text, col_7 text, PRIMARY KEY (col_1, col_4));

Your queries would then be:

select * from test_table_1 where col_1='value' and col_2='value';
select * from test_table_2 where col_1='value' and col_3='value';
select * from test_table_3 where col_1='value' and col_4='value';

Remember: In cassandra you design your tables around your queries.