1
votes

I have a simple query that runs quite fast (1-2 seconds) and returns 20 rows:

select * from table where field1 = 'ABC'

However, if I add one more filter on another field, like:

select * from table where field1 = 'ABC' and field2 = 'XYZ'

the query times out (we have a limit of 30 secs). If it matters, the query is expected to return the same 20 rows in our case.

Both fields are not part of the sort key, and the table contains about 250 million rows with about 20 columns.

The question is: is it expected behaviour? I would expect the queries run times should be more or less equal. If it is expected, is there a guide what kind of queries one should avoid, or is there a tool that allows to investigate whether it's a "bad" query.

Thanks in advance!

1
It is natural to second query to take longer since there will be more comparision but I can't figure out how it takes more than 30 seconds. What do you mean by time out? Are you sending query from clickhouse-client or JDBC? - Ramazan Polat
Could you reveal the types of both columns? I suspect that select * from table where field2 = 'XYZ' takes a long time too - could you check it? - vladimir
you can try this one: select * from (select * from table where field1 = 'ABC') where field2 = 'XYZ' settings enable_optimize_predicate_expression=0; - vladimir
Thanks guys! 1) Just via REST 2) Yes, it does. 3) Works fast. I guess Denis elaborated already and all seems clear now. - Vyacheslav

1 Answers

4
votes

Yes. It's expected. In some cases CH chooses a wrong column for prewhere because it has no statistics and uses rule-based algorithm.

Try

select * from table 
prewhere field1 = 'ABC' 
where field2 = 'XYZ'