5
votes

I noticed that the first time I run a query on Redshift, it takes 3-10 seconds. When I run the same query again, even with different arguments in the WHERE condition, it runs fast (0.2 sec). The query in question runs on a table of ~1M rows and filters on 3 integer columns.

Is this huge difference in execution times caused by the fact that Redshift compiles the query the first time it's run, and then reuses the compiled code?

If so, how can I always keep this cache of compiled queries warm?

One more question: given queryA and queryB, assume queryA was compiled and executed first. How similar must queryB be to queryA for the execution of queryB to reuse the code compiled for queryA?


1 Answer

4
votes

The answer to the first question is yes. Amazon Redshift compiles code for the query and caches it. The compiled code is shared across sessions in a cluster, so the same query, even with different parameters and in a different session, will run faster because the compilation overhead is skipped.

The documentation also recommends using the result of the second execution of the query when benchmarking.
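As a rough illustration of that benchmarking advice, here is a minimal Python sketch that times the first run separately from the following runs, so the one-time compilation cost can be kept out of the benchmark figure. The `benchmark_query` helper and its `run_query` callable are hypothetical names, not part of any Redshift API; `run_query` is assumed to be whatever function executes your SQL through your database driver.

```python
import time
from typing import Callable, List, Tuple


def benchmark_query(
    run_query: Callable[[], object],
    repeats: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, List[float]]:
    """Time one initial run plus `repeats` further runs of the same query.

    `run_query` is a hypothetical callable that executes the query once.
    Returns (first_run_duration, list_of_later_run_durations).
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    def timed() -> float:
        start = clock()
        run_query()
        return clock() - start

    # The first run may include the one-time compilation overhead.
    first = timed()
    # Later runs should reuse the cached compiled code, so use these
    # durations as the benchmark figure.
    later = [timed() for _ in range(repeats)]
    return first, later
```

A large gap between the first duration and the later ones suggests the first run paid the compilation cost. If the durations are all similar, the compiled code was probably already cached.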

The answer to this question, along with more details, is in the following link: http://docs.aws.amazon.com/redshift/latest/dg/c-compiled-code.html