I have two tables. The first I'll call "main", and it includes six columns (a-f) that each map one-to-one to a key in another table called "lookup". My query is as follows:
with
    a_options as (*a options*),
    ...,
    f_options as (*f options*),
    other as (*query on base1*),
    main as (select ... from a_options, ..., f_options, other),
    lookup as (*query on base2*)
select *,
    a_lookup.value,
    b_lookup.value,
    c_lookup.value,
    d_lookup.value,
    e_lookup.value,
    f_lookup.value
from main
inner join lookup as a_lookup on a = a_lookup.key
inner join lookup as b_lookup on b = b_lookup.key
inner join lookup as c_lookup on c = c_lookup.key
inner join lookup as d_lookup on d = d_lookup.key
inner join lookup as e_lookup on e = e_lookup.key
inner join lookup as f_lookup on f = f_lookup.key
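(The CTE bodies are abbreviated above; as a sketch, the plan for the full query can be checked like this, shown here with throwaway stand-in CTEs rather than my real ones. In the plan output, a DS_BCAST_INNER redistribution or a Nested Loop step on any of the six joins would be the kind of thing that inflates intermediate results.)

-- Stand-in CTEs for illustration only (not my real query): the same EXPLAIN
-- prefix on the full query shows how Redshift distributes the CTE-derived
-- tables (DS_DIST_* / DS_BCAST_INNER) and whether any of the six lookup
-- joins falls back to a nested loop.
explain
with
    main as (select 1 as a, 1 as b),
    lookup as (select 1 as key, 'x' as value)
select main.*,
       a_lookup.value,
       b_lookup.value
from main
inner join lookup as a_lookup on a = a_lookup.key
inner join lookup as b_lookup on b = b_lookup.key;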
I am running this on a Redshift cluster of 16 dc1.large nodes (15 GB of memory and 160 GB of SSD each). Across the whole cluster I am at about 60% disk space utilization, which means I should have no more than 240 GB of memory and about 1.02 TB of disk space available (40% of the 2.56 TB total; this is a high estimate, since some of it is reserved for Redshift's internal use).
As I mentioned, each of these joins is one-to-one, so the result of the query should be no larger than main itself. When main is 4,496 rows, the query executes in about 15 seconds and uses little to no disk space. At 7,304 rows (main grows in discrete increments), however, the query fails after about five minutes with a disk-full error.
CloudWatch shows that the error was caused by one of the nodes hitting its storage capacity. Storage does not grow evenly across the nodes during the query, and the query fails precisely when the first node hits 100%, so it is not as if the query is consuming all of the available disk space in the cluster. Still, it shouldn't be getting anywhere near capacity. Has anyone seen this behavior before? Why does my query explode like this?
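For reference, a rough way to watch per-node disk usage from inside the cluster is the STV_PARTITIONS system view, along these lines:

-- Approximate per-node disk usage from the STV_PARTITIONS system view.
-- used and capacity are reported in 1 MB blocks.
select owner as node,
       sum(used) as used_mb,
       sum(capacity) as capacity_mb,
       round(sum(used)::decimal / sum(capacity) * 100, 1) as pct_used
from stv_partitions
group by owner
order by owner;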
The database is almost exclusively made of fresh tables, so there isn't any fragmentation to speak of. I also don't have a whole lot of control over the design of the database, so I can't refactor my tables to optimize for performance (I realize that having six joins in one query is probably indicative of a poor design). I am just trying to understand why Redshift would soak up so much storage.
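For what it's worth, a sketch of the kind of diagnostic that should show which steps spill to disk and whether the planner flagged a nested loop (assuming the standard SVL_QUERY_SUMMARY and STL_ALERT_EVENT_LOG system tables; pg_last_query_id() works if this runs in the same session right after the query, otherwise substitute the query ID):

-- Which steps of the last query ran disk-based, and how many rows/bytes
-- each produced (look for is_diskbased = 't' on the join steps).
select query, seg, step, label, rows, bytes, workmem, is_diskbased
from svl_query_summary
where query = pg_last_query_id()
order by seg, step;

-- Planner/executor alerts for the same query, e.g. a nested-loop join or
-- missing statistics, with Amazon's suggested fix in the solution column.
select query, event, solution, event_time
from stl_alert_event_log
where query = pg_last_query_id()
order by event_time;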
Edit: main and lookup are both derived tables, and each is defined in a CTE. There are several different options for each of a-f. Main is generated by first calculating every distinct combination of a-f (a cross join) and then joining that with another set of data. This other set of data (92 rows) is also a CTE, a filtered and aggregated version of another table (196,154,352 rows; call this base1). For each combination of a-f there will be about 30 different rows in main (this is why main increases in size discretely; it depends on how many options you have for a-f). Again, the query starts soaking up disk space when main is at about 7,000 rows (an average of 2.5 options for each of a-f, so roughly 2.5^6 ≈ 244 combinations × ~30 rows ≈ 7,300). Lookup is just a filtered and aggregated version of another table, which I'll call base2 (down from 172,867 rows to 1,241 rows).
So base1 and base2 are the only real tables in this query. Main is derived from the cross join of a-f joined with another CTE that is derived from base1, and lookup is derived directly from base2. See updates to query above.