
I have two tables. The first, which I'll call "main", includes six columns (a-f), each of which maps one-to-one to a row in another table called "lookup". My query is as follows:

with
a_options as (*a options*),
...,
f_options as (*f options*),
other as (*query on base1*),
main as (select ... from a_options,..., f_options, other),
lookup as (*query on base2*)
select *,
  a_lookup.value,
  b_lookup.value,
  c_lookup.value,
  d_lookup.value,
  e_lookup.value,
  f_lookup.value
from main
inner join lookup as a_lookup on a = a_lookup.key
inner join lookup as b_lookup on b = b_lookup.key
inner join lookup as c_lookup on c = c_lookup.key
inner join lookup as d_lookup on d = d_lookup.key
inner join lookup as e_lookup on e = e_lookup.key
inner join lookup as f_lookup on f = f_lookup.key

I am running this on a Redshift cluster of 16 dc1.large nodes (15 GiB of memory and 160 GB of SSD each), so the cluster has about 240 GiB of memory and 2.56 TB of disk in total. Across the whole cluster, I am at about 60% disk space utilization, which means I should have no more than about 1.02 TB of disk space still available (this is a high estimate, since some of it is reserved for Redshift internal use).

As I mentioned, each of these joins is one-to-one, so the result of the query should be no larger than main itself. When main is 4,496 rows, the query executes in ~15 seconds and uses little to no disk space. At 7,304 rows (main grows in discrete increments), however, the query fails after about 5 minutes with a disk full error.

CloudWatch shows that the error was caused by one of the nodes hitting storage capacity. Storage does not grow evenly across nodes throughout the query, and the query fails precisely when the first node hits 100%, so it is not as if the query is consuming all available disk space in the cluster. Still, it shouldn't be getting anywhere close to capacity. Has anyone ever seen this behavior? Why does my query explode like this?
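
In case it helps with diagnosis: the checks I know of for this kind of problem are EXPLAIN on the full query (to see whether the plan broadcasts or redistributes the lookup side of each join, i.e. DS_BCAST_INNER or DS_DIST_BOTH) and the two system views below. The query id (123456) is just a placeholder for the id of the failed run from stl_query.

-- Per-node disk usage (used and capacity are counted in 1 MB blocks); this is
-- where the uneven growth across nodes shows up.
select host, sum(used) as used_blocks, sum(capacity) as capacity_blocks
from stv_partitions
group by host
order by host;

-- Steps of the failed query that spilled to disk.
select query, seg, step, label, rows, bytes, is_diskbased
from svl_query_summary
where query = 123456
  and is_diskbased = 't'
order by seg, step;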

The database is almost exclusively made of fresh tables, so there isn't any fragmentation. I also don't have a whole lot of control over the design of the database, so I can't refactor my tables to optimize for performance (I realize having six joins in a query is probably indicative of a poor design). I am just trying to understand why Redshift would soak up so much storage.

Edit: main and lookup are both derived tables and each is defined in a CTE. There are several different options for a-f. Main is generated by first calculating every distinct combination of a-f (cross join) and then joining that with another set of data. This other set of data (92 rows) is also a CTE that is a filtered and aggregated version of another table (196,154,352 rows, call this base1). For each combination of a-f, there will be about 30 different rows in main (this is why main increases in size discretely; it depends on how many options you have for a-f). Again, the query starts soaking up disk space when main is at about 7,000 rows (avg. 2.5 options for a-f). Lookup is just a filtered and aggregated version of another table, which I'll call base2 (down from 172,867 rows to 1,241 rows).

So base1 and base2 are the only real tables in this query. Main is derived from the cross join of a-f joined with another CTE that is derived from base1, and lookup is derived directly from base2. See updates to query above.
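
To make that growth concrete, here is a tiny self-contained version of the construction (every name and value below is made up; only the shape matches the real query):

with
  a_options as (select 1 as a union all select 2 as a),                          -- 2 options for a
  b_options as (select 1 as b union all select 2 as b union all select 3 as b),  -- 3 options for b
  other     as (select 10 as metric union all select 20 as metric),              -- stand-in for the rows derived from base1
  main      as (select * from a_options cross join b_options cross join other)
select count(*) from main;  -- 2 * 3 * 2 = 12 rows: every option combination times every row of other

With six option CTEs averaging ~2.5 values each, the multiplier is roughly 2.5^6 ≈ 244 combinations, and 244 × ~30 rows per combination ≈ 7,300, which lines up with the 7,304-row case above.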

The intermediate steps of the query are taking more space than what is evident just by looking at the tables alone. This is my guess. – Tim Biegeleisen
Yeah, that was my thought too. I tried the query with one join, two joins, three joins, etc., and it gave me a result less than the size of main for everything. I also broke it up into explicit nested queries and I still ran into the same error. I don't know a lot about the Redshift/Postgres join operation "under the hood", but something doesn't seem right. – Joshua Dotson
Do you have indices on the join columns? In addition to possibly speeding up the query, it might reduce the space complexity. – Tim Biegeleisen
@TimBiegeleisen: Redshift does not support indexes: docs.aws.amazon.com/redshift/latest/dg/… – a_horse_with_no_name
The timings vs the cluster size vs the table sizes don't feel right at all! I run much bigger versions of this type of thing on 1 x dc2.large. You should probably upgrade to dc2, by the way. – Jon Scott

1 Answer


Please can you try the following? I once had a similar problem and splitting the query like this fixed it.

with part1 as
(
  -- Resolve the first three lookups. Aliasing the value columns avoids the
  -- intermediate result having three columns that are all named "value".
  select main.*,
    a_lookup.value as a_value,
    b_lookup.value as b_value,
    c_lookup.value as c_value
  from main
  inner join lookup as a_lookup on main.a = a_lookup.key
  inner join lookup as b_lookup on main.b = b_lookup.key
  inner join lookup as c_lookup on main.c = c_lookup.key
)
-- Resolve the remaining three lookups against the smaller intermediate result.
select p1.*,
  d_lookup.value as d_value,
  e_lookup.value as e_value,
  f_lookup.value as f_value
from part1 as p1
inner join lookup as d_lookup on p1.d = d_lookup.key
inner join lookup as e_lookup on p1.e = e_lookup.key
inner join lookup as f_lookup on p1.f = f_lookup.key
;
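
If splitting the query alone does not change the behaviour, a further variant worth trying (just a sketch, I have not tested it against your data; lookup_all and base2_source are placeholder names) is to materialize the lookup CTE as a temporary table with DISTSTYLE ALL, so every node keeps its own full copy and none of the six joins has to broadcast or redistribute rows:

-- lookup is small (~1,241 rows), so copying it to every node is cheap.
create temp table lookup_all diststyle all as
select key, value
from base2_source;  -- placeholder for whatever the lookup CTE currently selects from

-- Then join against lookup_all instead of the lookup CTE, e.g.
-- inner join lookup_all as a_lookup on main.a = a_lookup.key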