
I'm currently migrating PostgreSQL code from our existing DWH to a new Redshift DWH, and a few queries are not compatible. I have a table with id, start_week, end_week and orders_each_week in a single row. I'm trying to generate a sequential series between start_week and end_week so that I get a separate row for each week in the given range.

E.g., this is how it is present in the table:

+----+------------+----------+------------------+
| ID | start_week | end_week | orders_each_week |  
+----+------------+----------+------------------+
|  1 |      3     |     5    |        10        | 
+----+------------+----------+------------------+

This is how I want to have it:

+----+------+--------+
| ID | week | orders |
+----+------+--------+
| 1  | 3    | 10     |
| 1  | 4    | 10     |
| 1  | 5    | 10     |
+----+------+--------+

The code below throws an error:

SELECT 
id,
generate_series(start_week::BIGINT, end_week::BIGINT) AS demand_weeks
FROM client_demand
WHERE createddate::DATE >= '2021-01-01'

[0A000][500310] Amazon Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.; [01000] Function "generate_series(bigint,bigint)" not supported.

So basically I am trying to generate a sequential series between two numbers, and I couldn't find any solution. Any help here is really appreciated. Thank you.


2 Answers

1 vote

Gordon Linoff has shown a very common method for doing this, and his approach has the advantage that it doesn't generate "rows" that don't already exist, which can make it faster than approaches that generate data on the fly. However, you need a table with about the right number of rows lying around, and that isn't always the case. He also shows that this number series needs to be cross joined with your data to produce the result you need.

If you need to generate a large series of numbers without using an existing table, there are a number of ways to do it. Here's my go-to approach:

WITH twofivesix AS (
  -- Cross join eight 2-row tables (one per bit) to build
  -- every 8-bit number: 256 rows, 0 through 255.
  -- Note: POWER() returns a float, so cast n where an integer is needed.
  SELECT
    p0.n
    + p1.n * 2
    + p2.n * POWER(2,2)
    + p3.n * POWER(2,3)
    + p4.n * POWER(2,4)
    + p5.n * POWER(2,5)
    + p6.n * POWER(2,6)
    + p7.n * POWER(2,7)
    AS n
  FROM
    (SELECT 0 AS n UNION SELECT 1) p0,
    (SELECT 0 AS n UNION SELECT 1) p1,
    (SELECT 0 AS n UNION SELECT 1) p2,
    (SELECT 0 AS n UNION SELECT 1) p3,
    (SELECT 0 AS n UNION SELECT 1) p4,
    (SELECT 0 AS n UNION SELECT 1) p5,
    (SELECT 0 AS n UNION SELECT 1) p6,
    (SELECT 0 AS n UNION SELECT 1) p7
),
fourbillion AS (
  -- Treat each 0-255 value as one byte of a 32-bit number:
  -- 256^4 (about 4.3 billion) rows, 0 through 4,294,967,295.
  SELECT (a.n * POWER(256, 3) + b.n * POWER(256, 2) + c.n * 256 + d.n) AS n
  FROM twofivesix a,
       twofivesix b,
       twofivesix c,
       twofivesix d
)
SELECT ...

This example makes a whole bunch of numbers (about 4 billion), but you can extend or shrink the series by changing how many times the tables are cross joined and by adding WHERE clauses (as Gordon Linoff did). I don't expect you need a list anywhere near this long, but I wanted to show how this can be used to make very long series. (You can also write this in base 10 if that makes more sense to you.)
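For the question at hand, the elided SELECT could be completed along these lines (a hypothetical sketch: twofivesix is the CTE above, client_demand and its columns come from the question, and the cast is there because POWER() returns a float):

SELECT
    cd.id,
    cd.start_week + s.n::INT AS week,
    cd.orders_each_week AS orders
FROM (SELECT n FROM twofivesix WHERE n <= 52) s  -- arbitrary cap: a year of weekly offsets
JOIN client_demand cd
  ON s.n <= cd.end_week - cd.start_week;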

So if you have a table with more rows than the numbers you need, that can be the fastest method, but if you don't have such a table, or your table lengths vary over time, you may want this pure-SQL approach.

0 votes

Among the many Postgres features that Redshift does not support is generate_series() (it runs only on the leader node). You can generate the series yourself.

If you have a table with enough rows in Redshift, then I find that this approach works:

with n as (
      -- one row per row of client_demand, numbered 0, 1, 2, ...
      select row_number() over () - 1 as n
      from client_demand cd
     )
select cd.id, cd.start_week + n.n as week, cd.orders_each_week as orders
from client_demand cd join
     n
     on n.n <= (cd.end_week - cd.start_week);

This assumes that the table has enough rows to generate the numbers needed for the on clause. If the table is really big, add something like limit 100 in the n CTE to limit the size.
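For example, the capped CTE might look like this (a hypothetical variant of the query above; 100 is an arbitrary limit that comfortably covers the week spans here):

with n as (
      select row_number() over () - 1 as n
      from client_demand cd
      limit 100  -- stop after 100 numbers, however large the table is
     )
...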

If there are only a handful of values, you can use:

select 0 as n union all
select 1 as n union all
select 2 as n
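Plugged into the same query, that might look like this (a hypothetical sketch using the question's table and columns):

select cd.id, cd.start_week + n.n as week, cd.orders_each_week as orders
from client_demand cd join
     (select 0 as n union all
      select 1 union all
      select 2
     ) n
     on n.n <= (cd.end_week - cd.start_week);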