I am using Redshift and need an alternative for a correlated subquery. I am getting the correlated subquery not supported error. However, for this particular exercise of trying to identify all sales transactions made by the same customer within a given hour from the originating transaction, I am not sure a traditional left join would work either. I.e., the query is dependent on the context or current value from the parent select. I have also tried something similar using row_number() window function but again, need a way to window / partition on a date range - not just customer_id.
The overall goal is to find the first sales transaction for a given customer id, then find all subsequent transactions made within 60 minutes of the first transaction. This logic will continue on for the remainder of the transactions for the same customer (and ultimately all customers in the database). That is, once the initial 60 minute window has been established from the time of the first transaction, a second 60 minute window would begin at the end of the first 60 minute window, and all transactions within the second window would also be identified and combined and then repeat for the remainder of transactions.
The output would list the first transaction id that started the 60 minute window, then the other subsequent transaction ids that were made within the 60 minute window. The 2nd row would display the first transaction id made by the same customer in the next 60 minute window (again, the first transaction post the first 60 minute window would be the start of the second 60 minute window) and then the subsequent transactions also made within the second 60 minute window.
The query example in its most basic form looks like the query below:
select
s1.customer_id,
s1.transaction_id,
s1.order_time,
(
select
s2.transaction_id
from
sales s2
where
s2.order_time > s1.order_time and
s2.order_time <= dateadd(m,60,s1.order_time) and
s2.customer_id = s1.customer_id
order by
s2.order_time asc
limit 1
) as sales_transaction_id_1,
(
select
s3.transaction_id
from
sales s3
where
s3.order_time > s1.order_time and
s3.order_time <= dateadd(m,60,s1.order_time) and
s3.customer_id = s1.customer_id
order by
s3.order_time asc
limit 1 offset 1
) as sales_transaction_id_2,
(
select
s3.transaction_id
from
sales s4
where
s4.order_time > s1.order_time and
s4.order_time <= dateadd(m,60,s1.order_time) and
s4.customer_id = s1.customer_id
order by
s4.order_time asc
limit 1 offset 1
) as sales_transaction_id_3
from
(
select
sales.customer_id,
sales.transaction_id,
sales.order_time
from
sales
order by
sales.order_time desc
) s1;
For example, if a customer made the following transactions:
customer_id transaction_id order_time
1234 33453 2017-06-05 13:30
1234 88472 2017-06-05 13:45
1234 88477 2017-06-05 14:10
1234 99321 2017-06-07 8:30
1234 99345 2017-06-07 8:45
The expected output would be as:
customer_id transaction_id sales_transaction_id_1 sales_transaction_id_2 sales_transaction_id_3
1234 33453 88472 88477 NULL
1234 99321 99345 NULL NULL
Also, it appears Redshift does not support lateral joins which seems to further restrict the options at my disposal. Any help would be greatly appreciated.
s1so the other two legs of the query can be replaced byEXISTS(...)(getting rid of the ugly LIMIT 1, too) - wildplasserNOT EXISTS(...) to the twoON` conditions. - wildplasser