1
votes

I have the following Orders table in PostgreSQL Database

Order_date | Revenue
--------------------
2020-10-01 | 10

2020-10-02 | 5

2020-10-03 | 10

2020-10-04 | 5

2020-10-05 | 10

I need to return the cumulative sum of revenue for every order_date from last 2 days including the revenue for that order date. I am using the following query

SELECT o1.order_date,  
       Sum(o2.revenue) as Revenue_sum 
FROM   orders o1  
       JOIN orders o2   
         ON o1.order_date BETWEEN o2.order_date AND o2.order_date + 2  
GROUP  BY o1.order_date 
ORDER  BY o1.order_date  

and it returns following result

Order_date | Revenue_sum 
------------------------
2020-10-01 | 10

2020-10-02 | 15

2020-10-03 | 25

2020-10-04 | 20

2020-10-05 | 25

As per logical order of the query processing following steps will be performed

  1. The 'JOIN' will first form a Cartesian product by performing a cross join, so every row from o1 will join to every row of o2.
  2. Then the qualifier condition in 'ON' clause will select only the rows that satisfy the condition
  3. From the selected rows, the revenue will be summed up by each group from the GROUP BY clause (o1.order_date)

According to the execution steps above, I am trying to visualize the processing steps of my query.
Step 1 would be a Cross join as shown below
Step 2 would be qualification based on the condition in 'ON'. I am having trouble in visualizing what rows will be selected from step 1 based on the condition specified in 'JOIN' and how
Step 3 would then group the rows and sum the revenue

1. Cartesian Product

    o1.order_date | o2.order_date | o2.revenue
   -------------------------------------------
    2020-10-01    | 2020-10-01    | 10  
    2020-10-01    | 2020-10-02    | 5
    2020-10-01    | 2020-10-03    | 10
    2020-10-01    | 2020-10-04    | 5
    2020-10-01    | 2020-10-05    | 10
    2020-10-02    | 2020-10-01    | 10  
    2020-10-02    | 2020-10-02    | 5
    2020-10-02    | 2020-10-03    | 10
    2020-10-02    | 2020-10-04    | 5
    2020-10-02    | 2020-10-05    | 10
    2020-10-03    | 2020-10-01    | 10  
    2020-10-03    | 2020-10-02    | 5
    2020-10-03    | 2020-10-03    | 10
    2020-10-03    | 2020-10-04    | 5
    2020-10-03    | 2020-10-05    | 10
    2020-10-04    | 2020-10-01    | 10  
    2020-10-04    | 2020-10-02    | 5
    2020-10-04    | 2020-10-03    | 10
    2020-10-04    | 2020-10-04    | 5
    2020-10-04    | 2020-10-05    | 10
    2020-10-05    | 2020-10-01    | 10  
    2020-10-05    | 2020-10-02    | 5
    2020-10-05    | 2020-10-03    | 10
    2020-10-05    | 2020-10-04    | 5
    2020-10-05    | 2020-10-05    | 10

2. Qualification based on 'ON' condition. What rows will be selected from step 1 above?

3
All rows that match the condition, as applied individually to each row, are selected. This might be easier to “visualize the process” when writing it out using a smaller set of data, eg. (1, 2, 3) with a condition of a.i >= b.i and a.i <= b.i + 1 (which is the expanded form of a.i between b.i and b.i + 1). - user2864740
If you wrote the query I can't see what difficulty you having with understanding how it works. - shawnt00
I arrived at the final query using various combination but the logical processing flow was still not clear to me. Thanks to the comments and posts of the experts here, that logical processing is now clear to me - Shrestha

3 Answers

2
votes

Once the cartesian product has been performed, every value of r1.date will be compared with the range of r2.date defined based on the condition you have provided (o1.order_date BETWEEN o2.order_date AND o2.order_date + 2). Remember for every value of o2.order_date, this date range will be redefined.

Example: When o1.order_date='2020-10-01':

  • It will compare if o1.order_date lies within o2.order_date range between '2020-10-01' and '2020-10-03', the condition evaluates to True, and this row is selected from the cartesian product.
  • Next time, o2.order_date range becomes '2020-10-02' and '2020-10-04', now order_date='2020-10-01' doesn't lie within this range and hence this condition evaluates to false. Therefore, only 1 row (mentioned in previous step) from the cartesian product is selected for o1.order_date='2020-10-01'.

The above steps are repeated unless all the rows in your cartesian product have been evaluated, and only the ones that satisfy the given date range condition will be selected to go in the group by clause for the aggregation of revenue.

Based on the above steps, following rows will be selected to go to the group-by clause:

o1.order_date | o2.order_date | o2.revenue
-------------------------------------------
2020-10-01    | 2020-10-01    | 10  
2020-10-02    | 2020-10-01    | 10  
2020-10-02    | 2020-10-02    | 5
2020-10-03    | 2020-10-01    | 10  
2020-10-03    | 2020-10-02    | 5
2020-10-03    | 2020-10-03    | 10
...
1
votes

For the purpose on demonstration and demonstration only. This does NOT claim to be the physical process Postgres follows, but should allow a visualization. Start with your Cartesian Product and extend by 2 columns. The predicate o2.order_date+2. And a Truth table evaluating your ON predicate (o1.order_date BETWEEN o2.order_date AND o2.order_date + 2). With that then you select only those rows having a truth value result True.

+---------------+---------------+-----------------+-----------------------------+
| o1.order_date | o2.order_date | o2.order_date+2 | od1 >= od2 and od1 <= od2+2 |
+---------------+---------------+-----------------+-----------------------------+
| 2020-10-01    | 2020-10-01    | 2020-10-03      | true                        |
| 2020-10-01    | 2020-10-02    | 2020-10-04      | false                       |
| 2020-10-01    | 2020-10-03    | 2020-10-05      | false                       |
| 2020-10-01    | 2020-10-04    | 2020-10-06      | false                       |
| 2020-10-01    | 2020-10-05    | 2020-10-07      | false                       |
| 2020-10-02    | 2020-10-01    | 2020-10-03      | true                        |
| 2020-10-02    | 2020-10-02    | 2020-10-04      | true                        |
| 2020-10-02    | 2020-10-03    | 2020-10-05      | false                       |
| 2020-10-02    | 2020-10-04    | 2020-10-06      | false                       |
| 2020-10-02    | 2020-10-05    | 2020-10-07      | false                       |
| 2020-10-03    | 2020-10-01    | 2020-10-03      | true                        |
| 2020-10-03    | 2020-10-02    | 2020-10-04      | true                        |
| 2020-10-03    | 2020-10-03    | 2020-10-05      | true                        |
| 2020-10-03    | 2020-10-04    | 2020-10-06      | false                       |
| 2020-10-03    | 2020-10-05    | 2020-10-07      | false                       |
| 2020-10-04    | 2020-10-01    | 2020-10-03      | false                       |
| 2020-10-04    | 2020-10-02    | 2020-10-04      | true                        |
| 2020-10-04    | 2020-10-03    | 2020-10-05      | true                        |
| 2020-10-04    | 2020-10-04    | 2020-10-06      | true                        |
| 2020-10-04    | 2020-10-05    | 2020-10-07      | false                       |
| 2020-10-05    | 2020-10-01    | 2020-10-03      | false                       |
| 2020-10-05    | 2020-10-02    | 2020-10-04      | false                       |
| 2020-10-05    | 2020-10-03    | 2020-10-05      | true                        |
| 2020-10-05    | 2020-10-04    | 2020-10-06      | true                        |
| 2020-10-05    | 2020-10-05    | 2020-10-07      | true                        |
+---------------+---------------+-----------------+-----------------------------+

With result

+---------------+---------------+-----------------+-----------------------------+
| o1.order_date | o2.order_date | o2.order_date+2 | od1 >= od2 and od1 <= od2+2 |
+---------------+---------------+-----------------+-----------------------------+
| 2020-10-01    | 2020-10-01    | 2020-10-03      | true                        |
| 2020-10-02    | 2020-10-01    | 2020-10-03      | true                        |
| 2020-10-02    | 2020-10-02    | 2020-10-04      | true                        |
| 2020-10-03    | 2020-10-01    | 2020-10-03      | true                        |
| 2020-10-03    | 2020-10-02    | 2020-10-04      | true                        |
| 2020-10-03    | 2020-10-03    | 2020-10-05      | true                        |
| 2020-10-04    | 2020-10-02    | 2020-10-04      | true                        |
| 2020-10-04    | 2020-10-03    | 2020-10-05      | true                        |
| 2020-10-04    | 2020-10-04    | 2020-10-06      | true                        |
| 2020-10-05    | 2020-10-03    | 2020-10-05      | true                        |
| 2020-10-05    | 2020-10-04    | 2020-10-06      | true                        |
| 2020-10-05    | 2020-10-05    | 2020-10-07      | true                        |
+---------------+---------------+-----------------+-----------------------------+

Finally collect each date and sum the revenue values.

0
votes

Assuming your dates would always be in sequence, in steps of one day at a time, you may use:

SELECT
    Order_date,
    SUM(Revenue) OVER (ORDER BY Order_date
                       ROWS BETWEEN 2 PRECEDING AND
                            CURRENT ROW) AS Revenue_sum
FROM orders
ORDER BY
    Order_date;

screen capture from demo link below

Demo