I have a date dimension table containing all dates and another table containing the value of items at specific dates. For example: (a) Date_Dim table

|Full_Date  |  
|-----------|
| ....      |
|1-jan-2021 |
|2-Jan-2021 |
|3-jan-2021 |
| ...       |

(b) Item_value table

|P_Date      | ITEM  | Value  |
|-----------:|:------|-------:|
|20-Dec-2020 |AA1    |9       |
|1-jan-2021  |AA1    |10      |
|1-jan-2021  |AA2    |100     |
| ...        | ...   | ...    |

I am trying to build a fact table containing the latest value of every item in the item_value table for every date in the date_dim table, i.e. the value of each item on every day. For example:

|Full_date   | ITEM   | Value |
|-----------:|-------:|------:|
|31-Dec-2020 |AA1     | 9     |
|31-Dec-2020 |AA2     | null  |
|1-Jan-2021  |AA1     | 10    |
|1-Jan-2021  |AA2     | 100   |
|2-Jan-2021  |AA1     | 10    |
|2-Jan-2021  |AA2     | 100   |
|3-Jan-2021  |AA1     | 10    |
|3-Jan-2021  |AA2     | 100   |
|4-Jan-2021  |AA1     | 10    |
|4-Jan-2021  |AA2     | 100   |

How can this query be built, please? I have tried the following, but it does not work:

select full_date,p_date,item,value from dim_date left outer join item_value on full_date=p_date;

I am not sure whether max(p_date) over (partition by ...) will work.

Thank you

4 Answers

0
votes

You can use a partitioned outer join and then aggregate:

WITH date_dim ( full_date ) AS (
  SELECT DATE '2020-12-31' + LEVEL - 1 AS full_Date
  FROM   DUAL
  CONNECT BY DATE '2020-12-31' + LEVEL - 1 <= DATE '2021-01-04'
)
SELECT item,
       full_date,
       MAX( value ) KEEP ( DENSE_RANK LAST ORDER BY p_date ) AS value
FROM   date_dim d
       LEFT OUTER JOIN item_value i
       PARTITION BY ( i.item )
       ON ( d.full_date >= i.p_date )
GROUP BY item, full_date

Which, for the sample data:

CREATE TABLE item_value ( P_Date, ITEM, Value ) AS
SELECT DATE '2020-12-20', 'AA1',   9 FROM DUAL UNION ALL
SELECT DATE '2021-01-01', 'AA1',  10 FROM DUAL UNION ALL
SELECT DATE '2021-01-01', 'AA2', 100 FROM DUAL;

Outputs:

ITEM | FULL_DATE | VALUE
:--- | :-------- | ----:
AA1  | 31-DEC-20 |     9
AA1  | 01-JAN-21 |    10
AA1  | 02-JAN-21 |    10
AA1  | 03-JAN-21 |    10
AA1  | 04-JAN-21 |    10
AA2  | 31-DEC-20 |  null
AA2  | 01-JAN-21 |   100
AA2  | 02-JAN-21 |   100
AA2  | 03-JAN-21 |   100
AA2  | 04-JAN-21 |   100

Note: You do not need to store the date_dim dimension table; it can be generated on the fly, which reduces the need for (expensive) IO operations to read the table from the hard disk.

0
votes

You may simply add a validity interval to your ITEM table using the analytic function LEAD:

select  
P_DATE,
lead(P_DATE-1,1,(select max(full_date) from date_dim)) over (partition by ITEM order by P_DATE) P_DATE_TO,
ITEM, VALUE
from item_value
;

P_DATE              P_DATE_TO           ITE      VALUE
------------------- ------------------- --- ----------
20.12.2020 00:00:00 31.12.2020 00:00:00 AA1          9
01.01.2021 00:00:00 04.01.2021 00:00:00 AA1         10
01.01.2021 00:00:00 04.01.2021 00:00:00 AA2        100

In some cases this is enough, as you can query the VALUE of a specific ITEM on a given date (assuming here that the result of the query above is available as item_value_hist) with

select VALUE from item_value_hist h where ITEM = 'AA2' 
                                          and <query_date> BETWEEN h.P_DATE and h.P_DATE_TO

Note that the validity interval is inclusive, because P_DATE_TO is computed by subtracting one day from the next P_DATE. Take care if the DATEs have a time component.
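
If the dates may carry a time component, one way to sidestep the "minus one day" arithmetic is a half-open interval with an exclusive upper bound. The following is only a sketch under that assumption: the column name P_DATE_NEXT and the bind variable :query_date are hypothetical names introduced for illustration, and the CTE stands in for item_value_hist.

```sql
-- Sketch: half-open validity interval [P_DATE, P_DATE_NEXT)
-- P_DATE_NEXT and :query_date are hypothetical names, not from the original answer
with item_value_hist as (
  select P_DATE,
         -- next P_DATE of the same ITEM; for the latest row, one day past the last dimension date
         lead(P_DATE, 1, (select max(full_date) + 1 from date_dim))
           over (partition by ITEM order by P_DATE) as P_DATE_NEXT,
         ITEM, VALUE
  from item_value
)
select VALUE
from item_value_hist h
where ITEM = 'AA2'
  and :query_date >= h.P_DATE       -- lower bound is inclusive
  and :query_date <  h.P_DATE_NEXT; -- upper bound is exclusive
```

With an exclusive upper bound, adjacent intervals cannot overlap or leave gaps, regardless of the time of day stored in P_DATE.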

If you want the ITEM per DAY overview, you must first add the missing early history with a VALUE of NULL:

select 
(select min(full_date) from date_dim) P_DATE,  min(P_DATE)-1  P_DATE_TO, ITEM, null VALUE
from item_value
group by ITEM
having min(P_DATE) > (select min(full_date) from date_dim)

P_DATE              P_DATE_TO           ITE VALUE
------------------- ------------------- --- -----
31.12.2020 00:00:00 31.12.2020 00:00:00 AA2 

Then simply join to your dimension table, matching all days within each validity interval:

with item as (
select  
P_DATE,
lead(P_DATE-1,1,(select max(full_date) from date_dim)) over (partition by ITEM order by P_DATE) P_DATE_TO,
ITEM, VALUE
from item_value
union all
select 
/* add the missing early history without a VALUE */
(select min(full_date) from date_dim) P_DATE,  min(P_DATE)-1  P_DATE_TO, ITEM, null VALUE
from item_value
group by ITEM
having min(P_DATE) > (select min(full_date) from date_dim)
)
select dt.full_date, item.ITEM, item.VALUE from item
join date_dim dt
on dt.full_date between item.P_DATE and item.P_DATE_TO
order by item.ITEM, dt.full_date

FULL_DATE           ITE      VALUE
------------------- --- ----------
31.12.2020 00:00:00 AA1          9
01.01.2021 00:00:00 AA1         10
02.01.2021 00:00:00 AA1         10
03.01.2021 00:00:00 AA1         10
04.01.2021 00:00:00 AA1         10
31.12.2020 00:00:00 AA2           
01.01.2021 00:00:00 AA2        100
02.01.2021 00:00:00 AA2        100
03.01.2021 00:00:00 AA2        100
04.01.2021 00:00:00 AA2        100
0
votes

Two steps:

  1. Cross join dates and items. If you don't have an item table (which you should), join distinct items from your item_value table.
  2. Get the value in the FROM clause with OUTER APPLY or in the SELECT clause with a subquery using FETCH FIRST ROW ONLY.

The query:

select 
  d.full_date,
  i.item,
  (
    select iv.value
    from Item_value iv
    where iv.item = i.item
    and iv.p_date <= d.full_date
    order by iv.p_date desc
    fetch first row only
  ) as value
from dim_date d
cross join (select distinct item from item_value) i
order by d.full_date, i.item;
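
The OUTER APPLY variant mentioned in step 2 might look like the following. This is a sketch that assumes Oracle 12c or later and reuses the table and column names from the query above; the alias v is introduced here only for illustration.

```sql
-- Sketch: same lookup as the scalar subquery, moved into the FROM clause
select
  d.full_date,
  i.item,
  v.value
from dim_date d
cross join (select distinct item from item_value) i
outer apply (
  -- latest value of this item on or before this date
  select iv.value
  from item_value iv
  where iv.item = i.item
  and iv.p_date <= d.full_date
  order by iv.p_date desc
  fetch first row only
) v
order by d.full_date, i.item;
```

OUTER APPLY keeps the date and item combination even when the inner query returns no row, so items with no earlier value still appear with a null value.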
0
votes

You can generate the full list of dates and items using cross join followed by a left join to bring in the existing values. Then you can use last_value() or lag() to fill in the values:

-- date_dim exposes full_date (not p_date); the value table is aliased iv
select d.full_date, i.item, 
       coalesce(iv.value,
                lag(iv.value ignore nulls) over (partition by i.item order by d.full_date)
               ) as value
from date_dim d cross join
     (select distinct iv.item from item_value iv) i left join
     item_value iv
     on iv.p_date = d.full_date and iv.item = i.item;
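
For comparison, a last_value() version of the same fill-forward idea could be sketched as follows. It assumes the tables from the question (date_dim with a full_date column, item_value with p_date, item and value) and relies on the default window frame, which runs from the start of the partition up to the current row.

```sql
-- Sketch: carry the most recent non-null value forward per item
select d.full_date, i.item,
       last_value(iv.value ignore nulls)
         over (partition by i.item order by d.full_date) as value
from date_dim d cross join
     (select distinct item from item_value) i left join
     item_value iv
     on iv.p_date = d.full_date and iv.item = i.item;
```

Because the current row is part of the frame, no coalesce() is needed here. As with lag(), a value is only carried forward from dates that actually exist in date_dim.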

You can also do this using a join by adding an "end" date to the values table:

-- date_dim exposes full_date (not p_date); the value table is aliased iv
select d.full_date, i.item, 
       coalesce(iv.value,
                lag(iv.value ignore nulls) over (partition by i.item order by d.full_date)
               ) as value
from date_dim d cross join
     (select distinct iv.item from item_value iv) i left join
     (select iv.*,
             -- start of the next value for the same item (exclusive end date)
             lead(p_date) over (partition by item order by p_date) as next_p_date
      from item_value iv
     ) iv
     on i.item = iv.item and
        d.full_date >= iv.p_date and
        (iv.next_p_date is null or d.full_date < iv.next_p_date);