2
votes

I have a sample data as below and wanting to get a desired o/p, please help me with some idea.

I want the o/p of prev_diff_value of the 3rd,4th row to be 2015-01-01 00:00:00 instead of 2015-01-02 00:00:00.

with dat as (
            select 1 as id,'20150101 02:02:50'::timestamp as dt union all
            select 1,'20150101 03:02:50'::timestamp union all
            select 1,'20150101 04:02:50'::timestamp union all
            select 1,'20150102 02:02:50'::timestamp union all
            select 1,'20150102 02:02:50'::timestamp union all
            select 1,'20150102 02:02:51'::timestamp union all
            select 1,'20150103 02:02:50'::timestamp union all
            select 2,'20150101 02:02:50'::timestamp union all
            select 2,'20150101 03:02:50'::timestamp union all
            select 2,'20150101 04:02:50'::timestamp union all
            select 2,'20150102 02:02:50'::timestamp union all
            select 1,'20150104 02:02:50'::timestamp
            )-- select * from dat
   select id , dt , lag(trunc(dt)) over(partition by id order by dt asc) prev_diff_value
   from dat
  order by id,dt desc
O/P : 
   id   dt                    prev_diff_value
   1    2015-01-04 02:02:50   2015-01-03 00:00:00
   1    2015-01-03 02:02:50   2015-01-02 00:00:00
   1    2015-01-02 02:02:51   2015-01-02 00:00:00
   1    2015-01-02 02:02:50   2015-01-02 00:00:00
   1    2015-01-02 02:02:50   2015-01-01 00:00:00
2
Hi, could you explain better what you want to see in the prev_diff_value column? lag function is taking the previous one as reference so it is working correctly, you ask for that ro redshift. Why do you want to go back to one step? Maybe you could group by year-month-day hh-mm ignoring seconds so 2015-01-02 02:02:50 and 2015-01-02 02:02:51 will be considered the same?MiloBellano
Ok.. I know it's returning correct result but the value in prev-diff-value I am expecting is the previous day or maybe earlier day rather than the previous rowFact

2 Answers

3
votes

As I understand you want to get the previous different date for each timestamp within id partition. I would then apply lag against the unique combination of id and date and join back to the original dataset like this:

with dat as (
    select 1 as id,'20150101 02:02:50'::timestamp as dt union all
    select 1,'20150101 03:02:50'::timestamp union all
    select 1,'20150101 04:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:51'::timestamp union all
    select 1,'20150103 02:02:50'::timestamp union all
    select 2,'20150101 02:02:50'::timestamp union all
    select 2,'20150101 03:02:50'::timestamp union all
    select 2,'20150101 04:02:50'::timestamp union all
    select 2,'20150102 02:02:50'::timestamp union all
    select 1,'20150104 02:02:50'::timestamp
)
,dat_unique_lag as (
    select *, lag(date) over(partition by id order by date asc) prev_diff_value
    from (
        select distinct id,trunc(dt) as date
        from dat
    )
)
select *
from dat
join dat_unique_lag
using (id)
where trunc(dat.dt)=dat_unique_lag.date
order by id,dt desc;

However, this is not super performant. If nature of your data is that you have a limited number of timestamps for the same day you might just extend your lag with a conditional statement like this:

with dat as (
    select 1 as id,'20150101 02:02:50'::timestamp as dt union all
    select 1,'20150101 03:02:50'::timestamp union all
    select 1,'20150101 04:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:51'::timestamp union all
    select 1,'20150103 02:02:50'::timestamp union all
    select 2,'20150101 02:02:50'::timestamp union all
    select 2,'20150101 03:02:50'::timestamp union all
    select 2,'20150101 04:02:50'::timestamp union all
    select 2,'20150102 02:02:50'::timestamp union all
    select 1,'20150104 02:02:50'::timestamp
)
select id, dt,
case 
    when lag(trunc(dt)) over(partition by id order by dt asc)=trunc(dt)
    then case 
        when lag(trunc(dt),2) over(partition by id order by dt asc)=trunc(dt)
        then case
            when lag(trunc(dt),3) over(partition by id order by dt asc)=trunc(dt)
            then lag(trunc(dt),4) over(partition by id order by dt asc)
            else lag(trunc(dt),3) over(partition by id order by dt asc)
            end
        else lag(trunc(dt),2) over(partition by id order by dt asc)
        end
    else lag(trunc(dt)) over(partition by id order by dt asc)
end as prev_diff_value
from dat
order by id,dt desc;

Basically, you look at the previous record and if it doesn't suit you then you look back to the record before that one and so on until you find the correct record or run out of your statement depth. Here it looks until the 4th record back.

1
votes

Here's a different way of looking at the problem that's not very efficient, but sort of fun anyway.

with dat as (
    select 1 as id,'20150101 02:02:50'::timestamp as dt union all
    select 1,'20150101 03:02:50'::timestamp union all
    select 1,'20150101 04:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:51'::timestamp union all
    select 1,'20150103 02:02:50'::timestamp union all
    select 2,'20150101 02:02:50'::timestamp union all
    select 2,'20150101 03:02:50'::timestamp union all
    select 2,'20150101 04:02:50'::timestamp union all
    select 2,'20150102 02:02:50'::timestamp union all
    select 1,'20150104 02:02:50'::timestamp
)
select distinct
dat.id
,dat.dt
,last_value(dat2.d) over (partition by dat.id, dat.dt order by dat2.d asc rows between unbounded preceding and unbounded following) as prev_diff_value
from dat
left join (
    select distinct
    id
    ,trunc(dt) as d
    from dat) dat2 on dat.id = dat2.id and trunc(dat.dt) > dat2.d
order by 1,2,3;

This will draw the distinct pairs of id's and dates out and rejoin them onto the dataset only where the joined dates are earlier than the row in question. Then, the last_value function will take last value per row and the distinct removes all irrelevant rows from the output. I know this question is a few years old- but I stumbled across it and had fun with it.