5
votes

How can I tell the LAG function to get the last "not null" value?

For example, see my table bellow where I have a few NULL values on column B and C. I'd like to fill the nulls with the last non-null value. I tried to do that by using the LAG function, like so:

case when B is null then lag (B) over (order by idx) else B end as B,

but that doesn't quite work when I have two or more nulls in a row (see the NULL value on column C row 3 - I'd like it to be 0.50 as the original).

Any idea how can I achieve that? (it doesn't have to be using the LAG function, any other ideas are welcome)

A few assumptions:

  • The number of rows is dynamic;
  • The first value will always be non-null;
  • Once I have a NULL, is NULL all up to the end - so I want to fill it with the latest value.

Thanks

enter image description here

5
Itzik Ben-Gan wrote a blog on a that problem: sqlmag.com/sql-server/how-previous-and-next-condition. Unfortunatley SQL Server doesn't support the IGNORE NULLS option in LAST_VALUE, then it's simple: LAST_VALUE(B IGNORE NULLS) OVER (ORDER BY idx). - dnoeth

5 Answers

7
votes

You can do it with outer apply operator:

select t.id,
       t1.colA,
       t2.colB,
       t3.colC 
from table t
outer apply(select top 1 colA from table where id <= t.id and colA is not null order by id desc) t1
outer apply(select top 1 colB from table where id <= t.id and colB is not null order by id desc) t2
outer apply(select top 1 colC from table where id <= t.id and colC is not null order by id desc) t3;

This will work, regardless of the number of nulls or null "islands". You may have values, then nulls, then again values, again nulls. It will still work.


If, however the assumption (in your question) holds:

Once I have a NULL, is NULL all up to the end - so I want to fill it with the latest value.

there is a more efficient solution. We only need to find the latest (when ordered by idx) values. Modifying the above query, removing the where id <= t.id from the subqueries:

select t.id,
       colA = coalesce(t.colA, t1.colA),
       colB = coalesce(t.colB, t2.colB),
       colC = coalesce(t.colC, t3.colC) 
from table t
outer apply (select top 1 colA from table 
             where colA is not null order by id desc) t1
outer apply (select top 1 colB from table 
             where colB is not null order by id desc) t2
outer apply (select top 1 colC from table 
             where colC is not null order by id desc) t3;
4
votes

You could make a change to your ORDER BY, to force the NULLs to be first in your ordering, but that may be expensive...

lag(B) over (order by CASE WHEN B IS NULL THEN -1 ELSE idx END)

Or, use a sub-query to calculate the replacement value once. Possibly less expensive on larger sets, but very clunky.
- Relies on all the NULLs coming at the end
- The LAG doesn't rely on that

COALESCE(
    B,
    (
        SELECT
            sorted_not_null.B
        FROM
        (
            SELECT
                table.B,
                ROW_NUMBER() OVER (ORDER BY table.idx DESC)   AS row_id
            FROM
                table
            WHERE
                table.B IS NOT NULL
        )
           sorted_not_null
        WHERE
           sorted_not_null.row_id = 1
    )
)

(This should be faster on larger data-sets, than LAG or using OUTER APPLY with correlated sub-queries, simply because the value is calculated once. For tidiness, you could calculate and store the [last_known_value] for each column in variables, then just use COALESCE(A, @last_known_A), COALESCE(B, @last_known_B), etc)

1
votes

if it is null all the way up to the end then can take a short cut

declare @b varchar(20) = (select top 1 b from table where b is not null order by id desc);
declare @c varchar(20) = (select top 1 c from table where c is not null order by id desc); 
select is, isnull(b,@b) as b, insull(c,@c) as c 
from table;
1
votes
Select max(diff) from(
Select 
    Case when lag(a) over (order by b) is not null
    Then (a -lag(a) over (order by b)) end as diff 
     From <tbl_name> where
    <relevant conditions>
    Order by b) k

Works fine in db visualizer.

-3
votes
UPDATE table 
SET B = (@n := COALESCE(B , @n))
WHERE B is null;