0
votes

Can I, on a simple query, obtain the last NOT NULL value from a table like this?

ID | Name  | Inserted_at | Custom.Value1 | Custom.Value2
1  | Allan | 2017-08-01  | NULL          | NULL
1  | NULL  | 2017-08-03  | Value1        | NULL
1  | NULL  | 2017-08-05  | Value2        | Value3
2  | Jones | 2017-08-02  | NULL          | NULL

The value that I expect to return is some like:

1  | Allan | 2017-08-05  | Value2        | Value3
2  | Jones | 2017-08-02  | NULL          | NULL

I know that Updates on BigQuery are nearly impossible and a naive MAX/GROUP/ORDER BY does not seems to be right.

Anyone have some idea how to solve this?

Thanks!

2
This is something I've always wanted an Aggregate function for. It can be done in MSSQL with Window Functions, but it's colossally inefficient compared to an Aggregate. I don't know if anything like Window Functions exist for BigQuery.Chris Berger
why value1 and not value2?xQbert
@ChrisBerger Actually, exists Window Functions for BigQuery, but I don't know how to solve this with it...Allan Sene
@xQbert , my bad... just fixed :)Allan Sene
@AllanSene Then either of the below examples should help, I think. The one that uses first_value() is how I would do this with a window function, but I hate the need for DISTINCT and the performance hit. The ARRAY_AGG() function I had never seen before. Looks like exactly what you need, I wish I had known it existed. I wonder if it avoids the performance problems of window functions?Chris Berger

2 Answers

4
votes

Here is an example:

#standardSQL
SELECT
  ID,
  ARRAY_AGG(Name IGNORE NULLS ORDER BY Inserted_at LIMIT 1)[OFFSET(0)] AS Name,
  ARRAY_AGG(Custom.Value1 IGNORE NULLS ORDER BY Inserted_at LIMIT 1)[OFFSET(0)] AS Value1,
  ARRAY_AGG(Custom.Value2 IGNORE NULLS ORDER BY Inserted_at LIMIT 1)[OFFSET(0)] AS Value2
FROM YourTable
GROUP BY ID;

You can try this with sample data:

#standardSQL
WITH YourTable AS (
  SELECT 1 AS ID, 'Allan' AS Name, DATE '2017-08-01' AS Inserted_at, STRUCT(CAST(NULL AS STRING) AS Value1, CAST(NULL AS STRING) AS Value2) AS Custom UNION ALL
  SELECT 1, NULL, DATE '2017-08-03', STRUCT('Value1' AS Value1, NULL AS Value2) UNION ALL
  SELECT 1, NULL, DATE '2017-08-05', STRUCT('Value2' AS Value1, 'Value3' AS Value2) UNION ALL
  SELECT 2, 'Jones', DATE '2017-08-02', STRUCT(NULL AS Value1, NULL AS Value2)
)
SELECT
  ID,
  ARRAY_AGG(Name IGNORE NULLS ORDER BY Inserted_at LIMIT 1)[OFFSET(0)] AS Name,
  ARRAY_AGG(Custom.Value1 IGNORE NULLS ORDER BY Inserted_at LIMIT 1)[OFFSET(0)] AS Value1,
  ARRAY_AGG(Custom.Value2 IGNORE NULLS ORDER BY Inserted_at LIMIT 1)[OFFSET(0)] AS Value2
FROM YourTable
GROUP BY ID;
1
votes

You can use first_value():

select distinct id,
       first_value(name) over
           (partition by id
            order by (case when name is not null then 1 else 2 end, inserted_at desc)
           ) as name,
       max(inserted_at) as inserted_at,
       first_value(Custom.Value1) over
           (partition by id
            order by (case when Custom.Value1 is not null then 1 else 2 end, inserted_at desc)
           ) as Value1,
       first_value(Custom.Value2) over
           (partition by id
            order by (case when Custom.Value2 is not null then 1 else 2 end, inserted_at desc)
           ) as Value2
from t;