I have a table where each row contains all the fields that changed during some event, and a flag associated with each field to flag if the field was updated. For simplicity I only show here the "status" field, but they are several other fields as well. In cases where a given field was not modified by the event, the field is set to null and so is the flag.
+----+---------------------+--------+---------------------+
| id | date | status | flag_changed_status |
+----+---------------------+--------+---------------------+
| 1 | 2020-01-03 19:32:17 | TODO | 1 |
| 1 | 2020-01-08 15:46:07 | WIP | 1 |
| 1 | 2020-01-08 15:53:53 | | | //this line was generated because another field changed
| 1 | 2020-01-08 15:56:53 | | | //this line was generated because another field changed
| 1 | 2020-01-08 16:02:31 | Done | 1 |
+----+---------------------+--------+---------------------+
My goal is to replace the field values for the rows where the field was not changed with the last value it had when the flag was equal to one, e.g get :
+----+---------------------+--------+---------------------+
| id | date | status | flag_changed_status |
+----+---------------------+--------+---------------------+
| 1 | 2020-01-03 19:32:17 | TODO | 1 |
| 1 | 2020-01-08 15:46:07 | WIP | 1 |
| 1 | 2020-01-08 15:53:53 | WIP | |
| 1 | 2020-01-08 15:56:53 | WIP | |
| 1 | 2020-01-08 16:02:31 | Done | 1 |
+----+---------------------+--------+---------------------+
I understand that I want to use the last_value
analytical function in Bigquery, and I tried :
SELECT ID_DEMANDE, date, status,
last_value(status) OVER (ORDER BY flag_changed_status, DATE ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as current_status, flag_changed_status
FROM table ORDER BY id, DATE
The idea was that by using the flag in the order by function, the rows where the flag was set to null would be put in first, and then the last_value(status) would be the last value where flag_changed_status was set to 1
But this can only work if I use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, because the ORDER BY
clause will be processed before the window frame clause (rows between ...), thus for the rows where flag_changed_status is null, after the order by is processed, the current row number is 0, so the last value between unbounded preceding and current row is always null.
Is there any way to first run the ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING and then the ORDER BY, so that last_value(status) will return the last value preceding the current row where the flag was set to one ? Or is there something much simpler, still using analytical functions to allow me to complete all the different fields in one query ?
Edit : I really want to copy the status that was set the last time the flag was set, even if this status is null, that is why I am trying to use the flag in the order by. That is if the initial table is :
+----+---------------------+--------+---------------------+
| id | date | status | flag_changed_status |
+----+---------------------+--------+---------------------+
| 1 | 2020-01-03 19:32:17 | TODO | 1 |
| 1 | 2020-01-08 15:46:07 | null | 1 |
| 1 | 2020-01-08 15:53:53 | null | null |
| 1 | 2020-01-08 15:56:53 | null | null |
| 1 | 2020-01-08 15:57:53 | WIP | 1 |
| 1 | 2020-01-08 15:58:53 | null | null |
| 1 | 2020-01-08 16:02:31 | Done | 1 |
+----+---------------------+--------+---------------------+
I would need:
+----+---------------------+--------+---------------------+
| id | date | status | flag_changed_status |
+----+---------------------+--------+---------------------+
| 1 | 2020-01-03 19:32:17 | TODO | 1 |
| 1 | 2020-01-08 15:46:07 | null | 1 |
| 1 | 2020-01-08 15:53:53 | null | null | // we copy the last status where the flag was 1, and it is null
| 1 | 2020-01-08 15:56:53 | null | null |
| 1 | 2020-01-08 15:57:53 | WIP | 1 |
| 1 | 2020-01-08 15:58:53 | WIP | null | //only this line changes
| 1 | 2020-01-08 16:02:31 | Done | 1 |
+----+---------------------+--------+---------------------+
But it seems to be too complicated, so I will just replace all the nulls where the flag is set to 1 with a custom status, and then a simple last_value(status IGNORE NULLS) as @gordon-linoff was suggesting will provide almost the desired result