The thing about LAST
from legacy SQL is that the output is undefined for the example that you gave; it doesn't specify how "last" is determined in this case since the input table scan isn't guaranteed to have any particular order. As written, you could express this same query with ANY_VALUE
from standard SQL in BigQuery, e.g.:
WITH SampleInput AS (
SELECT 1 AS id, 'apple' AS name UNION ALL
SELECT 1, 'banana' UNION ALL
SELECT 2, 'carrot' UNION ALL
SELECT 3, 'lemon' UNION ALL
SELECT 3, 'orange'
)
SELECT
id,
ANY_VALUE(name) AS last_record_of_name
FROM SampleInput
GROUP BY id;
+
| id | last_record_of_name |
+
| 1 | apple |
| 2 | carrot |
| 3 | lemon |
+
I don't think that's what you're after, though; if the goal is to get the "last" value based on some criteria, such as the sort order of the name
values, then you can use ARRAY_AGG
with ORDER BY
and LIMIT 1
, e.g.:
WITH SampleInput AS (
SELECT 1 AS id, 'apple' AS name UNION ALL
SELECT 1, 'banana' UNION ALL
SELECT 2, 'carrot' UNION ALL
SELECT 3, 'lemon' UNION ALL
SELECT 3, 'orange'
)
SELECT
id,
ARRAY_AGG(name ORDER BY name DESC LIMIT 1)[OFFSET(0)] AS last_record_of_name
FROM SampleInput
GROUP BY id;
+
| id | last_record_of_name |
+
| 1 | banana |
| 2 | carrot |
| 3 | orange |
+
The behavior of the query is well-defined, and it gives the desired results based on your sample input and output.