0
votes

I have a table with id and name

id  | name
----------
1   |  apple
1   |  banana
2   |  carrot
3   |  lemon
3   |  orange

In Legacy Sql, one could write a statement like

SELECT
    id,
    LAST(name) AS last_record_of_name,
FROM
  [project:table]
GROUP BY 1

and the result would be

id  | last_record_of_name
----------
1   |  banana
2   |  carrot
3   |  orange

This takes advantage of the function LAST https://cloud.google.com/bigquery/docs/reference/legacy-sql#last

Is there a similar function if using standard sql in BigQuery?

2

2 Answers

7
votes

The thing about LAST from legacy SQL is that the output is undefined for the example that you gave; it doesn't specify how "last" is determined in this case since the input table scan isn't guaranteed to have any particular order. As written, you could express this same query with ANY_VALUE from standard SQL in BigQuery, e.g.:

WITH SampleInput AS (
  SELECT 1 AS id, 'apple' AS name UNION ALL
  SELECT 1, 'banana' UNION ALL
  SELECT 2, 'carrot' UNION ALL
  SELECT 3, 'lemon' UNION ALL
  SELECT 3, 'orange'
)
SELECT
  id,
  ANY_VALUE(name) AS last_record_of_name
FROM SampleInput
GROUP BY id;
+----+---------------------+
| id | last_record_of_name |
+----+---------------------+
| 1  | apple               |
| 2  | carrot              |
| 3  | lemon               |
+----+---------------------+

I don't think that's what you're after, though; if the goal is to get the "last" value based on some criteria, such as the sort order of the name values, then you can use ARRAY_AGG with ORDER BY and LIMIT 1, e.g.:

WITH SampleInput AS (
  SELECT 1 AS id, 'apple' AS name UNION ALL
  SELECT 1, 'banana' UNION ALL
  SELECT 2, 'carrot' UNION ALL
  SELECT 3, 'lemon' UNION ALL
  SELECT 3, 'orange'
)
SELECT
  id,
  ARRAY_AGG(name ORDER BY name DESC LIMIT 1)[OFFSET(0)] AS last_record_of_name
FROM SampleInput
GROUP BY id;
+----+---------------------+
| id | last_record_of_name |
+----+---------------------+
| 1  | banana              |
| 2  | carrot              |
| 3  | orange              |
+----+---------------------+

The behavior of the query is well-defined, and it gives the desired results based on your sample input and output.

0
votes

Working solution of Array_Agg () with Python BigQuery:

!pip install -U google-cloud-bigquery

import pandas as pd

from google.cloud import bigquery

strvalue = """SELECT users ARRAY_AGG(STRUCT(session, page )) as hasComp FROM <datasetname>.<tableName> WHERE Group by users order by users limit 100 """

bigquery_client = bigquery.Client(project="")

dataset = bigquery_client.dataset("")

table = dataset.table('')

table.view_query_legacy_sql = False

query_job = bigquery_client.query(str_value)

df = query_job.to_dataframe()

print(df)