0
votes

does anyone know what is best in terms of performance for a query like this in BigQuery?

SELECT lastUpdated
FROM `test_db.test_dataset.Customers`
ORDER BY lastUpdated DESC
LIMIT 1

It works fine and gets the last result by date, however Im a bit concerned about the limit in BigQuery:

Query complete (0.3 sec elapsed, 2.9 MB processed)

Is there any way to run a similar query without spending that much processing?

2

2 Answers

1
votes

If you're concerned about "spending" please consider only 2 things matters in a BigQuery query:

  1. How many columns are scanned? You can optimize by selecting only the column that is definitely needed for a query. Plus, if a column is RECORD type, selecting only sub-column will also help.
  2. How many rows are scanned? Note that "LIMIT/WHERE/ORDER BY" doesn't change how many rows will be scanned in BigQuery. 2 things that can reduce rows scanned are partitioning and clustering. In your case, clustering by lastUpdated is definitely going to help when your table grows in the future.
0
votes

I would suggest using MAX() instead:

SELECT MAX(lastUpdated)
FROM `test_db.test_dataset.Customers`;

I find that ORDER BY in BigQuery is not always optimized for the parallel environment. In any case, I think MAX() is clearer for what your query is doing.