2
votes

I want to select rows that have more thank k values in a repeated field. (consider for example selecting user that have more than 3 email addresses)

In Standard SQL I know I can use

SELECT * FROM dataset.users
WHERE array_length(email_address) > 3

But what is the way to do this in BigQuery legacy SQL?

2

2 Answers

7
votes

No need for a subquery; you should be able to filter with OMIT RECORD IF directly:

SELECT *
FROM dataset.users
OMIT RECORD IF COUNT(email_address) <= 3;

Do you mind commenting on why you want to use legacy SQL, though? If you encountered a problem with standard SQL I'd like to understand what it was so that we can fix it. Thanks!

0
votes

Counting Values in a repeated field in BigQuery

BigQuery Legacy SQL

SELECT COUNT(email_address) WITHIN RECORD AS address_count
FROM [dataset.users]

If you want then to count output rows - you can use below

SELECT COUNT(1) AS rows_count 
FROM (
  SELECT COUNT(email_address) WITHIN RECORD AS address_count
  FROM [dataset.users]
)
WHERE address_count> 3