I'm trying to select records from one table based on whether an item in a repeated field is located in a column in another table. I've been able to do this when explicitly listing the items I am testing for in my code, but not when selecting from another table. Let me demonstrate using the trigrams dataset:
Let's say that I wanted to select all records that had an appearance during certain years. But I don't just want just the data from those years - I want all the data associated with those records. If I just wanted the data for a few years I could do something like this (and this works):
SELECT ngram, first, second, third, fourth, fifth, cell.value, cell.volume_count,
SOME(cell.value in ('1800', '1801')) WITHIN RECORD AS valid
FROM [publicdata:samples.trigrams]
HAVING valid
However, instead of coding '1800' and '1801' into my query, I have a table years
that contains a set of years I am interested in. I would like this to work:
SELECT ngram, first, second, third, fourth, fifth, cell.value, cell.volume_count,
SOME(cell.value in (SELECT year_as_str FROM [mydataset.years])) WITHIN RECORD AS valid
FROM [publicdata:samples.trigrams]
HAVING valid
This doesn't work because bigquery requires that semi-joins be part of a WHERE
or HAVING
clause.
So I try to rearrange (back to the first query):
SELECT ngram, first, second, third, fourth, fifth, cell.value, cell.volume_count
FROM [publicdata:samples.trigrams]
HAVING SOME(cell.value in ('1801', '1802')) WITHIN RECORD
This results in an error Encountered " "WITHIN" "WITHIN "" ... Was expecting <EOF>
So now without the WITHIN RECORD
:
SELECT ngram, first, second, third, fourth, fifth, cell.value, cell.volume_count
FROM [publicdata:samples.trigrams]
HAVING SOME(cell.value in ('1801', '1802'))
This results in an error SELECT clause has mix of aggregations '...' and fields '...' without GROUP BY clause
But I'm not aggregating! So now I move the filter into a WHERE
:
SELECT ngram, first, second, third, fourth, fifth, cell.value, cell.volume_count
FROM [publicdata:samples.trigrams]
WHERE SOME(cell.value in ('1801', '1802'))
And this tells me that Invalid function name: SOME
. What?!
Is there a way to get the behavior I am looking for with BigQuery?