I'm having problems when I include an UNNEST statement in the FROM clause for a query designed for Google’s BigQuery Patents dataset (patents-public-data.patents.publications). Much of the data is contained in embedded arrays. In my example below the particular field exemplifying my problem is the patent Description field, which is an array. In the patents dataset the descriptions and claims are currently only available for U.S. patents, not for patents from any other country. I’d like to have a query that uses the Description field when available, but then return NULL if that field is not available. It seems that when the UNNEST of the Description field is included (regardless of whether or not it's used in the SELECT or WHERE) for a non-US patent document, then the query will not return any results.
Below is my test case. I have two identical queries. The only difference is the inclusion or exclusion of the UNNEST(description_localized) statement in the FROM clause. The description_localized array information is not used in either the SELECT or the WHERE clauses.
Query #1, with the "UNNEST(description_localized) AS description_info" not included, returns 430,452 results.
Query #2, with the "UNNEST(description_localized) AS description_info" included, returns zero results.
Ideally, I'd like to be able to utilize the Description field in both the SELECT and WHERE clauses when it’s are available, but still return the other results when the description is not available.
It seems like there is some ‘behind-the-scenes’ JOIN happening here with the UNNEST statement that causes no results to be returned if the Description field is NULL.
Any insight into how the UNNEST works for a field, like Descriptions, where the array is NULL? Why is Query #2 returning no results? Is there a work-around solution for this?
Query #1:
#standardSQL
SELECT
patentsdb.publication_number AS Pat_Pub_Num,
patentsdb.country_code AS Patent_Country_Code,
title_info.text AS Patent_Title,
title_info.language AS Patent_Title_Language
FROM
`patents-public-data.patents.publications` AS patentsdb,
UNNEST(title_localized) AS title_info,
UNNEST(abstract_localized) AS abstract_info
#UNNEST(description_localized) AS description_info
WHERE
LOWER(title_info.text) LIKE '% wireless %'
AND
patentsdb.country_code != 'US'
;
Query #2
#standardSQL
SELECT
patentsdb.publication_number AS Pat_Pub_Num,
patentsdb.country_code AS Patent_Country_Code,
title_info.text AS Patent_Title,
title_info.language AS Patent_Title_Language
FROM
`patents-public-data.patents.publications` AS patentsdb,
UNNEST(title_localized) AS title_info,
UNNEST(abstract_localized) AS abstract_info,
UNNEST(description_localized) AS description_info
WHERE
LOWER(title_info.text) LIKE '% wireless %'
AND
patentsdb.country_code != 'US'
;