2
votes

I'm having problems when I include an UNNEST statement in the FROM clause for a query designed for Google’s BigQuery Patents dataset (patents-public-data.patents.publications). Much of the data is contained in embedded arrays. In my example below the particular field exemplifying my problem is the patent Description field, which is an array. In the patents dataset the descriptions and claims are currently only available for U.S. patents, not for patents from any other country. I’d like to have a query that uses the Description field when available, but then return NULL if that field is not available. It seems that when the UNNEST of the Description field is included (regardless of whether or not it's used in the SELECT or WHERE) for a non-US patent document, then the query will not return any results.

Below is my test case. I have two identical queries. The only difference is the inclusion or exclusion of the UNNEST(description_localized) statement in the FROM clause. The description_localized array information is not used in either the SELECT or the WHERE clauses.

Query #1, with the "UNNEST(description_localized) AS description_info" not included, returns 430,452 results.

Query #2, with the "UNNEST(description_localized) AS description_info" included, returns zero results.

Ideally, I'd like to be able to utilize the Description field in both the SELECT and WHERE clauses when it’s are available, but still return the other results when the description is not available.

It seems like there is some ‘behind-the-scenes’ JOIN happening here with the UNNEST statement that causes no results to be returned if the Description field is NULL.

Any insight into how the UNNEST works for a field, like Descriptions, where the array is NULL? Why is Query #2 returning no results? Is there a work-around solution for this?

Query #1:

#standardSQL
SELECT 
      patentsdb.publication_number AS Pat_Pub_Num,
      patentsdb.country_code AS Patent_Country_Code,
      title_info.text AS Patent_Title,
      title_info.language AS Patent_Title_Language
FROM
      `patents-public-data.patents.publications` AS patentsdb,
      UNNEST(title_localized) AS title_info,
      UNNEST(abstract_localized) AS abstract_info
      #UNNEST(description_localized) AS description_info
WHERE
      LOWER(title_info.text) LIKE '% wireless %' 
      AND 
      patentsdb.country_code != 'US'
;

Query #2

#standardSQL
SELECT 
      patentsdb.publication_number AS Pat_Pub_Num,
      patentsdb.country_code AS Patent_Country_Code,
      title_info.text AS Patent_Title,
      title_info.language AS Patent_Title_Language
FROM
      `patents-public-data.patents.publications` AS patentsdb,
      UNNEST(title_localized) AS title_info,
      UNNEST(abstract_localized) AS abstract_info,
      UNNEST(description_localized) AS description_info
WHERE
      LOWER(title_info.text) LIKE '% wireless %' 
      AND 
      patentsdb.country_code != 'US'
;
1

1 Answers

1
votes

Why is Query #2 returning no results?

That is the intended behavior of CROSS JOIN on UNNEST() when UNNEST returns no rows

Is there a work-around solution for this?

Just simply use LEFT JOIN instead of CROSS JOIN

#standardSQL
SELECT 
  patentsdb.publication_number AS Pat_Pub_Num,
  patentsdb.country_code AS Patent_Country_Code,
  title_info.text AS Patent_Title,
  title_info.language AS Patent_Title_Language
--   description_info.text,
--   description_info.language  
FROM
  `patents-public-data.patents.publications` AS patentsdb,
  UNNEST(title_localized) AS title_info,
  UNNEST(abstract_localized) AS abstract_info
  LEFT JOIN UNNEST(description_localized) AS description_info
WHERE LOWER(title_info.text) LIKE '% wireless %' 
  AND patentsdb.country_code != 'US'