2
votes

I have created a race table in my cloud project named race. The table contains one integer field and one string field:

Then I try to perform a join query with publicdata.samples.natality and my race table:

SELECT COUNT(*),natality.year FROM [publicdata:samples.natality] natality INNER JOIN [regal-cider-369:ab.race] race ON natality.child_race = race.id GROUP BY 2 LIMIT 50

and I get the following error:

Fields child_race and id have different types int32 and int64, and therefore can't be compared as join attributes

It can be a limitation in the Bigquery engine, and it's ok.

However, the problem is that I can't find any place in the API where I can specify whether I want a field to be int32 or int64 when I create my race table (The only integer data type I've seen up to now in the documentation are STRING, INTEGER, FLOAT, BOOLEAN, TIMESTAMP or RECORD (https://developers.google.com/bigquery/docs/reference/v2/tables#schema.fields.type).

Did I miss something? or is the "natality" sample table using a data type which is not available in the API?

1

1 Answers

3
votes

All integers in BigQuery should be int64s (BigQuery calls them INTEGER). If they're not, it is a bug. It is possible that some of the sample tables have values where the underlying type is int32, however.

Are you only seeing this on the natality table? If so, we should fix it.

A workaround is to coerce the values to an INTEGER ... you'll need to use a subquery, however. As in:

SELECT COUNT(*), natality.year FROM (
    SELECT year, INTEGER(child_race) as child_race
    FROM [publicdata:samples.natality]) AS natality 
INNER JOIN 
    (SELECT INTEGER(id) as id FROM [regal-cider-369:ab.race]) AS race 
ON natality.child_race = race.id 
GROUP BY 2 LIMIT 50