2 votes

I'm doing some metaprogramming with BigQuery and noticed something I didn't expect.

I'm using this query:

SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 1000

which runs against a public dataset. If you analyze that query, you will see that the result schema looks like this:

"schema": {
"fields": [
 {
  "name": "word",
  "type": "STRING",
  "mode": "NULLABLE"
 },
 {
  "name": "word_count",
  "type": "INTEGER",
  "mode": "NULLABLE"
 },
 {
  "name": "corpus",
  "type": "STRING",
  "mode": "NULLABLE"
 },
 {
  "name": "corpus_date",
  "type": "INTEGER",
  "mode": "NULLABLE"
 }
]
},

This might look fine at first, but if you look at the table definition for bigquery-public-data.samples.shakespeare you will notice that every field in that SELECT is REQUIRED in the table. So why does it end up as NULLABLE in the schema for the SELECT?
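One way to double-check the table-side modes is to query the dataset's INFORMATION_SCHEMA.COLUMNS view (a small sketch of my own, not from the original post):

SELECT column_name, data_type, is_nullable
FROM `bigquery-public-data.samples.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'shakespeare'

-- Per the table definition referenced above, is_nullable should come back
-- as 'NO' for all four columns, i.e. they are REQUIRED in the table itself.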

Some context:

I'm working on an F# type provider where I try to encode all the values as correctly as possible. That means NULLABLE fields become option types and non-nullable fields become regular types. If I always get NULLABLE back, usage becomes much more cumbersome for fields that can never actually be null.


1 Answer

0 votes

Even though fields are REQUIRED in the table schema, query can do transformations which converts non NULL values to NULL values, therefore query result may have different schema (both with respect to nullability and to data types) then what the original table had.