I doing some meta programming with bigquery and noticed something I didn't expect.
I'm using this query:
SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 1000
which should go against a public dataset. If you analyze that query you would see that the schema looked like this:
"schema": {
"fields": [
{
"name": "word",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "word_count",
"type": "INTEGER",
"mode": "NULLABLE"
},
{
"name": "corpus",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "corpus_date",
"type": "INTEGER",
"mode": "NULLABLE"
}
]
},
This might look good at first, but if you look at the table definition for bigquery-public-data.samples.shakespeare
you will notice that the every field in that select are required in the table, so why does it end up being nullable in the schema for the select?
Some context:
I'm working on a F# type provider wher I try to encode all the values as correct as possible. That means nullable as option types and non nullable as regular types. If I always get nullable it will make the use much more cumbersome for fields that can't be nullable.