
We have defined a company-wide shared model in Avro and would like to load data conforming to this model into BigQuery using load jobs with automatic schema detection, and later export the data (to Google Cloud Storage or elsewhere) and deserialize/read it using the same Avro model.

The problem with this approach is that the exported Avro schema differs from the loaded schema, and therefore deserialization using the same schema as was used for loading fails.

We see the following incompatible type conversions:

| Model schema (used when loading) | Derived BigQuery type | Schema (after export) |
|----------------------------------|-----------------------|-----------------------|
| int                              | INTEGER               | long                  |
| float                            | FLOAT                 | double                |
| time-millis                      | TIME                  | time-micros           |
| timestamp-millis                 | TIMESTAMP             | timestamp-micros      |
| map                              | REPEATED RECORD       | array                 |
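To make the mismatch concrete, here is a minimal sketch (the record and field names are hypothetical, not from our actual model) comparing a loaded Avro schema with the one BigQuery produces on export:

```python
# Hypothetical company-wide model schema used when loading into BigQuery.
model_schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "count", "type": "int"},
        {"name": "ratio", "type": "float"},
        {"name": "created",
         "type": {"type": "long", "logicalType": "timestamp-millis"}},
    ],
}

# Schema BigQuery emits when the same table is exported back to Avro:
# int has been widened to long, float to double, and the logical type
# promoted from timestamp-millis to timestamp-micros.
exported_schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "count", "type": "long"},
        {"name": "ratio", "type": "double"},
        {"name": "created",
         "type": {"type": "long", "logicalType": "timestamp-micros"}},
    ],
}

# A reader resolving the exported file against the original model schema
# fails on these widened writer types.
for loaded, exported in zip(model_schema["fields"], exported_schema["fields"]):
    print(loaded["name"], loaded["type"], "->", exported["type"])
```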

From my point of view, I see only the following solutions to this problem:

  1. Create table+schema upfront instead of using auto-retrieval
  2. Use adapters when loading into and/or exporting from BigQuery
  3. Change the Avro schema to use only types that are "compatible" with BigQuery, i.e. types that survive the load/export round trip unchanged
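A sketch of option 3: a small rewrite pass (my own helper, not part of any Google library) that maps the problematic Avro types onto their round-trip-stable counterparts before the model is used for loading:

```python
import copy

# Avro types that BigQuery widens on export, mapped to the types
# that survive a load/export round trip unchanged.
PRIMITIVE_MAP = {"int": "long", "float": "double"}
LOGICAL_MAP = {"time-millis": "time-micros",
               "timestamp-millis": "timestamp-micros"}


def make_bq_compatible(schema):
    """Recursively rewrite an Avro schema (as parsed JSON) so that a
    BigQuery load followed by an export yields the same types back."""
    if isinstance(schema, str):
        return PRIMITIVE_MAP.get(schema, schema)
    if isinstance(schema, list):  # union, e.g. ["null", "int"]
        return [make_bq_compatible(s) for s in schema]
    if isinstance(schema, dict):
        schema = copy.deepcopy(schema)
        # Promote the logical type first, then widen its underlying
        # primitive (time-micros annotates long, not int).
        if schema.get("logicalType") in LOGICAL_MAP:
            schema["logicalType"] = LOGICAL_MAP[schema["logicalType"]]
        if schema.get("type") in PRIMITIVE_MAP:
            schema["type"] = PRIMITIVE_MAP[schema["type"]]
        for field in schema.get("fields", []):
            field["type"] = make_bq_compatible(field["type"])
        for key in ("items", "values"):  # array items / map values
            if key in schema:
                schema[key] = make_bq_compatible(schema[key])
        return schema
    return schema
```

Note that this only covers the scalar conversions; the `map` case cannot be fixed by type substitution alone, since BigQuery turns it into a repeated key/value record, so map fields would have to be remodeled as an array of key/value records in the schema itself.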

Any further ideas?