1
votes

I have troubles getting BigQuery to load timestamps from avro-files correctly.

The avro-files have date columns stored as long, with logical type timestamp-micros. As per documentation, BigQuery should store this as timestamp data type. I have also tried timestamp-millis for logical type.

The data is stored in avro like this:

{'id': '<masked>', '<masked>': '<masked>', 'tm': 1553990400000, '<masked>': <masked>, '<masked>': <masked>, 'created': 1597056958864}

The fields tm and created are longs, 2019-03-31T00:00:00Z and 2020-08-10T11:50:58.986816592Z respectively.

The schema for the avro is

{"type":"record","name":"SomeMessage","namespace":"com.df",
"fields":
[{"name":"id","type":"string"},
{"name":"<masked>","type":"string"},
{"name":"tm","type":"long","logicalType":"timestamp-micros"},
{"name":"<masked>","type":"int"},
{"name":"<masked>","type":"float"},
{"name":"created","type":"long","logicalType":"timestamp-micros"}]}";

When imported to BigQuery through bq load, records ends up like these:

<masked>   <masked>   tm                            <masked>   <masked>  created
________________________________________________________________________________________________________
<masked> | <masked> | 1970-01-18 23:39:50.400 UTC | <masked> | <masked> | 1970-01-19 11:37:36.958864 UTC
________________________________________________________________________________________________________

The import command used is:

bq load --source_format=AVRO --use_avro_logical_types some_dataset.some_table "gs://some-bucket/some.avro"

The timestamps in BigQuery are nowhere near the actual values provided in avro.

Anyone have any ideas on how to do this properly?

1
It seems data 1553990400000 & 1597056958864 are in millis. Try passing micro seconds or use timestamp-millis instead of timestamp-micros in schema.Ambrish
Logical attributes are ignored by default, for backward compatibility reasons as logical types were not initially supported, cloud.google.com/bigquery/docs/…. Did you enable useAvroLogicalTypes?Hua Zhang
I did enable useAvroLogicalTypes. I updated my question to reflect this.gixen
@Ambrish, using timestamp-millis produces same results as timestamp-micros. Its almost like BigQuery is not seeing the logicaltype information.gixen

1 Answers

1
votes

I figured out that the avro schema is actually wrong. The timestamp fields should be like this:

{"name":"created","type":{"type":"long", "logicalType":"timestamp-millis"}}