
I have a simple POJO with date fields that will be stored as Avro in storage before being imported into Google BigQuery. The dates are converted to long, and I'm trying to use @AvroSchema to override the schema generation for the date fields so that BigQuery knows which type the fields are.

The simple POJO:

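import java.io.Serializable;
import org.apache.avro.reflect.AvroSchema;

public class SomeAvroMessage implements Serializable {
    // Epoch milliseconds; @AvroSchema replaces the reflected "long" schema for this field
    @AvroSchema("{\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}")
    private long tm;
    @AvroSchema("{\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}")
    private long created;

    public SomeAvroMessage() {
    }
}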

This produces the following Avro schema:

{"type":"record","name":"SomeAvroMessage",
"namespace":"some.namespace",
"fields":[
      {"name":"tm","type":{"type":"long","logicalType":"timestamp-millis"}},
      {"name":"created","type":{"type":"long","logicalType":"timestamp-millis"}}
]}
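A minimal sketch for reproducing this schema without running a pipeline, assuming the schema is derived through Avro's reflection API (`ReflectData`), which, as far as I know, is how Beam's `AvroCoder` handles plain classes. `ReflectSchemaDemo`, `reflectedSchema` and `logicalTypeOf` are hypothetical names; the POJO is repeated so the file compiles on its own.

```java
import java.io.Serializable;
import org.apache.avro.LogicalType;
import org.apache.avro.Schema;
import org.apache.avro.reflect.AvroSchema;
import org.apache.avro.reflect.ReflectData;

public class ReflectSchemaDemo {

    /** Derives the record schema through Avro reflection, as for any plain POJO. */
    static Schema reflectedSchema() {
        return ReflectData.get().getSchema(SomeAvroMessage.class);
    }

    /** Returns the logical type name on a field's schema, or null if it has none. */
    static String logicalTypeOf(String fieldName) {
        LogicalType lt = reflectedSchema().getField(fieldName).schema().getLogicalType();
        return lt == null ? null : lt.getName();
    }

    public static void main(String[] args) {
        // Pretty-printed JSON; each field's "type" is a nested object holding logicalType
        System.out.println(reflectedSchema().toString(true));
        System.out.println("tm: " + logicalTypeOf("tm"));
    }
}

// The POJO from the question, repeated (package-private) so this file compiles alone.
class SomeAvroMessage implements Serializable {
    @AvroSchema("{\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}")
    private long tm;
    @AvroSchema("{\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}")
    private long created;

    public SomeAvroMessage() {
    }
}
```

The annotation's JSON is parsed as the complete schema of the field's type, which is why it shows up nested under "type".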

This seems wrong to me; I expected each field to be simply {"name":"tm","type":"long","logicalType":"timestamp-millis"}

This runs on Google Dataflow, using Apache Beam 2.22 with the Java SDK.

Am I missing something?

What makes you think it's wrong? The schema is a valid schema. - rmesteves
Yes, this is a valid schema, per my answer. But is it causing a problem in some way? - Kenn Knowles

2 Answers


The value {"name":"tm","type":{"type":"long","logicalType":"timestamp-millis"}} is correct. Expanded into clearer pseudocode, it is:

Field {
  name: "tm",
  type: Schema {
    type: "long",
    logicalType: "timestamp-millis"
  }
}

You can see that the field has a name and a type. The type of an Avro field must be an Avro schema. The logicalType field goes inside the schema, not adjacent to it.
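To see the difference concretely, here is a sketch that parses both the nested form and the flat form from the question with Avro's `Schema.Parser`, assuming Avro 1.8 or later (where logical types are recognized while parsing). `FieldLogicalTypeDemo` and `logicalTypeOfTm` are hypothetical names. In the nested form the field's schema carries `timestamp-millis`; in the flat form the parser treats `logicalType` as an extra attribute of the field, so the field's schema is a plain `long`.

```java
import org.apache.avro.LogicalType;
import org.apache.avro.Schema;

public class FieldLogicalTypeDemo {
    // logicalType nested inside the field's type schema (what Avro generated)
    static final String NESTED =
        "{\"type\":\"record\",\"name\":\"SomeAvroMessage\",\"fields\":["
        + "{\"name\":\"tm\",\"type\":{\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}}]}";

    // logicalType placed next to "type" on the field (what the question expected)
    static final String FLAT =
        "{\"type\":\"record\",\"name\":\"SomeAvroMessage\",\"fields\":["
        + "{\"name\":\"tm\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}]}";

    /** Parses a record schema and returns the logical type of field "tm", or null if none. */
    static String logicalTypeOfTm(String recordJson) {
        Schema record = new Schema.Parser().parse(recordJson);
        LogicalType lt = record.getField("tm").schema().getLogicalType();
        return lt == null ? null : lt.getName();
    }

    public static void main(String[] args) {
        System.out.println("nested: " + logicalTypeOfTm(NESTED)); // timestamp-millis
        System.out.println("flat:   " + logicalTypeOfTm(FLAT));   // null
    }
}
```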


As described in the Avro specification:

A logical type is an Avro primitive or complex type with extra attributes to represent a derived type. The attribute logicalType must always be present for a logical type, and is a string with the name of one of the logical types listed later in this section. Other attributes may be defined for particular logical types.

The Avro specification also gives an example of the date logical type in an Avro schema:

{
  "type": "int",
  "logicalType": "date"
}

Basically, your schema is correct, and whenever you need a logical type you can build your schema the same way.
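If you build schemas in code rather than through @AvroSchema, Avro's Java API attaches a logical type to a schema via `LogicalTypes`. A minimal sketch, assuming Avro 1.8 or later; `LogicalTypeBuilderDemo` and its methods are hypothetical names. It rebuilds the spec's `date` example and the record from the question with `SchemaBuilder`, and the printed record has the same nested field types as the generated schema.

```java
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class LogicalTypeBuilderDemo {

    /** A long schema annotated with the timestamp-millis logical type. */
    static Schema timestampMillis() {
        return LogicalTypes.timestampMillis().addToSchema(Schema.create(Schema.Type.LONG));
    }

    /** An int schema annotated with the date logical type, as in the spec's example. */
    static Schema date() {
        return LogicalTypes.date().addToSchema(Schema.create(Schema.Type.INT));
    }

    /** The record from the question, built programmatically instead of by reflection. */
    static Schema record() {
        return SchemaBuilder.record("SomeAvroMessage").namespace("some.namespace")
            .fields()
            .name("tm").type(timestampMillis()).noDefault()
            .name("created").type(timestampMillis()).noDefault()
            .endRecord();
    }

    public static void main(String[] args) {
        System.out.println(date());
        System.out.println(record().toString(true));
    }
}
```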