
I'm trying to import data into BigQuery via AVRO with a Date partition. When importing via the CLI I get an error saying that the partitioning field must be of type DATE or TIMESTAMP, but it is being read as an INTEGER.

Given an AVRO file similar to the one below:

{
  "namespace": "test_namespace",
  "name": "test_name",
  "type": "record",
  "fields": [
    {
      "name": "partition_date",
      "type": "int",
      "logicalType": "date"
    },
    {
      "name": "unique_id",
      "type": "string"
    },
    {
      "name": "value",
      "type": "double"
    }
  ]
}

I am then using the following command through the CLI to try and create a new table:

bq load \
--replace \
--source_format=AVRO \
--use_avro_logical_types=True \
--time_partitioning_field partition_date \
--clustering_fields unique_id \
mydataset.mytable \
gs://mybucket/mydata.avro

The expectation is a new table partitioned on the Date column "partition_date" and clustered by "unique_id".

Edit: Please see the error below

The field specified for the time partition can only be of type TIMESTAMP or DATE. The type found is: INTEGER.

The exact command I am using is as follows:

bq load --replace --source_format=AVRO --use_avro_logical_types=True --time_partitioning_field "partition_date" --clustering_fields "unique_id" BQ_DATASET BUCKET_URI

This is the AVRO schema that I am using:

{
    "namespace": "example.avro",
    "type": "record",
    "name": "Test",
    "fields": [
        { "name": "partition_date", "type": "int", "logicalType": "date" },
        { "name": "unique_id", "type": "string"},
        { "name": "value", "type": "float" }
   ]
}

It's worth noting that this is an old Google Cloud project (about 2 - 3 years old), if that is of any relevance.

I'm also on Windows 10 with the latest Google Cloud SDK.


2 Answers


I haven't received any error message performing the same load operation with an identical AVRO data schema and the desired BigQuery sink table structure.

According to the GCP documentation, you've used the --use_avro_logical_types=True flag along with the bq command line properly, so the conversion of data types should be propagated and the Avro date logical type translated to the equivalent DATE type in BigQuery.

You can refer to my BigQuery table schema below to validate the table structure on your side; as you haven't provided the table structure or the error message itself, I can't suggest more so far:

$ bq show --project_id=<Project_ID>  <Dataset>.<Table>
Table <Project_ID>:<Dataset>.<Table>

   Last modified            Schema            Total Rows   Total Bytes   Expiration        Time Partitioning        Clustered Fields   Labels
 ----------------- ------------------------- ------------ ------------- ------------ ----------------------------- ------------------ --------
  22 Apr 12:03:57   |- partition_date: date   3            66                         DAY (field: partition_date)   unique_id
                    |- unique_id: string
                    |- value: float
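
If it helps, the same table metadata can also be inspected in JSON form with the standard --format flag (same placeholders as above), which shows the timePartitioning and clustering blocks explicitly:

$ bq show --format=prettyjson <Project_ID>:<Dataset>.<Table>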

I used the FLOAT type for value as the plain conversion of the AVRO DOUBLE data type, per the recommendations here.

bq CLI version:

$ bq version
This is BigQuery CLI 2.0.56

Feel free to expand the original question with more specific information on the issue you're hitting, so I can assist more accurately with a solution.

UPDATE:

I've checked the information provided, but I'm still confused by the error you're getting. Apparently, in your case the use_avro_logical_types=True flag does not perform the logical type conversion. However, I've found this PIT feature request where people are asking to "whitelist" their projects in order to get the AVRO logical type functionality, e.g. this comment. Since this feature has been rolled out to the global community, it might be an oversight that some GCP projects are not enabled to use it.


Google finally got back to me (7 months later). In that time I lost access to the initial project that I had issues with, but I'm documenting a successful example for those who find this later with a new project.

Following a comment from the issue tracker here, I found that I was not using a complex type for the logical date field.

So this:

{ 
  "name": "partition_date",
  "type": "int",
  "logicalType": "date"
}

It should have been written like this (note the nested complex object for the type):

{ 
  "name": "partition_date",
  "type": {
    "type": "int",
    "logicalType": "date"
  }
}
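
Putting it together, the full corrected version of the schema from the question would look like this (same fields as before; only the type of partition_date changes):

{
    "namespace": "example.avro",
    "type": "record",
    "name": "Test",
    "fields": [
        { "name": "partition_date", "type": { "type": "int", "logicalType": "date" } },
        { "name": "unique_id", "type": "string" },
        { "name": "value", "type": "float" }
    ]
}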

Although the AVRO specification defines a date as the number of days from the Unix epoch (1 Jan 1970), I had to write partition_date as datetime.date(1970, 1, 1) instead of just 0.
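
For anyone generating the file from Python, here is a minimal sketch using the fastavro package (an assumption on my part; the original post doesn't say which writer was used) that writes a record with the nested logical type and a datetime.date value:

from datetime import date

import fastavro

# Corrected schema: the logical type is nested inside a complex type object,
# not declared as a sibling of "type".
schema = fastavro.parse_schema({
    "namespace": "example.avro",
    "type": "record",
    "name": "Test",
    "fields": [
        {"name": "partition_date", "type": {"type": "int", "logicalType": "date"}},
        {"name": "unique_id", "type": "string"},
        {"name": "value", "type": "float"},
    ],
})

# fastavro converts datetime.date to the underlying int (days since epoch)
# because the schema declares the date logical type.
records = [
    {"partition_date": date(1970, 1, 1), "unique_id": "id-1", "value": 1.5},
]

with open("mydata.avro", "wb") as out:
    fastavro.writer(out, schema, records)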

The commands (bq) were unchanged from the original post.

As stated I don't know if this would have fixed my issue with the original project but hopefully this helps the next person.