
I have a BigQuery Data Transfer job set up to load into a destination table that is partitioned by month. The table was created with the following command:

bq mk --table \
  --schema schema.json \
  --time_partitioning_field createdAt \
  --time_partitioning_type MONTH \
  myproject:mydataset.MyTable
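To confirm which partitioning spec the destination table actually has, you can inspect the table resource, for example with `bq show --format=json myproject:mydataset.MyTable`. Its output includes a `timePartitioning` object. Below is a minimal sketch that assumes this JSON has already been captured into a string. The helper name `table_partition_spec` and the sample JSON are hypothetical. The `interval(...)` output format simply mirrors the error message quoted later in this question.

```python
import json


def table_partition_spec(table_json: str) -> str:
    """Return the table's time-partitioning spec in the same
    interval(type:...,field:...) form used by the transfer error message.

    `table_json` is assumed to be the output of
    `bq show --format=json <table>` (a BigQuery Table resource).
    """
    table = json.loads(table_json)
    tp = table.get("timePartitioning")
    if tp is None:
        return "none"
    # `type` is HOUR, DAY, MONTH or YEAR; `field` is absent for
    # ingestion-time partitioned tables, so fall back to _PARTITIONTIME.
    field = tp.get("field", "_PARTITIONTIME")
    return f"interval(type:{tp['type']},field:{field})"


# Hypothetical, trimmed example of `bq show --format=json` output for MyTable.
sample = '{"tableReference": {"tableId": "MyTable"}, "timePartitioning": {"type": "MONTH", "field": "createdAt"}}'
print(table_partition_spec(sample))
```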

The data transfer job was created with the Python BigQuery Data Transfer Service client, like this:

from google.cloud import bigquery_datatransfer_v1
from google.protobuf.struct_pb2 import Struct

client = bigquery_datatransfer_v1.DataTransferServiceClient()

# `location` and `target_display_name` are assumed to be set earlier.
parent = f"projects/myproject/locations/{location}"
baseparams = {
    "file_format": "CSV",
    "ignore_unknown_values": True,
    "field_delimiter": ",",
    "skip_leading_rows": "0",
    "allow_jagged_rows": True,
}
# Transfer parameters are passed to the API as a protobuf Struct.
params = Struct()
params_content = baseparams.copy()
params_content["data_path_template"] = "gs://mybucket/**/*.csv"
params_content["destination_table_name_template"] = "MyTable"
# No partitioning option is set here; it comes from the destination table.
params.update(params_content)

tc_dict = {
    "display_name": target_display_name,
    "destination_dataset_id": "mydataset",
    "data_source_id": "google_cloud_storage",
    "schedule": "every 24 hours",
    "params": params,
}
tc = bigquery_datatransfer_v1.types.TransferConfig(**tc_dict)
response = client.create_transfer_config(
    request={"parent": parent, "transfer_config": tc}
)

As you can see, no partitioning is specified in the job definition. It is only specified on the destination table, which is how it should be according to the documentation:

Partitioning options

Cloud Storage and Amazon S3 transfers can write to partitioned or non-partitioned destination tables. There are two types of table partitioning in BigQuery:

Partitioned tables: Tables that are partitioned based on a column. The column type must be a TIMESTAMP or DATE column. If the destination table is partitioned on a column, you identify the partitioning column when you create the destination table and specify its schema.
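With column-based partitioning, each row lands in the partition derived from its `createdAt` value. The partition ID format depends on the partitioning type: `YYYYMMDD` for DAY and `YYYYMM` for MONTH. The sketch below only illustrates that mapping for TIMESTAMP values, which BigQuery partitions in UTC. It is not BigQuery code, and `partition_id` is a hypothetical helper. It shows that DAY and MONTH yield different partition IDs for the same row, which is consistent with the transfer refusing to write when its spec does not match the table's.

```python
from datetime import datetime, timezone

# strftime patterns matching BigQuery time-unit partition IDs.
PARTITION_ID_FORMATS = {
    "HOUR": "%Y%m%d%H",
    "DAY": "%Y%m%d",
    "MONTH": "%Y%m",
    "YEAR": "%Y",
}


def partition_id(created_at: datetime, partition_type: str) -> str:
    """Return the partition ID a timezone-aware TIMESTAMP maps to."""
    # TIMESTAMP partitioning is evaluated in UTC, so normalise first.
    ts = created_at.astimezone(timezone.utc)
    return ts.strftime(PARTITION_ID_FORMATS[partition_type])


row_ts = datetime(2020, 11, 10, 3, 15, tzinfo=timezone.utc)
print(partition_id(row_ts, "DAY"))    # 20201110
print(partition_id(row_ts, "MONTH"))  # 202011
```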

This job ran successfully for days until last week (last successful run on 2020-11-04). Last night (2020-11-10), the job failed with the following error message:

Incompatible table partitioning specification. Destination table exists with partitioning specification interval(type:MONTH,field:createdAt), but transfer target partitioning specification is interval(type:DAY,field:createdAt). Please retry after updating either the destination table or the transfer partitioning specification.
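When such a run fails, you can pull the two specs out of the error text to see exactly what differs. Below is a minimal sketch that assumes the message keeps the `interval(type:...,field:...)` wording shown above. `parse_specs` is a hypothetical helper and is not part of any Google client library.

```python
import re

# Matches the interval(type:XXX,field:YYY) fragments in the error text.
SPEC_RE = re.compile(r"interval\(type:(\w+),field:(\w+)\)")


def parse_specs(message: str) -> dict:
    """Return the table and transfer partitioning specs found in the message."""
    matches = SPEC_RE.findall(message)
    if len(matches) != 2:
        raise ValueError("expected two partitioning specs in the message")
    # The first spec belongs to the destination table, the second to the transfer.
    (table_type, table_field), (xfer_type, xfer_field) = matches
    return {
        "table": {"type": table_type, "field": table_field},
        "transfer": {"type": xfer_type, "field": xfer_field},
    }


error = (
    "Incompatible table partitioning specification. Destination table exists "
    "with partitioning specification interval(type:MONTH,field:createdAt), but "
    "transfer target partitioning specification is "
    "interval(type:DAY,field:createdAt)."
)
specs = parse_specs(error)
for key in ("type", "field"):
    if specs["table"][key] != specs["transfer"][key]:
        print(f"{key} differs: table={specs['table'][key]} transfer={specs['transfer'][key]}")
```

For the message above, only the partitioning type differs (MONTH on the table, DAY on the transfer). This matches the observation that DAY-partitioned tables still work.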

I have tried recreating the tables and jobs with this specification, and it indeed fails every time the destination table's partitioning type is MONTH. However, it still works if the partitioning type is DAY. What confuses me most is the phrase "the transfer partitioning specification", because no such parameter seems to exist in the documentation.

Is this a recent breaking change in the GCP API that has not been documented yet?

I would recommend raising an issue on the GCP issue tracker, as this doesn't seem to be normal behaviour. - rsalinas
I did, and the GCP support team is working on it. I'll update this question once it is resolved. - matthieu.cham
Here is the response from Google Support: "Regarding the "Incompatible partitioning" error, the BigQuery Engineering team has identified the issue and they are working on a fix. The team estimates this fix could be ready at the end of the first week of December." - matthieu.cham

1 Answer


After a few weeks of investigation and bug fixing by the GCP team, the problem has been resolved since December 7th, 2020. It was indeed a bug in the BigQuery Data Transfer Service.