I was trying to create an external table on the partitioned data in GCS written from a spark job with date partitioned in PARQUET format.
The data is the GCS bucket is as shown the picture .
I created an external table with table definition
"hivePartitioningOptions": {
"mode": "AUTO",
"sourceUriPrefix": "gs://transaction_data_bucket_for_bigquery/trx_data"
},
"sourceFormat": "PARQUET",
"sourceUris": [
"gs://transaction_data_bucket_for_bigquery/trx_data/*"
]
}
with command
bq mk --external_table_definition=/tmp/table_def <project>:<dataset>.sample_trx_external
When i try to query the table, i am getting a strange error.
Partition keys should be invariant from table creation across all partitions, with the number of partition keys held constant with invariant names. Expected 0 partition keys ([]), but 1 ([transaction_date]) were encountered along path /bigstore/transaction_data_bucket_for_bigquery/trx_data/transaction_date=2016-01-01.; Cannot add hive partitioning to table <data_set>.sample_trx_external -- table creation from underlying uri failed.. Underlying error: Partition keys should be invariant from table creation across all partitions, with the number of partition keys held constant with invariant names. Expected 0 partition keys ([]), but 1 ([transaction_date]) were encountered along path /bigstore/transaction_data_bucket_for_bigquery/trx_data/transaction_date=2016-01-01..
Unfortunately i couldn't able to decipher the message. There are only 1 day of transactions written to the GCS bucket .
When i tried with CUSTOM mode definition
{
"hivePartitioningOptions": {
"mode": "CUSTOM",
"sourceUriPrefix": "gs://transaction_data_bucket_for_bigquery/trx_data/{transaction_date:DATE}"
},
"sourceFormat": "PARQUET",
"sourceUris": [
"gs://transaction_data_bucket_for_bigquery/trx_data/*"
]
}
i got a slightly different error
Partition keys should be invariant from table creation across all partitions, with the number of partition keys held constant with invariant names. Expected 1 partition keys ([transaction_date]), but 0 ([) were encountered along path /bigstore/transaction_data_bucket_for_bigquery/trx_data.; Cannot add hive partitioning to table <data_Set>.sample_trx_external_2 -- table creation from underlying uri failed.. Underlying error: Partition keys should be invariant from table creation across all partitions, with the number of partition keys held constant with invariant names. Expected 1 partition keys ([transaction_date]), but 0 ([) were encountered along path /bigstore/transaction_data_bucket_for_bigquery/trx_data..
Got struck here and any suggestion would be a great help .