
When attempting to extract data from a BigQuery table requiring a partition filter, the extract job fails.

Here is a simple example that creates a table and runs an extract job.

package com.example;

import com.google.cloud.bigquery.*;

public class BigQueryExtractTest {

    private static final String PROJECT_ID = "my-project-id";
    private static final String DATASET_ID = "test_dataset";
    private static final String GCS_LOCATION = "gs://my-bucket/path/to/files/part-*";

    public static void main(String[] args) throws Exception {
        // create BigQuery client
        BigQuery bigQuery = BigQueryOptions.newBuilder().setProjectId(PROJECT_ID).build().getService();

        // create dataset and table that requires partition filter
        bigQuery.create(DatasetInfo.of(DATASET_ID));
        bigQuery.query(QueryJobConfiguration.of(
                String.format("CREATE TABLE %s.table1 (\n", DATASET_ID) +
                        "stringColumn STRING,\n" +
                        "timeColumn TIMESTAMP\n" +
                        ") PARTITION BY DATE(timeColumn)\n" +
                        "OPTIONS(\n" +
                        "require_partition_filter=true\n" +
                        ")"));

        // extract table
        Job job = bigQuery.getTable(TableId.of(DATASET_ID, "table1"))
                .extract("NEWLINE_DELIMITED_JSON", GCS_LOCATION)
                .waitFor();

        // fail if the job no longer exists or finished with an error
        if (job == null) {
            throw new Exception("Extract job no longer exists");
        } else if (job.getStatus().getError() != null) {
            throw new Exception(job.getStatus().getError().toString());
        }
    }

}

The code snippet above produces the following error:

Exception in thread "main" java.lang.Exception: BigQueryError{reason=invalidQuery, location=query, message=Cannot query over table 'my-project-id.test_dataset.table1' without a filter that can be used for partition elimination}
    at com.example.BigQueryExtractTest.main(BigQueryExtractTest.java:34)

The google-cloud-bigquery Maven dependency used for this example is shown below.

<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-bigquery</artifactId>
    <version>1.23.0</version>
</dependency>

The example also throws the same exception with dependency version 0.34.0-beta.

How can a partition filter be specified when running an extract job?

Sounds like a bug. Can you submit a bug report with this information (or link to this post?) As a temporary workaround, you can maybe toggle the setting off, extract, and then turn the setting back on. - Elliott Brossard
Thanks @ElliottBrossard . Just submitted https://issuetracker.google.com/issues/76453364. How can the require partition filter setting be toggled on/off? - JonSim
This worked for me: bq update --time_partitioning_type=DAY --norequire_partition_filter <table name>. I think you need to specify --time_partitioning_field instead if you're using column-based partitioning though. - Elliott Brossard
I added this information as an answer below to make it more obvious in case someone else hits this same problem. - Elliott Brossard
yep that worked @ElliottBrossard. thanks! - JonSim

1 Answer


This is a bug, and there is now a bug report tracking the problem. To work around the limitation, you can update the table to allow queries without a partition filter using the bq command-line tool, perform the export, and then update the table to require it again. For example, with an ingestion-time partitioned table:

bq update --time_partitioning_type=DAY --norequire_partition_filter \
    <dataset_name>.<table_name>

... (perform the export)

bq update --time_partitioning_type=DAY --require_partition_filter \
    <dataset_name>.<table_name>

... (resume querying the table)
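If you would rather stay in Java, the same toggle can be sketched with the client library itself. This is only a sketch, not from the original post: it assumes a google-cloud-bigquery version whose TimePartitioning builder exposes setRequirePartitionFilter (the 1.23.0 release used in the question may predate it), and reuses the dataset, table, and GCS path from the example above.

```java
package com.example;

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TimePartitioning;

public class TogglePartitionFilterAndExtract {

    public static void main(String[] args) throws Exception {
        BigQuery bigQuery = BigQueryOptions.getDefaultInstance().getService();
        TableId tableId = TableId.of("test_dataset", "table1");

        Table table = bigQuery.getTable(tableId);
        StandardTableDefinition definition = table.getDefinition();

        // 1. Update the table with require_partition_filter disabled.
        setRequirePartitionFilter(bigQuery, table, definition, false);

        // 2. Run the extract job while the requirement is off.
        Job job = bigQuery.getTable(tableId)
                .extract("NEWLINE_DELIMITED_JSON", "gs://my-bucket/path/to/files/part-*")
                .waitFor();
        if (job == null || job.getStatus().getError() != null) {
            throw new Exception("Extract failed: "
                    + (job == null ? "job no longer exists" : job.getStatus().getError().toString()));
        }

        // 3. Turn the requirement back on.
        setRequirePartitionFilter(bigQuery, table, definition, true);
    }

    // Rebuilds the table's time partitioning spec with the given
    // require_partition_filter value and pushes the update to BigQuery.
    private static void setRequirePartitionFilter(BigQuery bigQuery, Table table,
            StandardTableDefinition definition, boolean required) {
        TimePartitioning partitioning = TimePartitioning
                .newBuilder(definition.getTimePartitioning().getType())
                .setRequirePartitionFilter(required)
                .build();
        bigQuery.update(table.toBuilder()
                .setDefinition(definition.toBuilder()
                        .setTimePartitioning(partitioning)
                        .build())
                .build());
    }
}
```

The three-step structure mirrors the bq workaround above: disable, extract, re-enable. Note the window between steps 1 and 3 during which other callers can query the table without a partition filter.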