Python 2.7 and GCP Google BigQuery: extracts - compression not working

Question

I'm using Python 2.7 (can't change right now) and v0.28 of the Google Python client library, google.cloud.bigquery. The compression="GZIP" or "NONE" argument/setting doesn't appear to be working for me. Can someone else try this out and let me know if it works for them?

In the code below you can see I've been playing with this, but each time on GCS my files appear to be non-compressed, no matter what I use for the compression.

Note: my imports are for a larger set of code; not all of them are needed for this snippet.

from pandas.io import gbq
import google.auth
from google.cloud import bigquery
from google.cloud.exceptions import NotFound
from google.cloud.bigquery import LoadJobConfig
from google.cloud.bigquery import Table
import json
import re
from google.cloud import storage


bigquery_client = bigquery.Client(project=project)
dataset_ref = bigquery_client.dataset(dataset_name)
table_ref = dataset_ref.table(table_name)

job_id_prefix = "bqTools_export_job"

job_config = bigquery.LoadJobConfig()

# default is ","
if field_delimiter:
    job_config.field_delimiter = field_delimiter

# default is true
if print_header:
    job_config.print_header = print_header

# CSV, NEWLINE_DELIMITED_JSON, or AVRO
if destination_format:
    job_config.destination_format = destination_format

# GZIP or NONE
if compression:
    job_config.compression = compression

job_config.Compression = "GZIP"
job_config.compression = "GZIP"

job = bigquery_client.extract_table(table_ref, destination, job_config=job_config, job_id_prefix=job_id_prefix)

# job.begin()
job.result()  # Wait for job to complete

returnMsg = 'Exported {}:{} to {}'.format(dataset_name, table_name, destination)

Related links:

https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.extract.compression

https://googlecloudplatform.github.io/google-cloud-python/latest/_modules/google/cloud/bigquery/job.html

https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/bigquery/api/export_data_to_cloud_storage.py

I'm sure I'm doing something stupid, thank you for your help...Rich

EDIT BELOW

In the interest of sharing, here is what I think our final code will be...Rich

# export a table from bq into a file on gcs,
# the destination should look like the following, with no brackets {}
# gs://{bucket-name-here}/{file-name-here}
def export_data_to_gcs(dataset_name, table_name, destination,
                       field_delimiter=",", print_header=None,
                       destination_format="CSV", compression="GZIP", project=None):
    try:
        bigquery_client = bigquery.Client(project=project)
        dataset_ref = bigquery_client.dataset(dataset_name)
        table_ref = dataset_ref.table(table_name)

        job_id_prefix = "bqTools_export_job"

        job_config = bigquery.ExtractJobConfig()

        # default is ","
        if field_delimiter:
            job_config.field_delimiter = field_delimiter

        # default is true; compare against None so an explicit False is honored
        if print_header is not None:
            job_config.print_header = print_header

        # CSV, NEWLINE_DELIMITED_JSON, or AVRO
        if destination_format:
            job_config.destination_format = destination_format

        # GZIP or NONE
        if compression:
            job_config.compression = compression

        # if it should be compressed, make sure there is a .gz on the filename, add if needed
        if compression == "GZIP":
            if destination.lower()[-3:] != ".gz":
                destination = str(destination) + ".gz"

        job = bigquery_client.extract_table(table_ref, destination, job_config=job_config, job_id_prefix=job_id_prefix)

        # job.begin()
        job.result()  # Wait for job to complete

        returnMsg = 'Exported {}:{} to {}'.format(dataset_name, table_name, destination)

        return returnMsg

    except Exception as e:
        errorStr = 'ERROR (export_data_to_gcs): ' + str(e)
        print(errorStr)
        raise
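
A small sketch for checking the result, since the file name alone does not prove the content is compressed. It assumes you have already read the first few bytes of the exported object by some means; here in-memory data stands in for a real GCS download. The helper names ensure_gz_suffix and looks_gzipped are made up for illustration and are not part of any Google library.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def ensure_gz_suffix(destination):
    """Return destination with a trailing .gz, adding one only if missing."""
    if destination.lower().endswith(".gz"):
        return destination
    return destination + ".gz"


def looks_gzipped(first_bytes):
    """True if the leading bytes of a file start with the gzip magic number."""
    return first_bytes[:2] == GZIP_MAGIC


# Local demonstration: in-memory data instead of a real exported object.
plain = b"col_a,col_b\n1,2\n"
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(plain)
compressed = buf.getvalue()

print(ensure_gz_suffix("gs://bucket-name/file-name.csv"))
print(looks_gzipped(compressed), looks_gzipped(plain))
```

Checking the magic bytes rather than the extension catches the case from the original question, where an export can carry any name yet still be written uncompressed.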

Shouldn't it be bigquery.ExtractJobConfig and not bigquery.LoadJobConfig? — Graham Polley

Daria Daria · Accepted Answer · 2017-11-21T21:36:33 · 1 vote

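For a table extract you should use ExtractJobConfig, not LoadJobConfig: build job_config with bigquery.ExtractJobConfig() before passing it to extract_table.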
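
A likely reason the wrong config class produced no error: in plain Python, assigning to an attribute that a class does not declare simply creates a new instance attribute. The sketch below illustrates that with a toy class. It assumes the v0.28 config classes behave like ordinary Python classes (no `__slots__`, no custom `__setattr__`), which I have not verified against the library source. ToyLoadConfig and its to_api_repr method are hypothetical stand-ins, not the real google.cloud.bigquery code.

```python
class ToyLoadConfig(object):
    """Hypothetical stand-in for a config class with no compression setting."""

    def __init__(self):
        self._properties = {}

    @property
    def field_delimiter(self):
        return self._properties.get("fieldDelimiter")

    @field_delimiter.setter
    def field_delimiter(self, value):
        self._properties["fieldDelimiter"] = value

    def to_api_repr(self):
        # Only values stored through a declared property reach the request.
        return dict(self._properties)


config = ToyLoadConfig()
config.field_delimiter = "|"  # declared property: recorded in _properties
config.compression = "GZIP"   # not declared: becomes a plain attribute
config.Compression = "GZIP"   # same; attribute names are case-sensitive

request_body = config.to_api_repr()
print(request_body)
```

Both assignments succeed silently, yet neither value ends up in the request body, which matches the symptom of files arriving uncompressed with no exception raised.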
