
Good morning everyone. I have a GCS bucket containing files that were transferred from our Amazon S3 bucket. These files are in .gz.parquet format. I am trying to set up a transfer from the GCS bucket to BigQuery with the transfer feature, but I am running into issues with the Parquet file format.

When I create a transfer and specify the file format as Parquet, I receive an error stating that the data is not in Parquet format. When I tried specifying the file format as CSV, weird values appeared in my table, as shown in the linked image: Results 2

I have tried the following URIs:

  • bucket-name/folder-1/folder-2/dt={run_time|"%Y-%m-%d"}/b=1/geo/*.parquet. FILE FORMAT: PARQUET. RESULTS: FILE NOT IN PARQUET FORMAT.

  • bucket-name/folder-1/folder-2/dt={run_time|"%Y-%m-%d"}/b=1/geo/*.gz.parquet. FILE FORMAT: PARQUET. RESULTS: FILE NOT IN PARQUET FORMAT.

  • bucket-name/folder-1/folder-2/dt={run_time|"%Y-%m-%d"}/b=1/geo/*.gz.parquet. FILE FORMAT: CSV. RESULTS: TRANSFER DONE, BUT WEIRD VALUES.

  • bucket-name/folder-1/folder-2/dt={run_time|"%Y-%m-%d"}/b=1/geo/*.parquet. FILE FORMAT: CSV. RESULTS: TRANSFER DONE, BUT WEIRD VALUES.
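For context, this is roughly the transfer configuration I am creating, expressed with the bq CLI (a sketch; my_dataset, my_table and the display name are placeholders I made up):

# Create a scheduled GCS -> BigQuery transfer (same settings as in the UI)
bq mk --transfer_config \
  --data_source=google_cloud_storage \
  --target_dataset=my_dataset \
  --display_name="GCS to BQ transfer" \
  --params='{
    "data_path_template": "gs://bucket-name/folder-1/folder-2/dt={run_time|\"%Y-%m-%d\"}/b=1/geo/*.parquet",
    "destination_table_name_template": "my_table",
    "file_format": "PARQUET"
  }'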

Does anyone have any idea on how I should proceed? Thank you in advance!

Maybe the issue comes from the gz compression? Have you tried uncompressing the files before transferring them? - Cylldby
Hello, thank you for your response. I was thinking maybe that could be it. I am trying to use the transfer feature from GCS to BQ because it's easier, but perhaps I need to use Cloud Composer/Python instead... - Victoria
Ok, I was not looking in the right place... Actually, BQ supports gzipped Parquet files! - Cylldby

2 Answers


There is dedicated documentation explaining how to load Parquet data from a Cloud Storage bucket into BigQuery, linked below. Could you please go through it and let us know if it still does not solve your problem?

https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet
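For example, a one-off load of a single day's files following that page would look something like this (the date and dataset.table below are placeholders; as noted in the comments above, BigQuery handles the gzip compression inside the Parquet files by itself):

# Load one partition's gzipped Parquet files directly into a table
bq load --source_format=PARQUET \
  dataset.table \
  "gs://bucket-name/folder-1/folder-2/dt=2021-01-01/b=1/geo/*.gz.parquet"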

Regards, Anbu.


Judging by the looks of your URIs, the page you are looking for is this one, about loading hive-partitioned Parquet files into BigQuery.

You can try something like below in Cloud Shell:

# The wildcard URI must extend the hive partitioning source URI prefix
bq load --source_format=PARQUET --autodetect \
  --hive_partitioning_mode=STRINGS \
  --hive_partitioning_source_uri_prefix=gs://bucket-name/folder-1/folder-2/ \
  dataset.table \
  "gs://bucket-name/folder-1/folder-2/*"
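If the load succeeds, the partition keys (dt and b) will show up as STRING columns because of --hive_partitioning_mode=STRINGS; you can check the detected schema with:

bq show --schema --format=prettyjson dataset.table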