5
votes

Here's the case:

  • Our client uploads CSVs daily (overwriting the previous ones) to a Google Cloud Storage bucket, with each table in a separate file.
  • We use BigQuery as the data source in Data Studio.
  • We want to automatically transfer the CSVs to BigQuery.

The thing is, even though we've:

  1. Declared the tables in BigQuery with the "Overwrite table" write preference option
  2. Configured the daily transfers via the UI (BigQuery > Transfers) to automatically load the CSVs from Cloud Storage one hour after the files are uploaded, as required by the documented limitations.

The automated transfer/load defaults to "WRITE_APPEND", so the tables are appended to instead of overwritten in BigQuery.

Hence the question: how/where can we set

configuration.load.writeDisposition = WRITE_TRUNCATE

as stated here, so that the tables are overwritten when the CSVs are automatically loaded?
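
For context, here is where that property lives on a manual load job with the google-cloud-bigquery Python client (just a sketch for reference; it corresponds to configuration.load.writeDisposition in the REST API, and it's exactly this setting we can't find anywhere in the Transfer Service UI):

    from google.cloud import bigquery

    # On a manual load job, the overwrite behaviour is controlled by the
    # write_disposition property of the load job configuration.
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # overwrite instead of append
    )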

I think that's what we're missing.

Cheers.

Which option are you using when writing your query results: cloud.google.com/bigquery/docs/writing-results - Tamir Klein
Hi @TamirKlein, we are bigquery.admin - gabi493
No, that is your access permission. I mean, how do you write your query results: via the UI, Java, or something else? - Tamir Klein
Sorry @TamirKlein, I misunderstood you; we're doing everything via the UI, but we're not executing any queries we need to store. The only ones we run for now are used to manually check some results, so we don't need to write them to any table. - gabi493
I'm confused. Anyway, writeDisposition is not defined at the table level; rather, it's defined when you write into the table, so you can set it based on how you do the write. Hope this helps. - Tamir Klein

2 Answers

3
votes

None of the above worked for us, so I'm posting this in case anyone has the same issue.

We scheduled a query that erases the table contents just before the automatic import starts:

DELETE FROM `project.dataset.tableName` WHERE true

The new data is then imported into an empty table, so the default "WRITE_APPEND" no longer affects us.
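
If you'd rather trigger that cleanup from code instead of the scheduled-query UI, a minimal sketch with the google-cloud-bigquery Python client (project, dataset and table names are placeholders) could look like this:

    from google.cloud import bigquery

    client = bigquery.Client(project="your-project")  # placeholder project id

    # Empty the destination table right before the scheduled transfer runs,
    # so the appended load effectively behaves like an overwrite.
    client.query(
        "DELETE FROM `your-project.your_dataset.tableName` WHERE true"
    ).result()  # wait for the delete to finish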

1
votes

1) One way to do this is to use DDL to CREATE OR REPLACE your table before running the job that imports the data.

This is an example of how to create a table

#standardSQL
 CREATE TABLE mydataset.top_words
 OPTIONS(
   description="Top ten words per Shakespeare corpus"
 ) AS
 SELECT
   corpus,
   ARRAY_AGG(STRUCT(word, word_count) ORDER BY word_count DESC LIMIT 10) AS top_words
 FROM `bigquery-public-data.samples.shakespeare`
 GROUP BY corpus;

Now that it's created, you can import your data.

2) Another way is to use BigQuery scheduled queries.

3) If you write Python, you can find an even better solution here.
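
Hypothetically, such a Python solution could be a load job from Cloud Storage with WRITE_TRUNCATE, sketched here with the google-cloud-bigquery client (bucket, project, dataset and table names are placeholders):

    from google.cloud import bigquery

    client = bigquery.Client(project="your-project")  # placeholder project id

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # assumes the CSVs have a header row
        autodetect=True,       # or supply an explicit schema instead
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # overwrite, not append
    )

    load_job = client.load_table_from_uri(
        "gs://your-bucket/tableName.csv",        # placeholder source file
        "your-project.your_dataset.tableName",   # placeholder destination table
        job_config=job_config,
    )
    load_job.result()  # wait for the load to finish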