4 votes

I have a Google Dataflow batch job written in Java.

  • The Java code reads from BigQuery, performs a few transformations, and then writes the results back to BigQuery.
  • It can access ordinary BigQuery tables just fine.

But when I choose a table that is backed by a federated source such as Google Sheets, it fails.

The error says no OAuth token with Google Drive scope was found.

Pipeline options

PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
Pipeline p1 = Pipeline.create(options);

Any ideas?

Did you read cloud.google.com/bigquery/…? "If you are using the BigQuery API, request the OAuth scope for Google Drive in addition to the scope for BigQuery." I'm not familiar with the Dataflow APIs, but you would need to provide explicit Drive access as part of the request somehow. - Elliott Brossard
Yeah, I know how to do it using a standalone Python script and the BigQuery API. I just don't know how to do it properly with Google Dataflow. - Justin Wong
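For reference, the scope requirement the comments describe can be sketched in Java. This is only an outline: the read-only Drive scope is an assumption (a broader Drive scope may be needed for some setups), and the commented lines show roughly where the scopes would plug into the google-cloud-bigquery client, omitted here so the snippet compiles without the Cloud libraries on the classpath.

```java
import java.util.Arrays;
import java.util.List;

public class DriveScopes {
    // Sheets-backed federated tables live in Drive, so the OAuth token must
    // carry a Drive scope in addition to the usual BigQuery scope.
    static final String BIGQUERY_SCOPE = "https://www.googleapis.com/auth/bigquery";
    static final String DRIVE_SCOPE = "https://www.googleapis.com/auth/drive.readonly";

    static List<String> requiredScopes() {
        return Arrays.asList(BIGQUERY_SCOPE, DRIVE_SCOPE);
    }

    public static void main(String[] args) {
        // With the google-cloud-bigquery client, these scopes would be applied
        // roughly like this:
        //
        //   GoogleCredentials creds = GoogleCredentials.getApplicationDefault()
        //       .createScoped(DriveScopes.requiredScopes());
        //   BigQuery bq = BigQueryOptions.newBuilder()
        //       .setCredentials(creds).build().getService();
        System.out.println(requiredScopes());
    }
}
```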

2 Answers

1 vote

Can you try:

gcloud auth login --enable-gdrive-access

before you launch the Dataflow job?
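A minimal sketch of the full sequence, assuming the job is launched locally with user credentials via Maven (the main class, project ID, and runner flag are placeholders):

```shell
# Re-authenticate so the launching credentials carry the Drive scope,
# which BigQuery needs in order to read the Sheets-backed table.
gcloud auth login --enable-gdrive-access

# Then launch the pipeline as usual (placeholder main class and project).
mvn compile exec:java \
  -Dexec.mainClass=com.example.MyPipeline \
  -Dexec.args="--runner=DataflowRunner --project=my-project"
```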

0 votes

Answering my own question: to get around this issue, I'm going to use Google Apps Script to upload the data to BigQuery as a native table.

Please see this link.

I'm just going to modify the "Load CSV data into BigQuery" code snippet and then create an installable trigger that executes the function every night to upload to BigQuery.

Beware that simple triggers such as onEdit and onOpen can't run code that requires authorisation, which is why an installable trigger is needed here.