2
votes

I am trying to query a view in BigQuery using Apache Beam.

The view has access to all of the datasets that it references. The Dataflow/GCE service account has access to the view, but not to its underlying datasets (this should not be a problem).

When I try to run a job that queries the Authorized View, I get an error like this:

java.lang.RuntimeException: java.io.IOException: Unable to get table: test_13249, aborting after 9 retries.
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.executeWithRetries(BigQueryServicesImpl.java:1004)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:491)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:477)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:471)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQueryHelper.executeQuery(BigQueryQueryHelper.java:109)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySourceDef.getTableReference(BigQueryQuerySourceDef.java:113)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.getTableToExtract(BigQueryQuerySource.java:65)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:110)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:148)
    at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:290)
    at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:212)
    at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:196)
    at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:175)
    at org.apache.beam.runners.dataflow.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:78)
    at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:417)
    at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:386)
    at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:311)
    at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
    at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
    at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "Access Denied: Table my-gcp-project:bigquery_dataset_any.test_13249: User does not have bigquery.tables.get permission for table my-gcp-project:bigquery_dataset_any.test_13249.",
    "reason" : "accessDenied"
  } ],
  "message" : "Access Denied: Table my-gcp-project:bigquery_dataset_any.test_13249: User does not have bigquery.tables.get permission for table my-gcp-project:bigquery_dataset_any.test_13249.",
  "status" : "PERMISSION_DENIED"
}
1

1 Answers

1
votes

Beam tries to be smart when dealing with BigQuery. This error happens because Beam inspects tables referenced by a query, and this is not always possible with Authorized Views.

To work around this issue, you can use the method withQueryLocation in BigQueryIO.readTableRows() or BigQueryIO.read(SerializableFunction). This will allow Beam to use the provided query location, and not infer one.

Therefore:

BigQueryIO.readTableRows()
    .fromQuery("SELECT * FROM my_authorized_view")
    .withQueryLocation("US")  // Whatever location is convenient for you
    ...

this should work around your issue.