0
votes

How can I get the list of tables from a Google BigQuery dataset using apache beam with DataflowRunner?

I can't find how to get the tables from a specified dataset. I want to migrate tables from a dataset located in the US to one in the EU using Dataflow's parallel processing programming model.

Please tag whether you are using Java or Python. Thanks! – Haris Nadeem
Using Java; Dataflow with Python still has some open issues... – user2291521

3 Answers

0
votes

Import the library

from google.cloud import bigquery

Create a BigQuery client

client = bigquery.Client(project='your_project_name')

Create a reference to the dataset

dataset_ref = client.dataset('your_data_set_name')

Make API request

tables = list(client.list_tables(dataset_ref))
if tables:
    for table in tables:
        print('\t{}'.format(table.table_id))

Reference: https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html#datasets

0
votes

You can try the google-cloud-examples Maven repo. It contains a class named BigQuerySnippets that makes an API call to fetch table metadata, from which you can retrieve the schema. Note that the API quota limit is a maximum of 6 concurrent requests per second.
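For reference, fetching a table's metadata and schema with the BigQuery Java client library looks roughly like this (a sketch; the project, dataset, and table names are placeholders, and application-default credentials are assumed):

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TableId;

public class TableSchemaExample {
  public static void main(String[] args) {
    // Uses application-default credentials; names below are placeholders.
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Fetch the table's metadata with a single API call.
    Table table = bigquery.getTable(TableId.of("my_project", "my_dataset", "my_table"));

    // The schema lives on the table's definition.
    Schema schema = table.getDefinition().getSchema();
    System.out.println(schema);
  }
}
```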

0
votes

The purpose of Dataflow is to run pipelines, so making arbitrary API requests is not part of its model. You have to use the BigQuery Java client library to list the tables yourself and then feed the result into your Apache Beam pipeline.

import com.google.api.gax.paging.Page;
import com.google.cloud.bigquery.*;
import com.google.cloud.bigquery.BigQuery.TableListOption;

BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
DatasetId datasetId = DatasetId.of(projectId, datasetName);
Page<Table> tables = bigquery.listTables(datasetId, TableListOption.pageSize(100));
for (Table table : tables.iterateAll()) {
  // e.g. collect table.getTableId().getTable() into a List<String>
}
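Putting it together, one way to wire the listed tables into a Beam pipeline is a read/write pair per table. This is a sketch under assumptions: the project and dataset names are placeholders, and CREATE_NEVER assumes the destination tables already exist with matching schemas (otherwise use CREATE_IF_NEEDED together with .withSchema(...)):

```java
import java.util.ArrayList;
import java.util.List;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.DatasetId;
import com.google.cloud.bigquery.Table;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class MigrateTables {
  public static void main(String[] args) {
    // List the source tables with the BigQuery client library (outside the pipeline).
    List<String> tableNames = new ArrayList<>();
    for (Table t : BigQueryOptions.getDefaultInstance().getService()
            .listTables(DatasetId.of("my_project", "us_dataset")).iterateAll()) {
      tableNames.add(t.getTableId().getTable());
    }

    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    // One read/write branch per table; Dataflow runs the branches in parallel.
    for (String name : tableNames) {
      p.apply("Read " + name,
              BigQueryIO.readTableRows().from("my_project:us_dataset." + name))
       .apply("Write " + name,
              BigQueryIO.writeTableRows()
                  .to("my_project:eu_dataset." + name)
                  // Assumes the EU tables already exist with matching schemas;
                  // otherwise use CREATE_IF_NEEDED plus .withSchema(...).
                  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                  .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));
    }
    p.run();
  }
}
```

Note that BigQueryIO stages data through a GCS temp location, so check the regions of your temp bucket when copying between US and EU.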