
I need to load hundreds of tables into BigQuery from Google Cloud Storage (GCS -> temp table -> main table). I wrote a Python process that loads the data into BigQuery and scheduled it on App Engine. Because App Engine has a maximum timeout of 10 minutes, I submit the jobs in asynchronous mode and check their status at a later point. With hundreds of tables, I need a monitoring system to check the status of the load jobs.

This means maintaining a couple of tables and a bunch of views just to check the job status.

The operational process is a little complex. Is there a better way?


Could you please summarize the steps you take? Maybe something can be improved. - Pentium10
Could you please explain step by step how you created the Python process on App Engine and scheduled it? I am trying to do the same: load CSV files from a Google Cloud Storage bucket into a BigQuery table using Python on App Engine. - LondonUK

1 Answer


When we did this, we simply used a message queue such as Beanstalkd: we pushed a message for each job that had to be checked later, and wrote a small worker that subscribed to the channel and handled the task.
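A minimal sketch of that worker pattern, not our actual implementation. It assumes an in-process `queue.Queue` as a stand-in for Beanstalkd, and a hypothetical `check` callable that returns a BigQuery job's state (`PENDING`, `RUNNING`, or `DONE`) together with its error message, if any. In a real setup `check` would wrap a call to the BigQuery jobs API.

```python
import queue
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical signature: job_id -> (state, error message or None)
CheckFn = Callable[[str], Tuple[str, Optional[str]]]


def poll_jobs(pending: "queue.Queue[str]", check: CheckFn,
              delay: float = 0.0, max_rounds: int = 10) -> Dict[str, Optional[str]]:
    """Drain job IDs from the queue; re-queue any job that is not DONE yet.

    Returns {job_id: error}, where error is None for a successful job.
    """
    finished: Dict[str, Optional[str]] = {}
    for _ in range(max_rounds):
        if pending.empty():
            break
        # Look at each job currently in the queue exactly once per round.
        for _ in range(pending.qsize()):
            job_id = pending.get()
            state, error = check(job_id)
            if state == "DONE":
                # DONE only means the job stopped; the error tells success from failure.
                finished[job_id] = error
            else:
                pending.put(job_id)  # still PENDING or RUNNING, check again later
        time.sleep(delay)
    return finished
```

The submitting process only has to push each job ID onto the queue and can return well within the App Engine timeout. Jobs still unfinished after `max_rounds` stay in the queue for the next run.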

Alternatively, BigQuery supports querying data directly from Google Cloud Storage, which can remove the separate load into a temp table.

Use cases:
- Loading and cleaning your data in one pass by querying the data from a federated data source (a location external to BigQuery) and writing the cleaned result into BigQuery storage.
- Having a small amount of frequently changing data that you join with other tables. As a federated data source, the frequently changing data does not need to be reloaded every time it is updated.
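The first use case can be sketched as a single query job that reads the GCS files through a temporary external table definition and writes the cleaned result to the main table. The helper below only builds the request body for the BigQuery REST `jobs.insert` call; the function name, the `staging` alias, and the project, dataset, table, and bucket names are illustrative assumptions, and the body still has to be sent with an authenticated client.

```python
import json
from typing import Iterable


def federated_load_job(project: str, dataset: str, table: str,
                       gcs_uris: Iterable[str], sql: str,
                       alias: str = "staging") -> dict:
    """Build a jobs.insert body that queries CSV files in GCS and
    writes the query result into a BigQuery destination table."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                "useLegacySql": False,
                # Temporary external table: the SQL refers to it by `alias`.
                "tableDefinitions": {
                    alias: {
                        "sourceFormat": "CSV",
                        "sourceUris": list(gcs_uris),
                        "autodetect": True,  # let BigQuery infer the schema
                    }
                },
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                # Replace the main table's contents on every run.
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }


# Example with made-up names: clean while loading, no temp table needed.
body = federated_load_job(
    "my-project", "my_dataset", "orders",
    ["gs://my-bucket/orders/*.csv"],
    "SELECT * FROM staging WHERE order_id IS NOT NULL",
)
print(json.dumps(body, indent=2))
```

This replaces the two-step GCS -> temp table -> main table flow with one job per table, so there is only one job ID per table to monitor.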
