2
votes

I am running a series of BigQuery jobs. Two LOAD jobs each insert-overwrite data from Google Cloud Storage into their own table, and a final job then performs a JOIN on these two tables to produce a result table.

The problem I am experiencing is that the result table from the JOIN does not reflect the data from one of the two tables I have loaded, implying that the data written during the LOAD job is not yet available for query.

When I re-ran the JOIN manually about an hour later, the result table was correct. This implies there is some unknown period during which the LOAD job had been reported complete but the contents of the table were not yet refreshed.

Is there more information that the Google team can provide regarding this situation?

Here are the log entries showing the timeline:

table 1 LOAD complete

2015-03-20 16:22:54,237 INFO com.ni.google.application.ImportApplication - job job_U_OkoXXk91zl2wlyKWb5uWxNHkk is complete, table media_20150320 set to expire at 1434639614948

table 2 LOAD complete

2015-03-20 16:33:29,123 INFO com.ni.google.application.LoadTablesApplication - job job_QHxva8d6lXmxpaiZDyUmyDSWu6o is complete, table warehouse_dataview_interest_counts_1day set to expire at 1434645158930

# SHOULD I SLEEP HERE

table1 JOIN table2 BEGIN

2015-03-20 16:33:39,916 INFO com.ni.google.application.RollupApplication - loading query template: warehouse_comparison_1day

Thanks, Luke

1
I've been watching this one for any responses as the docs [1] say that data should be consistent when a load job completes. Seems it could have been a transient issue? [1] cloud.google.com/bigquery/… - Adam
Seems it was a transient issue or something on my side, I have not witnessed the issue since. - lukeforehand

1 Answer

0
votes

This happened to me as well, and it could have been a transient issue on some nodes. It is working for us now, and I see from your updates that it is working for you too.

If you see a similar issue, please post it to the issue tracker: https://code.google.com/p/google-bigquery/
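
For anyone who wants to rule out a race on their own side before filing an issue, here is a minimal Python sketch of gating the JOIN on the LOAD jobs instead of sleeping. It assumes you can fetch each job's `status` object (the `state` and `errorResult` fields are the ones the BigQuery jobs resource reports); `get_status`, `run_join`, `wait_for_job`, and `run_join_after_loads` are hypothetical names made up for this example, not part of any client library, and this is not the code the original poster was running.

```python
import time


class LoadJobFailed(RuntimeError):
    """Raised when a job reaches DONE but reports an errorResult."""


def wait_for_job(get_status, poll_seconds=2.0, timeout_seconds=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job status says DONE, then check it did not fail.

    get_status is a hypothetical callable returning the job's status as a
    dict, e.g. {"state": "RUNNING"} or {"state": "DONE", "errorResult": {...}}.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status.get("state") == "DONE":
            # DONE only means the job finished; it may still have failed.
            if status.get("errorResult"):
                raise LoadJobFailed(str(status["errorResult"]))
            return status
        if clock() >= deadline:
            raise TimeoutError("job did not reach DONE before the timeout")
        sleep(poll_seconds)


def run_join_after_loads(load_status_getters, run_join, **wait_kwargs):
    """Wait for every LOAD job to succeed, then start the JOIN."""
    for get_status in load_status_getters:
        wait_for_job(get_status, **wait_kwargs)
    return run_join()
```

The reason to check `errorResult` rather than only the state is that a failed LOAD also ends in `DONE`; a JOIN started right after it would then presumably read whatever the table held before. If both jobs succeed and the JOIN still sees old data, that is the case worth reporting.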