1
votes

We have a strange issue that happens quite often.

We have a process that fetches files from sources and loads them into GCS. Then, and only if the file was uploaded successfully, we try to load it into a BigQuery table, and we get the error "Not found: Uris List of uris (possibly truncated): json: file_name: ...".

After a deep investigation, everything appears to be fine, and we don't know what has changed. Within the relevant time frame, the file referenced by the job exists in Cloud Storage, and it was uploaded to GCS two minutes before BigQuery tried to read it.

It should be noted that we load each batch from a whole directory in Cloud Storage, using a wildcard URI like gs://<bucket>/path_to_dir/*. Is that still supported? Also, the files are quite small - from a few bytes to a few KB. Does that matter?

Job IDs for checking:
load_file_8e4e16f737084ba59ce0ba89075241b7
load_file_6c13c25e1fc54a088af40199eb86200d

2
Does the error persist if you wait for 10-15 minutes after loading the file into GCS? - Sonya
Seems to be an issue with GCS and inconsistent results from object listing for multi-regional buckets in the US. Monitoring at status.cloud.google.com/incident/storage/16036 - Felipe Hoffa

2 Answers

0
votes

Known issue with Cloud Storage consistency

As noted by Felipe, this was indeed related to a known issue with Cloud Storage. Google Cloud Storage Incident #16036 was marked resolved on December 20, 2016. It was also being tracked in Issue 738. Although Cloud Storage list operations are eventually consistent, this incident caused excessive delays before operations returned consistent results.

Handling Cloud Storage inconsistency

Though this was an isolated incident, it is nevertheless a good practice to have some means of handling such inconsistencies. Two such suggestions can be found in comment #10 of the related public issue.

  1. Retry the load job if it failed.
  2. Verify that Cloud Storage results are consistent with expectations

    Verify that the expected number of files (and their total size) was processed by BigQuery. You can get this information from the job metadata.
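The two safeguards above can be combined into a small retry loop. This is a minimal sketch, not the incident's official workaround; `run_load_job` is a hypothetical callable that you would implement with your BigQuery client of choice, returning whether the load succeeded and how many files the job metadata reports as processed:

```python
import time

def load_with_retry(run_load_job, expected_files, attempts=3, delay=60):
    """Retry the BigQuery load and verify the processed-file count
    against the number of objects we uploaded to GCS."""
    for _ in range(attempts):
        succeeded, files_processed = run_load_job()
        if succeeded and files_processed == expected_files:
            return True   # job succeeded and results look consistent
        time.sleep(delay)  # give GCS listing time to become consistent
    return False           # still inconsistent after all attempts
```

The delay between attempts matters here: since the underlying problem was slow list consistency, an immediate retry is likely to see the same truncated listing, whereas waiting a minute or two gives Cloud Storage time to converge.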

Still getting unexpected results

Should you encounter such an issue again and have the appropriate error handling measures in place, I recommend first consulting the Google Cloud Status Dashboard and BigQuery public issue tracker for existing reports showing similar symptoms. If none exist, file a new issue on the issue tracker.

0
votes

The solution was to move from a Multi-Regional bucket (which had been created before the Regional location type was available) to a Regional bucket. Since we moved, we have never faced this issue again.