When I load multiple files from Cloud Storage in a single job, the larger jobs almost always fail. Loading an individual file works, but loading batches is much more convenient.
Snippet:

Recent Jobs
Load 11:24am gs://albertbigquery.appspot.com/uep/201409/01/wpc_5012_20140901_0002.log.gz to albertbigquery:uep.201409
Load 11:23am gs://albertbigquery.appspot.com/uep/201409/01/wpc_5012_20140901_0001.log.gz to albertbigquery:uep.201409
Load 11:22am gs://albertbigquery.appspot.com/uep/201409/01/* to albertbigquery:uep.201409
Errors:
File: 40 / Line:1 / Field:1, Bad character (ASCII 0) encountered: field starts with: <�>
File: 40 / Line:2 / Field:1, Bad character (ASCII 0) encountered: field starts with: <5C���>}�>
File: 40 / Line:3 / Field:1, Bad character (ASCII 0) encountered: field starts with: <����W�o�>
File: 40 / Line:4, Too few columns: expected 7 column(s) but got 2 column(s). For additional help:
File: 40 / Line:5, Too few columns: expected 7 column(s) but got 1 column(s). For additional help:
File: 40 / Line:6, Too few columns: expected 7 column(s) but got 1 column(s). For additional help:
File: 40 / Line:7, Too few columns: expected 7 column(s) but got 1 column(s). For additional help:
File: 40 / Line:8 / Field:1, Bad character (ASCII 0) encountered: field starts with: <��hy�>
The worst part of this problem is that I don't know which file "File: 40" refers to; the order seems random. Otherwise I could remove that file and load the rest of the data, or try to find the error in the file.
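To narrow down which file might be involved, a local check along these lines could help. It is only a sketch: it assumes the files have first been downloaded to a local directory, and the names scan_gzip_for_nul and scan_directory are hypothetical helpers, not part of any BigQuery tooling. It checks that each file decompresses as gzip and that the decompressed data contains no ASCII 0 bytes, which is what the load errors complain about.

```python
import gzip
import zlib
from pathlib import Path


def scan_gzip_for_nul(path, chunk_size=1 << 20):
    """Check one gzip file (hypothetical helper).

    Returns (ok, detail). ok is False if the file cannot be decompressed
    or if the decompressed data contains a NUL (ASCII 0) byte.
    """
    offset = 0  # number of decompressed bytes read so far
    try:
        with gzip.open(path, "rb") as fh:
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                pos = chunk.find(b"\x00")
                if pos != -1:
                    return False, f"NUL byte at decompressed offset {offset + pos}"
                offset += len(chunk)
    except (OSError, EOFError, zlib.error) as exc:
        # Covers files that are not gzip, truncated files, and corrupt streams.
        return False, f"not readable as gzip: {exc}"
    return True, f"{offset} bytes, no NUL bytes"


def scan_directory(directory, pattern="*.log.gz"):
    """Scan every file matching pattern; returns {file name: (ok, detail)}."""
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        results[path.name] = scan_gzip_for_nul(path)
    return results
```

Calling scan_directory on the local copy of a day's directory and printing the entries where ok is False would list any file that is not valid gzip or that really contains ASCII 0 bytes. If every file passes, that would support the suspicion that the problem is not in the files themselves.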
I also strongly doubt that there is an actual file error. For example, after I removed all files from the directory in the load job shown above except _0001 and _0002 (both of which loaded fine as single files), I still get this output:
Recent Jobs
Load 11:44am gs://albertbigquery.appspot.com/uep/201409/01/* to albertbigquery:uep.201409
Errors:
File: 1 / Line:1 / Field:1, Bad character (ASCII 0) encountered: field starts with: <�>
File: 1 / Line:2 / Field:3, Bad character (ASCII 0) encountered: field starts with:
File: 1 / Line:3, Too few columns: expected 7 column(s) but got 1 column(s). For additional help:
File: 1 / Line:4 / Field:3, Bad character (ASCII 0) encountered: field starts with:
Sometimes, though, the files load just fine; otherwise I'd assume that multi-file loading was completely broken.
Info: The average file size is around 20 MB, and a directory usually holds about 70 files, totaling somewhere between 1 and 2 GB.