0 votes

I have successfully imported many gzipped JSON files on several occasions. For two of the files, however, the BQ import choked. Both files reported the same error:

File: 0 / Offset:0 / Line:1 / Column:20971521, Row larger than the maximum allowed size

Now, I've read about the 20 MB row limit, and I understand that the number above is 20 MB + 1, but what really bugs me is that the reported position makes no sense. My GZs contain millions of JSON records, each on its own line. I wrote a script to measure the longest line (i.e. the longest JSON record) in the failed GZ file and found it to be 103571 bytes. Why is the BQ import choking, then?
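For reference, the measuring script was roughly along these lines (a minimal sketch; the filename is a placeholder):

    import gzip

    # Scan the gzipped file line by line and track the longest line in bytes.
    max_len = 0
    with gzip.open("failed_file.json.gz", "rb") as f:
        for line in f:
            # Strip the trailing newline so it doesn't count toward the length.
            max_len = max(max_len, len(line.rstrip(b"\r\n")))
    print("longest line: %d bytes" % max_len)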

I have inspected the longest JSON record and it looks perfectly normal. How should I interpret this error, and how can I fix it?

Why does BQ think the error is on line 1, column 20971521, when there are millions of lines in the file?


2 Answers

0 votes

Your investigation is correct, but you should check your file: the newlines are evidently not being identified, so BQ sees the whole import as one very long line.

That's why it reports the problem at column 20971521.
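A quick way to verify this (a sketch in Python, with a placeholder filename) is to count the newline characters in the decompressed stream; if there are none, there is nothing to split rows on:

    import gzip

    # Count LF and CR bytes in the decompressed stream. If both are zero,
    # there are no line delimiters and the whole file is read as one row.
    lf = cr = 0
    with gzip.open("failed_file.json.gz", "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            lf += chunk.count(b"\n")
            cr += chunk.count(b"\r")
    print("LF: %d, CR: %d" % (lf, cr))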

You should try importing a sample from the file.
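For example, something along these lines (a sketch; the filenames and line count are arbitrary) writes the first 1000 lines to a smaller gzip you can test the import with:

    import gzip
    from itertools import islice

    # Copy the first 1000 lines into a small gzip for a test import.
    with gzip.open("failed_file.json.gz", "rb") as src, \
         gzip.open("sample.json.gz", "wb") as dst:
        for line in islice(src, 1000):
            dst.write(line)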

0 votes

Some of the answers here gave me an idea, so I went on and tried it. It appears that, for some strange reason, BQ didn't like the \n line endings, so I wrote a quick script to rewrite the original input file to use \r\n line endings. Automagically, the import worked!
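The rewrite was essentially this (a minimal sketch; the filenames are placeholders):

    import gzip

    # Re-emit every line with \r\n endings instead of plain \n.
    with gzip.open("original.json.gz", "rb") as src, \
         gzip.open("rewritten.json.gz", "wb") as dst:
        for line in src:
            dst.write(line.rstrip(b"\r\n") + b"\r\n")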

This is utterly strange, considering I had already imported many GBs of data with pure \n line endings.

I am happy that it worked, but I still can't explain why. I hope this helps someone else.