
I have successfully loaded a large number of Avro files (all with the same schema, into the same table), stored on Google Cloud Storage, using the bq CLI utility.
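For reference, a load of that shape looks roughly like this (the dataset, table, and bucket names below are placeholders, not from my actual setup):

    bq load --source_format=AVRO mydataset.mytable "gs://my-bucket/avro/*.avro"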

However, for some of the Avro files I get a very cryptic error while loading them into BigQuery:

The Apache Avro library failed to read data with the follwing error: EOF reached (error code: invalid)

I validated with avro-tools that the Avro file is not corrupted; the repair report output:

    java -jar avro-tools-1.8.1.jar repair -o report 2017-05-15-07-15-01_48a99.avro
    Recovering file: 2017-05-15-07-15-01_48a99.avro
    File Summary:
      Number of blocks: 51 Number of corrupt blocks: 0
      Number of records: 58598 Number of corrupt records: 0

I tried creating a brand new table with one of the failing files in case it was due to a schema mismatch, but that didn't help; the error was exactly the same.
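For what it's worth, one way to check for a schema mismatch is to dump and diff the schemas of a working file and a failing file with avro-tools getschema (the working file's name here is illustrative):

    java -jar avro-tools-1.8.1.jar getschema working-file.avro > working-schema.json
    java -jar avro-tools-1.8.1.jar getschema 2017-05-15-07-15-01_48a99.avro > failing-schema.json
    diff working-schema.json failing-schema.json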

I need help figuring out what could be causing the error here.

Can you submit a bug to the issue tracker with a sample file that reproduces the problem, assuming it doesn't contain any sensitive data? That would help the BigQuery team to debug what is going on, since this sounds like a bug. – Elliott Brossard

1 Answer


There's no way to pinpoint the issue without more information, but I ran into this error message and filed a ticket here.

In my case, a number of files in a single load job were missing columns, which was causing the error.

Explanation from the ticket:

BigQuery uses the alphabetically last file from the directory as the avro schema to read the other Avro files. I suspect the issue is with schema incompatibility between the last file and the "problematic" file. Do you know if all the files have the exact same schema or differ? One thing you could try to help verify this is to copy the alphabetically last file of the directory and the "problematic" file to a different folder and try to load those two files in one BigQuery load job and see if the error reproduces.
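A minimal sketch of that two-file repro test, assuming the files live under gs://my-bucket/avro/ (the bucket, file names, and table names are placeholders):

    gsutil cp gs://my-bucket/avro/zzz-alphabetically-last.avro gs://my-bucket/avro-debug/
    gsutil cp gs://my-bucket/avro/2017-05-15-07-15-01_48a99.avro gs://my-bucket/avro-debug/
    bq load --source_format=AVRO mydataset.debug_table "gs://my-bucket/avro-debug/*.avro"

If the two-file job fails with the same "EOF reached" error, that supports the schema-incompatibility theory; if it succeeds, the problem likely involves a different file in the original job.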